apollographql / router

A configurable, high-performance routing runtime for Apollo Federation 🚀
https://www.apollographql.com/docs/router/

segmentation fault starting router #4678

Open NeoPhi opened 9 months ago

NeoPhi commented 9 months ago

Describe the bug

Segmentation fault while trying to run the local composition example.

To Reproduce

% ./router --supergraph=supergraph-example.graphql
2024-02-17T10:49:27.032241Z INFO  Apollo Router v1.40.0 // (c) Apollo Graph, Inc. // Licensed as ELv2 (https://go.apollo.dev/elv2)
2024-02-17T10:49:27.032277Z INFO  Anonymous usage data collection is disabled.
zsh: segmentation fault  ./router --supergraph=supergraph-example.graphql

Expected behavior

The router starts up.

Desktop (please complete the following information):

% ./router --version
1.40.0
% uname -a
Darwin MetaCortex.local 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:28:58 PST 2023; root:xnu-10002.81.5~7/RELEASE_X86_64 x86_64

Additional context

Supergraph composed via:

% cat supergraph-example.yaml
federation_version: 2
subgraphs:
  locations:
    routing_url: https://flyby-locations-sub.herokuapp.com/
    schema:
      subgraph_url: https://flyby-locations-sub.herokuapp.com/
  reviews:
    routing_url: https://flyby-reviews-sub.herokuapp.com/
    schema:
      subgraph_url: https://flyby-reviews-sub.herokuapp.com/
% rover supergraph compose --config ./supergraph-example.yaml --output supergraph-example.graphql
% rover --version
Rover 0.22.0

Trace level startup data:

% ./router --log=TRACE --supergraph=supergraph-example.graphql
2024-02-17T10:58:36.369569Z DEBUG  creating plugin factory plugin_factory_name=apollo.include_subgraph_errors
2024-02-17T10:58:36.369617Z DEBUG  creating plugin factory plugin_factory_name=apollo.traffic_shaping
2024-02-17T10:58:36.369624Z DEBUG  creating plugin factory plugin_factory_name=apollo.csrf
2024-02-17T10:58:36.369635Z DEBUG  creating plugin factory plugin_factory_name=apollo.subscription
2024-02-17T10:58:36.369641Z DEBUG  creating plugin factory plugin_factory_name=experimental.record
2024-02-17T10:58:36.369647Z DEBUG  creating plugin factory plugin_factory_name=apollo.progressive_override
2024-02-17T10:58:36.369654Z DEBUG  creating plugin factory plugin_factory_name=apollo.override_subgraph_url
2024-02-17T10:58:36.369658Z DEBUG  creating plugin factory plugin_factory_name=apollo.preview_entity_cache
2024-02-17T10:58:36.369779Z DEBUG  creating plugin factory plugin_factory_name=apollo.forbid_mutations
2024-02-17T10:58:36.369790Z DEBUG  creating plugin factory plugin_factory_name=apollo.authorization
2024-02-17T10:58:36.369795Z DEBUG  creating plugin factory plugin_factory_name=apollo.authentication
2024-02-17T10:58:36.369800Z DEBUG  creating plugin factory plugin_factory_name=experimental.restricted
2024-02-17T10:58:36.369809Z DEBUG  creating plugin factory plugin_factory_name=apollo.rhai
2024-02-17T10:58:36.369819Z DEBUG  creating plugin factory plugin_factory_name=apollo.telemetry
2024-02-17T10:58:36.369826Z DEBUG  creating plugin factory plugin_factory_name=apollo.headers
2024-02-17T10:58:36.369831Z DEBUG  creating plugin factory plugin_factory_name=apollo.coprocessor
2024-02-17T10:58:36.369839Z DEBUG  creating plugin factory plugin_factory_name=experimental.broken
2024-02-17T10:58:36.369941Z DEBUG  creating plugin factory plugin_factory_name=experimental.expose_query_plan
2024-02-17T10:58:36.419236Z INFO  Apollo Router v1.40.0 // (c) Apollo Graph, Inc. // Licensed as ELv2 (https://go.apollo.dev/elv2)
2024-02-17T10:58:36.419257Z INFO  Anonymous usage data collection is disabled.
2024-02-17T10:58:36.419595Z DEBUG  starting
2024-02-17T10:58:36.420315Z TRACE  recursion limit data recursion_limit=2
2024-02-17T10:58:36.420517Z DEBUG  A valid Apollo license was not detected. However, no restricted features are in use.
2024-02-17T10:58:36.421215Z TRACE  recursion limit data recursion_limit=2
zsh: segmentation fault  ./router --log=TRACE --supergraph=supergraph-example.graphql

garypen commented 9 months ago

Hi @NeoPhi. Thanks for the well-documented issue. I note that you must have set APOLLO_TELEMETRY_DISABLED=true in your environment, so I've added that to my reproduction efforts. I'm trying to reproduce on a 2021 Mac M1, and I can't.

Could you repeat the command ./router --supergraph=supergraph-example.graphql with RUST_BACKTRACE=1 set in your environment and try to obtain a stack trace? That may give us a useful clue as to what is happening.

Be aware that you should carefully examine the contents of a backtrace before publicly posting it. It may contain sensitive personal data that you don't wish to share with the world. If you sanitize it so that we only see the functions which are called, that will be enough for us to work with.
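
As a minimal sketch of the sanitizing step described above: the `sanitize_log` helper below is hypothetical (it is not part of the router or Rover) and only masks a home directory path. Anything else sensitive would still need a manual review before posting.

```shell
# Hypothetical helper: mask a home directory path in log or backtrace text
# read from stdin. Defaults to $HOME; pass a path as $1 to override.
sanitize_log() {
  local home="${1:-$HOME}"
  sed "s|${home}|<home>|g"
}

# Example (not run here):
#   RUST_BACKTRACE=1 ./router --supergraph=supergraph-example.graphql 2>&1 | sanitize_log
```

The helper reads from stdin so it can sit at the end of a pipe or be applied to a saved log file.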

NeoPhi commented 9 months ago

Unfortunately, adding that didn't seem to generate any additional debugging output. I think a key difference from your reproduction might be that I'm running on a 2.3 GHz Quad-Core Intel Core i7 Mac from 2020:

2024-02-19T17:02:42.383598Z WARN  RUST_BACKTRACE=1 detected. This is useful for diagnostics but will have a performance impact and may leak sensitive information
2024-02-19T17:02:42.384611Z DEBUG  creating plugin factory plugin_factory_name=apollo.include_subgraph_errors
2024-02-19T17:02:42.384657Z DEBUG  creating plugin factory plugin_factory_name=apollo.traffic_shaping
2024-02-19T17:02:42.384664Z DEBUG  creating plugin factory plugin_factory_name=apollo.csrf
2024-02-19T17:02:42.384675Z DEBUG  creating plugin factory plugin_factory_name=apollo.subscription
2024-02-19T17:02:42.384680Z DEBUG  creating plugin factory plugin_factory_name=experimental.record
2024-02-19T17:02:42.384686Z DEBUG  creating plugin factory plugin_factory_name=apollo.progressive_override
2024-02-19T17:02:42.384690Z DEBUG  creating plugin factory plugin_factory_name=apollo.override_subgraph_url
2024-02-19T17:02:42.384694Z DEBUG  creating plugin factory plugin_factory_name=apollo.preview_entity_cache
2024-02-19T17:02:42.384793Z DEBUG  creating plugin factory plugin_factory_name=apollo.forbid_mutations
2024-02-19T17:02:42.384805Z DEBUG  creating plugin factory plugin_factory_name=apollo.authorization
2024-02-19T17:02:42.384810Z DEBUG  creating plugin factory plugin_factory_name=apollo.authentication
2024-02-19T17:02:42.384815Z DEBUG  creating plugin factory plugin_factory_name=experimental.restricted
2024-02-19T17:02:42.384823Z DEBUG  creating plugin factory plugin_factory_name=apollo.rhai
2024-02-19T17:02:42.384832Z DEBUG  creating plugin factory plugin_factory_name=apollo.telemetry
2024-02-19T17:02:42.384840Z DEBUG  creating plugin factory plugin_factory_name=apollo.headers
2024-02-19T17:02:42.384845Z DEBUG  creating plugin factory plugin_factory_name=apollo.coprocessor
2024-02-19T17:02:42.384853Z DEBUG  creating plugin factory plugin_factory_name=experimental.broken
2024-02-19T17:02:42.384860Z DEBUG  creating plugin factory plugin_factory_name=experimental.expose_query_plan
2024-02-19T17:02:42.436303Z INFO  Apollo Router v1.40.0 // (c) Apollo Graph, Inc. // Licensed as ELv2 (https://go.apollo.dev/elv2)
2024-02-19T17:02:42.436321Z INFO  Anonymous usage data collection is disabled.
2024-02-19T17:02:42.436697Z DEBUG  starting
2024-02-19T17:02:42.437380Z TRACE  recursion limit data recursion_limit=2
2024-02-19T17:02:42.437580Z DEBUG  A valid Apollo license was not detected. However, no restricted features are in use.
2024-02-19T17:02:42.438159Z TRACE  recursion limit data recursion_limit=2
zsh: segmentation fault  env APOLLO_TELEMETRY_DISABLED=1 RUST_BACKTRACE=1 ./router --log=TRACE
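
RUST_BACKTRACE only changes how Rust reports panics. A segmentation fault is a signal delivered by the operating system, so the process is killed without going through Rust's panic machinery, which is consistent with the missing backtrace above. A native debugger can still capture a stack trace. A minimal sketch, assuming lldb (shipped with the Xcode command line tools) is installed; `native_backtrace` is a hypothetical wrapper name:

```shell
# Hypothetical wrapper: run a command under lldb in batch mode and, if it
# crashes, print a backtrace of every thread before quitting.
native_backtrace() {
  lldb --batch \
    --one-line "run" \
    --one-line-on-crash "thread backtrace all" \
    --one-line-on-crash "quit" \
    -- "$@"
}

# Example (not run here):
#   native_backtrace ./router --supergraph=supergraph-example.graphql
```

The same caution about sensitive data applies to a native backtrace as to a Rust one.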

marianoqueirel commented 9 months ago

Hi team! I'm facing the same issue, and I've used the same example subgraphs as the reporter. I set RUST_BACKTRACE=1 but no trace was shown; see below:

➜ ./router --log=TRACE --supergraph=supergraph.graphql --dev

2024-02-20T16:45:44.294718Z WARN  RUST_BACKTRACE=1 detected. This is useful for diagnostics but will have a performance impact and may leak sensitive information
2024-02-20T16:45:44.294876Z INFO  Running with *development* mode settings which facilitate development experience (e.g., introspection enabled)
2024-02-20T16:45:44.295682Z DEBUG  creating plugin factory plugin_factory_name=apollo.progressive_override
2024-02-20T16:45:44.295709Z DEBUG  creating plugin factory plugin_factory_name=apollo.subscription
2024-02-20T16:45:44.295718Z DEBUG  creating plugin factory plugin_factory_name=apollo.preview_entity_cache
2024-02-20T16:45:44.295727Z DEBUG  creating plugin factory plugin_factory_name=apollo.rhai
2024-02-20T16:45:44.295732Z DEBUG  creating plugin factory plugin_factory_name=experimental.record
2024-02-20T16:45:44.295736Z DEBUG  creating plugin factory plugin_factory_name=apollo.override_subgraph_url
2024-02-20T16:45:44.295748Z DEBUG  creating plugin factory plugin_factory_name=apollo.traffic_shaping
2024-02-20T16:45:44.295755Z DEBUG  creating plugin factory plugin_factory_name=experimental.expose_query_plan
2024-02-20T16:45:44.295761Z DEBUG  creating plugin factory plugin_factory_name=experimental.broken
2024-02-20T16:45:44.295768Z DEBUG  creating plugin factory plugin_factory_name=apollo.forbid_mutations
2024-02-20T16:45:44.295774Z DEBUG  creating plugin factory plugin_factory_name=apollo.csrf
2024-02-20T16:45:44.295782Z DEBUG  creating plugin factory plugin_factory_name=apollo.authorization
2024-02-20T16:45:44.295788Z DEBUG  creating plugin factory plugin_factory_name=apollo.authentication
2024-02-20T16:45:44.295793Z DEBUG  creating plugin factory plugin_factory_name=experimental.restricted
2024-02-20T16:45:44.295801Z DEBUG  creating plugin factory plugin_factory_name=apollo.telemetry
2024-02-20T16:45:44.295806Z DEBUG  creating plugin factory plugin_factory_name=apollo.include_subgraph_errors
2024-02-20T16:45:44.295813Z DEBUG  creating plugin factory plugin_factory_name=apollo.headers
2024-02-20T16:45:44.295821Z DEBUG  creating plugin factory plugin_factory_name=apollo.coprocessor
2024-02-20T16:45:44.354506Z INFO  Apollo Router v1.40.1 // (c) Apollo Graph, Inc. // Licensed as ELv2 (https://go.apollo.dev/elv2)
2024-02-20T16:45:44.354523Z INFO  Anonymous usage data collection is disabled.
2024-02-20T16:45:44.356256Z DEBUG  starting
2024-02-20T16:45:44.357013Z TRACE  recursion limit data recursion_limit=1
2024-02-20T16:45:44.357181Z DEBUG  A valid Apollo license was not detected. However, no restricted features are in use.
2024-02-20T16:45:44.357361Z WARN  telemetry.instrumentation.spans.mode is currently set to 'deprecated', either explicitly or via defaulting. Set telemetry.instrumentation.spans.mode explicitly in your router.yaml to 'spec_compliant' for log and span attributes that follow OpenTelemetry semantic conventions. This option will be defaulted to 'spec_compliant' in a future release and eventually removed altogether
2024-02-20T16:45:44.358215Z TRACE  recursion limit data recursion_limit=1
[1]    32624 segmentation fault  ./router --log=TRACE --supergraph=supergraph.graphql --dev

Rover v0.22.0

Macbook Pro details:

16-inch, 2019 MAC 
2,3 GHz 8-Core Intel Core i9
AMD Radeon Pro 5500M 4 GB
Intel UHD Graphics 630 1536 MB

garypen commented 9 months ago

@NeoPhi and @marianoqueirel If you type:

file <path to router>

What do you see?
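
For anyone following along, a rough sketch of how that check could be scripted on macOS. `arch_matches` is a hypothetical helper, and it assumes the architecture name printed by `file` is spelled the same way as the output of `uname -m` (true for x86_64 and arm64 on macOS, but not on every platform):

```shell
# Hypothetical helper: given the output of `file <binary>` and a machine
# name such as the output of `uname -m`, report whether they agree.
arch_matches() {
  local file_output="$1" machine="$2"
  case "$file_output" in
    *"$machine"*) echo "match" ;;
    *)            echo "mismatch" ;;
  esac
}

# Example (not run here):
#   arch_matches "$(file ./router)" "$(uname -m)"
```

A "match" only shows that the binary targets the host architecture; it says nothing about how or where the binary was built.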

marianoqueirel commented 9 months ago

Thanks @garypen, running file <path to router> I see:

➜  file ~/code/graphql-router/router
/Users/mqueirel/code/graphql-router/router: Mach-O 64-bit executable x86_64

I've updated my previous message with the output I receive when I run the router with the --dev flag (./router --log=TRACE --supergraph=supergraph.graphql --dev) to point out that it also includes this warning:

2024-02-20T16:45:44.357361Z WARN telemetry.instrumentation.spans.mode is currently set to 'deprecated', either explicitly or via defaulting. Set telemetry.instrumentation.spans.mode explicitly in your router.yaml to 'spec_compliant' for log and span attributes that follow OpenTelemetry semantic conventions. This option will be defaulted to 'spec_compliant' in a future release and eventually removed altogether

I tried router v1.40.0, v1.40.1 and v1.39.1 without success; all three fail with the same error.

garypen commented 9 months ago

@marianoqueirel Thanks for the extra information, that has helped narrow down what may be the problem.

One more request: could you try 1.37.0 and let us know the result? I expect that should work fine.

If it does work fine, then the probable cause of the issues you are experiencing is the cross compilation from aarch64 to x86_64 that we now perform in our CI environment.

NeoPhi commented 9 months ago

I was able to successfully run the router using version 1.37.0.

garypen commented 9 months ago

Thanks for the extra information. We'll dig into this a bit further and update when we know more.

NeoPhi commented 9 months ago

Doing a local release build from the 1.40.1 source, I'm able to start the router. I did have to install cmake for the build to complete. This makes me think that parts of the build might not be built properly when cross-compiling.
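
A local build along those lines might look roughly like the sketch below. It rests on several assumptions not stated in this thread: that git, a Rust toolchain, and cmake are already installed, that the release tag is named `v1.40.1`, and that no other build prerequisites are missing. `build_router_from_source` and the `router-src` directory are hypothetical names.

```shell
# Sketch of a local release build. Assumptions: git, a Rust toolchain, and
# cmake are already installed, and the release tag is named like "v1.40.1".
build_router_from_source() {
  local tag="${1:-v1.40.1}"
  (
    # Subshell so the caller's working directory is left unchanged.
    git clone --depth 1 --branch "$tag" https://github.com/apollographql/router.git router-src &&
    cd router-src &&
    cargo build --release
  )
}

# Example (not run here):
#   build_router_from_source v1.40.1
```

With a standard cargo layout the resulting binary should end up under `router-src/target/release/`.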

garypen commented 9 months ago

We have the same thoughts. The solution is, unfortunately, not straightforward, and we'll need to consider a number of factors before we decide how to address this. Factors include:

garypen commented 9 months ago

We've figured out the cause. We use the router-bridge crate to perform query planning within the router. This crate embeds a TS interpreter, which requires deno-core.

Without going into too many details, the TS is loaded into V8 (which is the TS/JS runtime component of deno) as a snapshot and, unfortunately, that snapshot has architecture dependencies.

So the router cross-compiles fine, but as soon as we start to execute a query plan on an architecture different from the one on which the snapshot was created, the router crashes as seen in this issue.

We are considering our options and will proceed with a fix soon. The choices are:

marianoqueirel commented 9 months ago

Thank you! Appreciated.