apollographql / router

A configurable, high-performance routing runtime for Apollo Federation 🚀
https://www.apollographql.com/docs/router/
Other
807 stars 272 forks source link

Very slow bootstrap time for large supergraph (9MB supergraph -> 6:22 mins bootstrap time) #3531

Closed rezoled closed 5 months ago

rezoled commented 1 year ago

Describe the bug We have an almost 10MB supergraph schema in our application, running Router with minimal configuration takes 6 mins until it starts accepting requests.

To Reproduce Kind of difficult to reproduce without supplying my org's schema, and I can't do that.

Expected behavior The JS based federated gateway works much faster, I expected the rust based router to be faster than the JS implementation.

Output

Desktop (please complete the following information):

Additional context log:

{"timestamp":"2023-08-03T15:03:24.548836Z","level":"INFO","message":"Apollo Router v1.26.0 // (c) Apollo Graph, Inc. // Licensed as ELv2 (https://go.apollo.dev/elv2)","target":"apollo_router::executable"}
{"timestamp":"2023-08-03T15:03:24.548905Z","level":"INFO","message":"Anonymous usage data is gathered to inform Apollo product development.  See https://go.apollo.dev/o/privacy for details.","target":"apollo_router::executable"}
{"timestamp":"2023-08-03T15:03:24.564031Z","level":"INFO","message":"Running with *development* mode settings which facilitate development experience (e.g., introspection enabled)","target":"apollo_router::configuration::expansion"}
{"timestamp":"2023-08-03T15:03:24.593641Z","level":"INFO","message":"Running with *development* mode settings which facilitate development experience (e.g., introspection enabled)","target":"apollo_router::configuration::expansion"}
{"timestamp":"2023-08-03T15:09:42.721946Z","level":"INFO","message":"Health check endpoint exposed at http://127.0.0.1:8088/health"}
{"timestamp":"2023-08-03T15:09:42.722456Z","level":"INFO","message":"GraphQL endpoint exposed at http://127.0.0.1:3333/graphql 🚀"}
{"timestamp":"2023-08-03T15:09:42.722696Z","level":"INFO","message":"You're using some \"experimental\" features of the Apollo Router (those which have their configuration prefixed by \"experimental_\").\nWe may make breaking changes in future releases. To help us design the stable version we need your feedback.\nHere is a list of links where you can give your opinion:\n\n\t- experimental_response_trace_id: https://github.com/apollographql/router/discussions/2147\n\nFor more information about launch stages, please see the documentation here: https://www.apollographql.com/docs/resources/product-launch-stages/"}
BrynCooke commented 1 year ago

A 10Mb supergraph schema doesn't sound like a lot. Suspect something else is going on. Are you using managed federation?

rezoled commented 1 year ago

Not managed, running the router binary on my machine

BrynCooke commented 1 year ago

Does increasing the log level give any clues? Otherwise I could see us adding some debug statements to try and narrow this down.

Are you running a stock binary or something that is customized?

rezoled commented 1 year ago

Stock, I tried running with trace logs and didn’t see anything unusual. I’ll run it again and post the logs.

rezoled commented 1 year ago
{"timestamp":"2023-08-04T10:11:23.613674Z","level":"INFO","message":"Apollo Router v1.26.0 // (c) Apollo Graph, Inc. // Licensed as ELv2 (https://go.apollo.dev/elv2)","target":"apollo_router::executable"}
{"timestamp":"2023-08-04T10:11:23.615220Z","level":"INFO","message":"Anonymous usage data is gathered to inform Apollo product development.  See https://go.apollo.dev/o/privacy for details.","target":"apollo_router::executable"}
{"timestamp":"2023-08-04T10:11:23.645797Z","level":"INFO","message":"Running with *development* mode settings which facilitate development experience (e.g., introspection enabled)","target":"apollo_router::configuration::expansion"}
{"timestamp":"2023-08-04T10:11:23.659379Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.include_subgraph_errors","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.659448Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.rhai","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.659465Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"experimental.restricted","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.659469Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"experimental.broken","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.659644Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.headers","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.659657Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.coprocessor","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.659837Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.subscription","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.660013Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.forbid_mutations","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.660036Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.csrf","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.660043Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.authentication","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.660074Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.telemetry","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.660079Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.override_subgraph_url","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.660083Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"experimental.expose_query_plan","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.660102Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.traffic_shaping","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.660107Z","level":"DEBUG","message":"creating plugin factory","plugin_factory_name":"apollo.authorization","target":"apollo_router::plugin"}
{"timestamp":"2023-08-04T10:11:23.712806Z","level":"DEBUG","message":"starting","target":"apollo_router::state_machine"}
{"timestamp":"2023-08-04T10:11:23.714150Z","level":"DEBUG","message":"state machine event: UpdateLicense(Unlicensed), transitioned from: Startup to: Startup","target":"apollo_router::state_machine"}
{"timestamp":"2023-08-04T10:11:23.714819Z","level":"DEBUG","message":"state machine event: NoMoreLicense, transitioned from: Startup to: Startup","target":"apollo_router::state_machine"}
{"timestamp":"2023-08-04T10:11:23.714988Z","level":"INFO","message":"Running with *development* mode settings which facilitate development experience (e.g., introspection enabled)","target":"apollo_router::configuration::expansion"}
{"timestamp":"2023-08-04T10:11:23.726974Z","level":"DEBUG","message":"state machine event: UpdateConfiguration(<redacted>), transitioned from: Startup to: Startup","target":"apollo_router::state_machine"}
{"timestamp":"2023-08-04T10:11:24.116748Z","level":"TRACE","message":"recursion limit data","recursion_limit":"recursion limit: 4096, high: 0","target":"apollo_router::spec::schema"}
{"timestamp":"2023-08-04T10:11:24.540073Z","level":"DEBUG","message":"A valid Apollo license was not detected. However, no restricted features are in use.","target":"apollo_router::state_machine"}
{"timestamp":"2023-08-04T10:11:24.917883Z","level":"TRACE","message":"recursion limit data","recursion_limit":"recursion limit: 4096, high: 0","target":"apollo_router::spec::schema"}
{"timestamp":"2023-08-04T10:17:05.099384Z","level":"TRACE","message":"recursion limit data","recursion_limit":"recursion limit: 4096, high: 0","target":"apollo_router::spec::schema"}
{"timestamp":"2023-08-04T10:17:05.441648Z","level":"DEBUG","message":"configuring Apollo metrics","target":"apollo_router::plugins::telemetry::metrics::apollo"}
{"timestamp":"2023-08-04T10:17:05.443025Z","level":"DEBUG","message":"configuring Apollo tracing","target":"apollo_router::plugins::telemetry::tracing::apollo"}
{"timestamp":"2023-08-04T10:17:05.444802Z","level":"DEBUG","message":"plugins list: [\"apollo.include_subgraph_errors\", \"apollo.csrf\", \"apollo.headers\", \"apollo.telemetry\", \"apollo.traffic_shaping\", \"experimental.expose_query_plan\"]","target":"apollo_router::router_factory"}
{"timestamp":"2023-08-04T10:17:09.724676Z","level":"INFO","message":"Health check endpoint exposed at http://127.0.0.1:8088/health"}
{"timestamp":"2023-08-04T10:17:09.729015Z","level":"INFO","message":"GraphQL endpoint exposed at http://127.0.0.1:3333/graphql 🚀"}
{"timestamp":"2023-08-04T10:17:09.729477Z","level":"DEBUG","message":"extra endpoints the router listens to","tracing_endpoints":"http://127.0.0.1:8088"}
{"timestamp":"2023-08-04T10:17:09.730556Z","level":"INFO","message":"You're using some \"experimental\" features of the Apollo Router (those which have their configuration prefixed by \"experimental_\").\nWe may make breaking changes in future releases. To help us design the stable version we need your feedback.\nHere is a list of links where you can give your opinion:\n\n\t- experimental_response_trace_id: https://github.com/apollographql/router/discussions/2147\n\nFor more information about launch stages, please see the documentation here: https://www.apollographql.com/docs/resources/product-launch-stages/"}
{"timestamp":"2023-08-04T10:17:09.891406Z","level":"DEBUG","message":"state machine event: UpdateSchema(<redacted>), transitioned from: Startup to: Running"}
{"timestamp":"2023-08-04T10:17:10.134615Z","level":"TRACE","message":"recursion limit data","recursion_limit":"recursion limit: 4096, high: 0"}
{"timestamp":"2023-08-04T10:17:10.558514Z","level":"DEBUG","message":"sending anonymous usage data to Apollo"}
{"timestamp":"2023-08-04T10:17:10.629829Z","level":"DEBUG","message":"transmitting anonymous analytics: {\n  \"session_id\": \"9d99f55a-ab4f-46cf-8db6-759c933ef043\",\n  \"version\": \"1.26.0\",\n  \"platform\": {\n    \"os\": \"macos\",\n    \"continuous_integration\": null\n  },\n  \"usage\": {\n    \"args.log_level.<redacted>\": 1,\n    \"configuration.headers.all.request.propagate.len\": 1,\n    \"args.supergraph_path.<redacted>\": 1,\n    \"configuration.telemetry.tracing.experimental_response_trace_id.enabled.true\": 1,\n    \"configuration.plugins.len\": 0,\n    \"configuration.supergraph.listen.len\": 2,\n    \"configuration.supergraph.listen.<redacted>\": 3,\n    \"args.dev.true\": 1,\n    \"configuration.headers.all.request.propagate.matching.<redacted>\": 1,\n    \"configuration.homepage.enabled.false\": 1,\n    \"configuration.supergraph.introspection.true\": 1,\n    \"configuration.supergraph.path.<redacted>\": 1,\n    \"configuration.include_subgraph_errors.all.true\": 1,\n    \"args.config_path.<redacted>\": 1,\n    \"configuration.sandbox.enabled.true\": 1,\n    \"configuration.cors.allow_any_origin.true\": 1\n  }\n}"}
rezoled commented 1 year ago

6 mins again

Geal commented 1 year ago

@rezoled would it be ok to share that schema with us so we can look into it?

rezoled commented 1 year ago

This is our company's IP, we can only share it under NDA agreement... Any chance of getting a custom binary with added logs? You can see that between these 2 log messages:

{"timestamp":"2023-08-04T10:11:24.917883Z","level":"TRACE","message":"recursion limit data","recursion_limit":"recursion limit: 4096, high: 0","target":"apollo_router::spec::schema"}
{"timestamp":"2023-08-04T10:17:05.099384Z","level":"TRACE","message":"recursion limit data","recursion_limit":"recursion limit: 4096, high: 0","target":"apollo_router::spec::schema"}

There are 6 minutes of log silence, when obviously something is begin done in the background

Geal commented 1 year ago

This is our company's IP, we can only share it under NDA agreement... Right, we can avoid this for now Any chance of getting a custom binary with added logs? maybe not right now, first we'd have to narrow it down

So looking at the logs:

{"timestamp":"2023-08-04T10:11:23.726974Z","level":"DEBUG","message":"state machine event: UpdateConfiguration(<redacted>), transitioned from: Startup to: Startup","target":"apollo_router::state_machine"}
{"timestamp":"2023-08-04T10:11:24.116748Z","level":"TRACE","message":"recursion limit data","recursion_limit":"recursion limit: 4096, high: 0","target":"apollo_router::spec::schema"}
{"timestamp":"2023-08-04T10:11:24.540073Z","level":"DEBUG","message":"A valid Apollo license was not detected. However, no restricted features are in use.","target":"apollo_router::state_machine"}
{"timestamp":"2023-08-04T10:11:24.917883Z","level":"TRACE","message":"recursion limit data","recursion_limit":"recursion limit: 4096, high: 0","target":"apollo_router::spec::schema"}
{"timestamp":"2023-08-04T10:17:05.099384Z","level":"TRACE","message":"recursion limit data","recursion_limit":"recursion limit: 4096, high: 0","target":"apollo_router::spec::schema"}
{"timestamp":"2023-08-04T10:17:05.441648Z","level":"DEBUG","message":"configuring Apollo metrics","target":"apollo_router::plugins::telemetry::metrics::apollo"}

The first mention of "recursion limit data" comes from here: https://github.com/apollographql/router/blob/37daffdac4ebcba46460d779aa04b097db58bdf9/apollo-router/src/state_machine.rs#L316-L318 There's no 6mn wait right after that. And this is all happening before loading the plugins, so most likely the wait happens in here:

https://github.com/apollographql/router/blob/37daffdac4ebcba46460d779aa04b097db58bdf9/apollo-router/src/query_planner/bridge_query_planner.rs#L56-L104

So either when creating the query planner or when generating the API schema. Could you give us some info about the schema's shape? How many subgraphs, types, fields? Do you use Contracts?

Geal commented 1 year ago

oh, also, which router and gateway versions are you testing this with?

rezoled commented 1 year ago

Sure. Subgraphs - 14 Not using contracts. Interfaces - 128 Types - 7,647 Inputs - 7,086 Enums - 702

I'm not entirely sure how to count fields, but there are more than 10K.

I downloaded Router from this command: curl -sSL https://router.apollo.dev/download/nix/latest | sh, I'm assuming 1.26.

rezoled commented 1 year ago

any idea?

Geal commented 1 year ago

that's a lot of input types, that might be the cause of inefficiencies there

rezoled commented 1 year ago

We have 100s of entities which can be queried, paged, and filtered

Geal commented 1 year ago

could you try to add this to your configuration:

experimental_graphql_validation_mode: new
Geal commented 9 months ago

@rezoled friendly ping on this! Did the new validation mode help for your issue?

Geal commented 5 months ago

@rezoled I will close this now, feel free to reopen if you see it happen again