Open marc-barry opened 1 year ago
The reloading
message seems to come from https://github.com/apollographql/router/blob/2ba02921504c7a2ecf20348ab14cf6e771250a82/apollo-router/src/state_machine.rs#L159 and the reload complete
comes from https://github.com/apollographql/router/blob/2ba02921504c7a2ecf20348ab14cf6e771250a82/apollo-router/src/state_machine.rs#L175.
I don't if it's part of the problem, but your step 3 should be:
APOLLO_ROUTER_HOT_RELOAD=false (not true)
I don't if it's part of the problem, but your step 3 should be:
APOLLO_ROUTER_HOT_RELOAD=false (not true)
Thank you. I made a mistake when typing out the bug report. I confirmed in the pod and the deployment that we have APOLLO_ROUTER_HOT_RELOAD=false
.
Is it relatively easy for me to produce a build which just comments out the hot reload code (or bypasses it altogether)? I'm not a Rust developer and so I'd like some input on what I could do to produce a build that can't possibly have hot reload enabled.
Maybe a solution could be to download your schema with rover and then run the router with the supergraph schema as a file with -s to pass the supergraph schema file or via the env variable then it won't reload when your schema change on studio
Maybe a solution could be to download your schema with rover and then run the router with the supergraph schema as a file with -s to pass the supergraph schema file or via the env variable then it won't reload when your schema change on studio
That's not a bad idea. That would also allow us to lessen/eliminate our need for managed federation. We already use rover
to produce the supergraph for development and so adapting our production deployment to use the same supergraph might indeed be a possibility.
Just to verify that things are working properly.
If you have access to a running pod, you can do something like this:
kubectl -n <namespace of your router> exec -it <name of your router pod> -- bash
apt update
apt install procps
cat /proc/1/cmdline
You will see something which looks either like:
/dist/router--hot-reload--config/app/configuration.yaml
(if hot-reload is being explicitly enabled)
OR
/dist/router--config/app/configuration.yaml
(hot-reload is not being explicitly enabled)
(you can clean up your pod packaging later if you like with apt remove procps
.)
If you could update the issue with your output that would help narrow down where any potential problem lies.
Thanks for opening this really well explained issue! Based on the description, I'd say this is currently working as designed. I do understand what you're trying to accomplish though — it makes sense — and in that regard, this suggestion from @bnjjj above is (currently) spot on and the currently-recommended approach for the behavior you're looking for:
a solution could be to download your schema with rover and then run the router with the supergraph schema as a file with -s to pass the supergraph schema file or via the env variable then it won't reload when your schema change on studio
Today, the Managed Federation feature (roughly described here) is designed — at its core — to deliver schemas to Routers while avoiding the need to manage orchestrating rolling them over — allowing them to stay online, and in theory benefiting from still-warm caches of reusable aspects of that graph, but mostly to avoid the extra configuration that some users try to avoid. Specifically, it ships those updated supergraphs whena GraphOS Launch has completed without errors. Of course, that is somewhat incompatible for your desired mode of operation here — which again I understand the value of — though it does work for a majority of our customer's workflows.
Put another way, "Hot reload" as a configuration option isn't meant to change the behavior of managed federation, but instead apply to the supergraph file and the configuration file for the Router. (In a few ways, it's great for local development.)
I know some folks here at Apollo have talked about increasing the atomicity/reproducible build aspect of this though — which might build on top of the existing launches feature — and I could expect more on that in the future. If you're interested, I could get you in touch with the right people!
If you could update the issue with your output that would help narrow down where any potential problem lies.
@garypen
root@apollo-router-7fd44dcc9b-z7rfj:/dist# cat /proc/1/cmdline
/dist/router--config/app/configuration.yaml
This is what all of our pods have for cmdline
.
Thanks for opening this really well explained issue! Based on the description, I'd say this is currently working as designed. I do understand what you're trying to accomplish though — it makes sense — and in that regard, this suggestion from @bnjjj above is (currently) spot on and the currently-recommended approach for the behavior you're looking for:
@abernix Thank you for the clear explanation. I guess I don't really understand what happens when hot reload is enabled. I guess we are using managed federation without hot reload which means that we won't reap some of the benefits you mentioned in your explanation.
If you could update the issue with your output that would help narrow down where any potential problem lies.
@garypen
root@apollo-router-7fd44dcc9b-z7rfj:/dist# cat /proc/1/cmdline /dist/router--config/app/configuration.yaml
This is what all of our pods have for
cmdline
.
@marc-barry
Thanks for the information. Ok, --hot-reload
isn't enabled, but you are getting the schema updates (which, perhaps confusingly, aren't controlled by the --hot-reload
flag). The --hot-reload
flag is watching for configuration changes.
I'm going to rename this issue to we are unable to disable schema refresh...
, since it more accurately reflects the actual issue.
I think we should consider adding the option to have the router download the schema once and only once during startup. Having the ability to prevent reloads is a dealbreaker for some, and it would be better if they were still able to use managed federation rather than jumping through hoops to achieve this.
Describe the bug
We have deployed Apollo Router on our Kubernetes cluster, serving a significant amount of traffic. We are using Apollo Router with managed federation and thus according to https://www.apollographql.com/docs/router/configuration/overview#environment-variables we have specified the following:
As part of our backend service deployment, we have a need to control when Apollo Router loads the new schema from managed federation. It is our experience that there is no way to disable the hot reload functionality in router even though we don't specify the https://www.apollographql.com/docs/router/configuration/overview#--hr----hot-reload argument and we even set
APOLLO_ROUTER_HOT_RELOAD=false
.To Reproduce Steps to reproduce the behavior:
values.yaml
you provide to the Helm chart install you specifyargs: []
so that the default args, which includes--hot-reload
isn't provided to the deployment: https://github.com/apollographql/router/blob/dev/helm/chart/router/values.yaml#LL22C5-L22C17.APOLLO_ROUTER_HOT_RELOAD=false
. Note, this is optional as we removed the command line argument from above and so this shouldn't be necessary but we do it anyhow as want to ensure it is disabled.reloading
andreload complete
which are reported as anINFO
log.Expected behavior
The hot reload functionality should be disabled and it shouldn't hot reload when it isn't enabled and/or it is explicitly disabled. There appears to be no way to disable it and one is opted into regardless not specifying
--hot-reload
.Output
You get info log messages which say
reloading
andreload complete
. We see a significant increase in CPU usage and request latency during hot reloading at our scale. We can see this happen when the hot reload occurs. We accept that hot reload behaves this way but we don't want it to happen at all.Desktop (please complete the following information):
Additional context
We have also set
APOLLO_TELEMETRY_DISABLED=true
to see if that has any impact. It didn't appear to do so.Below is an example of our configuration file (with some small things redacted):