GoogleCloudPlatform / prometheus-engine

Google Cloud Managed Service for Prometheus libraries and manifests.
https://g.co/cloud/managedprometheus
Apache License 2.0
193 stars 90 forks source link

config-reloader Restarts Multiple Times on Boot #472

Open silvamerica opened 1 year ago

silvamerica commented 1 year ago

I've been investigating why the config-reloader pod occasionally restarts multiple times while booting, and I've narrowed it down to this line. I'm seeing the following in the logs:

Get \"http://localhost:19090/-/ready\": dial tcp 127.0.0.1:19090: connect: connection refused

Per the comment on line 73, it seems like the intent would be to continue polling in this situation, but instead the process exits and has to be restarted.

pintohutch commented 1 year ago

Hi @silvamerica,

Thanks for pointing this out!

The thought behind not tolerating the error was that it would be a non-recoverable issue with the http client or something. However, as it turns out, that's not the case - as you have highlighted.

Open PR at #474.

mathe-matician commented 9 months ago

@pintohutch to clarify, will we lose any metrics being collected on that node if we encounter this crash? Or is this just isolated to the config-reloader sidecar?

pintohutch commented 8 months ago

Hey @mathe-matician - you will not lose metrics. This is just the config-reloader restarting because prometheus isn't ready to load config yet (presumably due to startup delay).