Status: Open. 20k-ultra opened this issue 3 years ago.
While investigating this issue, I found at least three devices, from separate customers, that would not apply a new target state. The Supervisor state showed update_pending as true, but the logs only showed "applying target state" and then nothing further. This is in addition to the 503 error described above, followed by a failed healthcheck and a Supervisor restart, which all three devices have shown.
[20k-ultra] This issue has an attached support thread: https://jel.ly.fish/a83c538a-e82f-434b-bb50-e3f98065731c
[20k-ultra] This issue has an attached support thread: https://jel.ly.fish/4ff915f6-f9ce-405b-a071-701853ef6349
[20k-ultra] This issue has an attached support thread: https://jel.ly.fish/44cbdeef-93c4-4941-92fa-e47c0eda6c8f
We should be able to make this call throw an exception to check whether the healthcheck fails and the new release is not installed.
I tested this on my device by making the Supervisor think it was receiving only 503s from the API, and that did not cause the Supervisor to fail its healthchecks and restart.
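One way to reproduce the "only 503s" condition from this test is to wrap the target-state call so that it rejects with a 503 while a flag is set, and passes through to the real call otherwise. This is a hypothetical sketch: `HttpError`, `withInjected503`, and the shape of the fetch function are illustrative assumptions, not the Supervisor's actual code or API.

```typescript
// Hypothetical error type carrying an HTTP status code.
class HttpError extends Error {
  readonly statusCode: number;

  constructor(statusCode: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.statusCode = statusCode;
  }
}

// Hypothetical fault-injection wrapper: while `enabled()` returns true,
// every call rejects with a 503 as if the cloud API were unavailable;
// otherwise the call is delegated to the real implementation.
function withInjected503<T>(
  real: () => Promise<T>,
  enabled: () => boolean,
): () => Promise<T> {
  return async () => {
    if (enabled()) {
      throw new HttpError(503, "Service Unavailable (injected)");
    }
    return real();
  };
}
```

Keeping the toggle outside the wrapper means the outage can be switched on and off at runtime, so the same device can be observed both during and after the simulated API failure.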
This theory was flawed because of my misunderstanding of how the healthcheck variable cycleTimeWithinInterval is updated.
The support thread https://jel.ly.fish/a83c538a-e82f-434b-bb50-e3f98065731c was resolved by removing the preloaded application config file and allowing the device to pull all of the images from the cloud.
This is still a theory, but I think cycleTimeWithinInterval is not within the required interval because the cloud API returned a 503, which prevents the Supervisor from updating the value of cycleTimeWithinInterval.
This means that if the cloud API goes down, a lot of devices will begin restarting their Supervisor.
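The theory can be illustrated with a simplified, staleness-based healthcheck: a timestamp is refreshed only when a cycle completes successfully, and the check passes only while that timestamp is recent. This is a hypothetical sketch, not the Supervisor's real healthcheck; `CycleHealth`, `runCycle`, and `maxIntervalMs` are illustrative names, and the real cycleTimeWithinInterval may be computed differently (as noted above, the first version of this theory misjudged how it is updated). Time is passed in explicitly so the behaviour is deterministic.

```typescript
// Hypothetical staleness-based healthcheck (not the Supervisor's code).
class CycleHealth {
  private lastCycleMs: number;
  private readonly maxIntervalMs: number;

  constructor(maxIntervalMs: number, nowMs: number) {
    this.maxIntervalMs = maxIntervalMs;
    this.lastCycleMs = nowMs;
  }

  // Runs one cycle. The timestamp is refreshed only after `fetch`
  // succeeds, so a throwing fetch (e.g. a 503) leaves it stale.
  runCycle(fetch: () => void, nowMs: number): boolean {
    try {
      fetch();
    } catch {
      return false;
    }
    this.lastCycleMs = nowMs;
    return true;
  }

  // Analogue of cycleTimeWithinInterval: true while the last
  // successful cycle is more recent than maxIntervalMs.
  cycleTimeWithinInterval(nowMs: number): boolean {
    return nowMs - this.lastCycleMs < this.maxIntervalMs;
  }

  healthcheck(nowMs: number): boolean {
    return this.cycleTimeWithinInterval(nowMs);
  }
}

// Usage: a 120 s limit, one successful cycle, then repeated 503s.
const ok = (): void => {};
const unavailable = (): void => {
  throw new Error("503 Service Unavailable");
};

const health = new CycleHealth(120000, 0);
health.runCycle(ok, 60000);
console.log(health.healthcheck(90000)); // true: last success 30 s ago
health.runCycle(unavailable, 120000);
health.runCycle(unavailable, 180000);
console.log(health.healthcheck(200000)); // false: last success 140 s ago
```

Under this model, a sustained API outage makes every device's check go stale at roughly the same time, which matches the concern that many Supervisors would restart together.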