Closed lblackstone closed 6 years ago
@lblackstone: Container Linux updates are rolled out over the fleet at a limited rate. Running update_engine_client -check_for_update manually ignores the rate limiting and performs the update immediately.
Thanks @arithx!
Are there any public docs that cover the rate limiting? I found https://github.com/coreos/tectonic-forum/issues/150 after your comment, but couldn't find any details on how this is configured (client vs server side, etc).
The rate limiting is set on the CoreUpdate side. Here's a doc for the server side: https://coreos.com/products/coreupdate/docs/latest/getting-started.html#group
It's also briefly mentioned in https://coreos.com/os/docs/latest/update-strategies.html#manually-triggering-an-update from the perspective of the client side.
OK, as far as I can tell, we're not running CoreUpdate. These machines aren't part of any managed subscription, and updateservicectl does not appear to be installed.
Is any sort of global rate limiting expected?
Also, I found another cluster exhibiting the same behavior, and noticed that neither one of them updated to 1688.4.0 either. The clusters have been running for several days, and that update has been available since March 27, so I would have expected them to have updated by now.
@lblackstone: CoreUpdate is what we also run for the public update server.
Global rate limiting is expected.
1688.4.0 was only available via the public update server for a short period of time due to the "free magic" GRUB issue (more info here: https://groups.google.com/forum/#!topic/coreos-user/5ihE2cKuYck). Updates for 1688.4.0 were paused, which blocked even manual updates, and the release was eventually removed from the update server.
We're currently rolling out 1688.5.3 at a fairly low rate, so it's not surprising that your machines haven't all updated yet.
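To illustrate why only some nodes pick up an update during a slow rollout, here is a minimal toy model in Python. It is not CoreUpdate's actual algorithm: the function name, the "slots per round" idea, and the node names are all assumptions made for illustration. It only captures the two behaviors described in this thread: periodic checks are granted at a limited rate, and a manually forced check ignores the limit.

```python
def simulate_rollout(nodes, rounds, slots_per_round, forced=()):
    """Toy model of a rate-limited rollout (hypothetical, not CoreUpdate's logic).

    Each round, every node that has not updated yet checks in once.
    The server grants at most `slots_per_round` updates per round.
    Nodes listed in `forced` model a manual update check, which
    ignores the rate limit and is granted immediately.
    Returns (updated, pending): a dict of node -> round it updated in,
    and the list of nodes still waiting after the last round.
    """
    updated = {}
    pending = list(nodes)
    for round_no in range(1, rounds + 1):
        slots = slots_per_round
        still_pending = []
        for node in pending:
            if node in forced:
                # Manual check: bypasses the rate limit in this model.
                updated[node] = round_no
            elif slots > 0:
                # Periodic check that happens to get one of the free slots.
                updated[node] = round_no
                slots -= 1
            else:
                # Periodic check that is told "no update" this round.
                still_pending.append(node)
        pending = still_pending
    return updated, pending


if __name__ == "__main__":
    cluster = ["node-1", "node-2", "node-3", "node-4"]
    updated, pending = simulate_rollout(
        cluster, rounds=2, slots_per_round=1, forced={"node-4"}
    )
    print("updated:", updated)
    print("still waiting:", pending)
```

In this sketch, a low `slots_per_round` leaves most nodes on the old version after several rounds, which matches the behavior reported in the issue, while the forced node updates right away.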
Great, thanks for clarifying, and the quick responses!
For future reference, is there any way to see current rate limits for the public update server?
Not to my knowledge, cc @sdemos
That is correct; rate limits are not exposed publicly.
I think it would be useful to expose that info somehow, but that's probably out of scope for this issue.
Closing since the observed behavior was confirmed to be expected.
Issue Report
Bug
Container Linux Version
1632.3.0 / 1688.5.3
Environment
What hardware/cloud provider/hypervisor is being used to run Container Linux?
HP ProLiant SE4255e / OpenStack private cloud / KVM
Expected Behavior
update-engine.service should find the available update and automatically upgrade.
Actual Behavior
The available update was not detected until running update_engine_client -check_for_update manually. See below.
Reproduction Steps
???
Other Information
This node was part of a Kubernetes cluster that is running the Container Linux Update Operator v0.5.0.
We noticed that one node in the cluster had upgraded successfully, but the other nodes were still running the previous version. I didn't see anything suspicious in the operator/update-agent logs, so I checked the update-engine.service on one of the nodes that had not upgraded.
After manually triggering the update check on that node, it detected the available update, and the upgrade process completed normally. The remaining nodes in the cluster are still not detecting the available update.
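For anyone comparing nodes in a similar situation, the following Python sketch reads the update engine's status on a node. It assumes that update_engine_client -status prints KEY=VALUE lines such as CURRENT_OP and LAST_CHECKED_TIME; the helper names and the sample values are hypothetical, so check the output on your own hosts before relying on it.

```python
import subprocess


def parse_status(output):
    """Collect KEY=VALUE lines into a dict, skipping anything else.

    Assumption: status keys are upper-case words joined by underscores
    (for example CURRENT_OP). Log lines and blank lines are ignored.
    """
    status = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.isupper() and key.replace("_", "").isalpha():
            status[key] = value
    return status


def read_status():
    """Run the client on this host and parse its output.

    Assumes update_engine_client is on PATH, as on a Container Linux node.
    """
    result = subprocess.run(
        ["update_engine_client", "-status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_status(result.stdout)


# Illustrative sample only; the values are made up.
SAMPLE = """LAST_CHECKED_TIME=1522900000
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_IDLE
NEW_VERSION=0.0.0
NEW_SIZE=0
"""

if __name__ == "__main__":
    print(parse_status(SAMPLE))
```

A node whose CURRENT_OP stays idle while LAST_CHECKED_TIME keeps advancing is checking in but not being offered the update, which is consistent with server-side rate limiting rather than a broken update-engine.service.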