emanuelflp closed this issue 5 days ago.
The 1.26.0 release of Bottlerocket included a change to restrict system services from mapping memory as both writable and executable (https://github.com/bottlerocket-os/bottlerocket-core-kit/pull/158).
Although intended to apply only to the host software, which does not need this capability, the restriction also erroneously applied to applications running inside containers. Software relying on just-in-time (JIT) compilation, such as Java or NodeJS, often needs to mark memory as both writable and executable, and this change caused pods running Java and NodeJS applications to fail.
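To check whether a given node or container is affected, one option is to try the kind of mapping a JIT needs. The sketch below asks for a single anonymous page that is readable, writable and executable at once. It assumes that the restriction surfaces as a permission error from `mmap` (EPERM or EACCES, both of which Python reports as `PermissionError`); the function name `can_map_write_exec` is hypothetical and not part of Bottlerocket or any runtime.

```python
import mmap

# One page is enough to probe the policy.
PAGE = mmap.PAGESIZE


def can_map_write_exec() -> bool:
    """Return True if this process may create a writable + executable mapping.

    Hypothetical probe. Assumption: a policy that forbids W+X memory makes the
    mmap call fail with a permission error instead of returning a mapping.
    """
    prot = mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC
    try:
        # fileno -1 requests an anonymous mapping (no backing file).
        region = mmap.mmap(-1, PAGE, flags=mmap.MAP_PRIVATE, prot=prot)
    except PermissionError:
        # EPERM/EACCES: the W+X mapping was refused.
        return False
    region.close()
    return True


if __name__ == "__main__":
    print("W+X mappings allowed:", can_map_write_exec())
```

Running this inside a pod on a known-good node and on a suspect node gives a quick comparison; JIT-based runtimes such as the JVM or NodeJS would be expected to fail wherever it returns False, under the assumption above.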
To mitigate the impact, the 1.26.0 release has been rolled back and 1.25.0 is now marked as latest.
Please note that the Bottlerocket 1.26.0-based AMIs on AWS (e.g., bottlerocket-aws-k8s-1.30.x86_64-v1.26.0-85f0d68c, which bit us this morning) are still active and available, so this issue will continue to affect users.
It might be worth thinking through how to propagate retracted releases downstream to whoever publishes these AMIs and has the ability to retract them as well.
Or perhaps it's worth considering that, in practice, the best way to roll back a release may be to cut a new release (with a higher version number) that supersedes the old, bad version in all downstream systems.
If you had done that, we wouldn't have had to lock to an old version (and miss out on security updates until we unlock). Or perhaps we wouldn't have hit the issue at all, since our badly timed upgrade happened 10 hours after you had already retracted the release here in the source repo (but the AMI remained, and still remains, active).
@jemc, thank you for the suggestions. Bottlerocket does provide a mechanism for choosing AMI IDs that you might consider: AMI IDs are published as public SSM parameters (see, for instance, the QUICKSTART-EKS.md file in this repo for details). When a release is published, the `latest` SSM parameter is updated to the new AMI ID, and if a release is rolled back, that parameter is changed back to the previous AMI ID. This change can propagate very quickly; in your particular case, since you were updating ten hours after the issue was found, you would have seen the `latest` SSM parameter pointing at the rollback (previous version) AMI ID. I hope this helps going forward.
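As a rough illustration of resolving the AMI ID through SSM instead of hard-coding an AMI name, here is a minimal sketch. It assumes the parameter path layout `/aws/service/bottlerocket/aws-k8s-<k8s_version>/<arch>/<release>/image_id`, that the architecture segment is `x86_64` or `arm64` (not `aarch64` as in the AMI name), and that `<release>` may be either `latest` or a specific Bottlerocket version for pinning (e.g. back to 1.25.0); check QUICKSTART-EKS.md for the authoritative format. The helper names are hypothetical; the lookup uses the standard `aws ssm get-parameter` CLI command and needs AWS credentials.

```python
import subprocess


def bottlerocket_ami_param(k8s_version: str, arch: str, release: str = "latest") -> str:
    """Build the (assumed) public SSM parameter name for a Bottlerocket aws-k8s AMI ID.

    Hypothetical helper. Assumed layout:
    /aws/service/bottlerocket/aws-k8s-<k8s_version>/<arch>/<release>/image_id
    """
    # Assumption: SSM paths use "arm64", while AMI names use "aarch64".
    if arch not in ("x86_64", "arm64"):
        raise ValueError(f"unexpected architecture: {arch!r}")
    return f"/aws/service/bottlerocket/aws-k8s-{k8s_version}/{arch}/{release}/image_id"


def lookup_ami_id(param_name: str, region: str) -> str:
    """Resolve an SSM parameter to its value using the AWS CLI."""
    result = subprocess.run(
        [
            "aws", "ssm", "get-parameter",
            "--region", region,
            "--name", param_name,
            "--query", "Parameter.Value",
            "--output", "text",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Follows rollbacks automatically, because "latest" is repointed on retraction.
    print(bottlerocket_ami_param("1.30", "arm64"))
```

Tracking `latest` means a retracted release stops being selected as soon as the parameter is repointed, whereas pinning a specific release (under the assumed path format) trades that for a fixed version you must unpin later to pick up security updates.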
Thanks for the info - I'll take a look!
Closing this issue as the fix for this (referenced above) was released in Bottlerocket v1.26.1: https://github.com/bottlerocket-os/bottlerocket/releases/tag/v1.26.1
Image I'm using: bottlerocket-aws-k8s-1.30-aarch64-v1.26.0-85f0d68c
What I expected to happen: All nodes using the latest Bottlerocket AMI (1.26) should be able to run NodeJS-based pods without any issues.
What actually happened: When Karpenter rolled out new nodes using the latest Bottlerocket AMI, all the NodeJS-based pods scheduled on the new nodes crashed with the errors below:
Workaround: Rolling back the nodes to the previous version v1.25.0 fixed the issue.