Closed. nightmareze1 closed this issue 8 months ago.
@odellcraig you do that via the release_version
@bryantbiggs Thank you.
For anyone using Terraform and eks_managed_node_groups, you can pin the AMI with:

```hcl
eks_managed_node_groups = {
  initial = {
    ami_release_version = "1.29.0-20240227" # the latest release as of this comment
    name                = "..."
    instance_types      = [...]
    min_size            = ...
    max_size            = ...
    desired_size        = ...
    # ...
  }
}
```
Can you please fix this issue after half a year? It still happens with an EKS managed node group and the AMI
amazon/amazon-eks-node-1.29-v20240522
Error from kubelet on the node:
unexpected status from HEAD request to https://602401143452.dkr.ecr.eu-central-1.amazonaws.com/v2/eks/pause/manifests/3.5: 403 Forbidden
Our migration to EKS is halted here.
Same error with amazon-eks-node-1.29-v20240315
error="failed to pull and unpack image \"602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.5\": failed to copy: httpReadSeeker: failed open: unexpected status code https://602401143452.dkr.ecr.eu-central-1.amazonaws.com/v2/eks/pause/blobs/sha256:6996f8da07bd405c6f82a549ef041deda57d1d658ec20a78584f9f436c9a3bb7: 403 Forbidden"
Are the permissions on your node role correct per https://docs.aws.amazon.com/eks/latest/userguide/create-node-role.html? Specifically, does it have the AmazonEC2ContainerRegistryReadOnly policy?
The policy is attached.
The AmazonEC2ContainerRegistryReadOnly policy is attached here as well.
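For anyone else verifying this, here is a minimal sketch of checking the attached policies from the CLI. The `response.json` below stands in for the output of `aws iam list-attached-role-policies --role-name <your-node-role>` (the role name and the sample response are assumptions, not taken from this thread):

```shell
# Stand-in for: aws iam list-attached-role-policies --role-name <your-node-role>
cat > response.json <<'EOF'
{
  "AttachedPolicies": [
    {"PolicyName": "AmazonEKSWorkerNodePolicy",
     "PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"},
    {"PolicyName": "AmazonEC2ContainerRegistryReadOnly",
     "PolicyArn": "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"}
  ]
}
EOF

# The node role must include AmazonEC2ContainerRegistryReadOnly,
# otherwise kubelet/containerd cannot pull eks/pause from ECR.
if grep -q 'AmazonEC2ContainerRegistryReadOnly' response.json; then
  echo "policy attached"
else
  echo "policy MISSING - nodes cannot pull from ECR"
fi
```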
Can't you just use a REAL public repo instead of this half-baked, half-private/half-public repo in the configs and init scripts? Hacking the bootstrap script to use public.ecr.aws/eks-distro/kubernetes/pause:v1.29.0-eks-1-29-latest
works, but only until reboot, because the init scripts will always write this non-working URL back into /etc/containerd/config.toml.
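A rough sketch of the kind of patch being described, assuming the `sandbox_image` key in containerd's CRI config is the line being overridden (the file is written locally here so the sketch is self-contained; as noted above, on a real node the bootstrap scripts rewrite this file on reboot):

```shell
# Sample of the relevant line from /etc/containerd/config.toml.
cat > config.toml <<'EOF'
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.5"
EOF

# Point containerd at the public pause image instead (hypothetical workaround).
sed -i 's|sandbox_image = ".*"|sandbox_image = "public.ecr.aws/eks-distro/kubernetes/pause:v1.29.0-eks-1-29-latest"|' config.toml

grep sandbox_image config.toml
# On a real node you would then restart containerd:
#   systemctl restart containerd
```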
@korncola can you open a ticket with AWS support so we can look into the specifics of your environment?
Thanks @cartermckinnon, will do. But still: why not a true public repo?!
I created a cluster via Terraform and via the console and triple-checked the policies. I also disabled all SCPs. Still the same error. Node groups with the AL2023 or AL2 image fail as well.
ECR Public is only hosted in a few regions, so we still use regional ECR repositories for lower latency and better availability. ECR Public also has a monthly bandwidth limit for anonymous pulls that cannot be increased, so if you're using it in production, make sure you're not sending anonymous requests.
[...] and better availability. [...]
Yeah, I see the availability in this and the other tickets...
ECR Public also has a monthly bandwidth limit for anonymous pulls that cannot be increased;
As I said above, use a real public service. AWS owns that service, so make it work. These are bad excuses for this design decision. Sorry for my rant, but I don't get these decisions: when I look at the scripts, with all the hardcoded account IDs used to compose an ECR repo URL, and scripts within scripts within scripts... come on, you can do better at AWS.
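To illustrate the composition being complained about, here is a minimal sketch of how such a regional ECR URL is assembled from a hardcoded account ID plus the node's region. The account ID and region are taken from the error message earlier in this thread; the exact account varies by region, so treat the mapping as an assumption:

```shell
# Compose the regional ECR pause-image URL from a hardcoded account ID
# plus the node's region, as the init scripts do.
account_id="602401143452"   # hardcoded; differs in some regions (assumption)
region="eu-central-1"
pause_image="${account_id}.dkr.ecr.${region}.amazonaws.com/eks/pause:3.5"
echo "${pause_image}"
# -> 602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.5
```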
But as always, in the end it will turn out to be some typo on my side causing my ECR pull error, and you will all laugh at me :-)
@korncola let's keep it professional. The best course of action is to work with the team through the support ticket. There are many factors that go into decisions that users are usually not aware of. The team is very responsive in terms of investigating and getting a fix rolled out (as needed).
Yep, you are right 👍 The team here is very helpful and responsive, thank you for the support! I will report back when the issue is resolved so others can use that info.
If I understand correctly, the same bug (or a similar one, in the sense that it will definitely recur over time) was perhaps reintroduced? Should users be advised not to upgrade nodes? Or is this a separate issue (e.g. anonymous pulls)?
No sign of the issue on the older version (1.29.0-20240202).
No, at this point we don’t have evidence of a new bug or a regression.
I’m going to lock this thread to avoid confusion, please open a new issue for follow-ups.
AMI: amazon-eks-node-1.29-v20240117
1 day after upgrading EKS to 1.29