digitalocean / clusterlint

A best practices checker for Kubernetes clusters. 🤠
Apache License 2.0
542 stars 45 forks source link

Error on images from docker.pkg.github.com #114

Closed WyriHaximus closed 3 years ago

WyriHaximus commented 3 years ago

Resolves #113

Since Kubernetes 1.20 containerd is used instead of Docker as the container runtime. Containerd is due to protocol version differences/support/mismatch unabled to pull images hosted at docker.pkg.github.com. The new check in this commit will error when it finds an image from that registry, and suggests to use ghcr.io, which is also hosted and operated by GitHub, and the successor of docker.pkg.github.com.

Refs:

WyriHaximus commented 3 years ago

Meh moving it to doks didn't go as planned, will have another look later today: Screenshot from 2021-02-08 13-38-46

WyriHaximus commented 3 years ago

Managed to work around it, but would rather somehow used the existing file instead of duplicating it.

adamwg commented 3 years ago

:wave: Thanks for taking this on @WyriHaximus!

Two high-level suggestions:

  1. I'd suggest that we put this check in a new group, containerd, which would be applicable to any cluster using containerd as its runtime, and also include it in the doks group. The directory layout is a little arbitrary (and doesn't actually impact which group(s) checks end up in in the tool) so feel free to move/refactor as needed.
  2. We already fetch the Nodes from the cluster, so the new check could check whether any nodes in the cluster are using containerd (via the NodeSystemInfo.ContainerRuntime field). This would be better than using the cluster/node version, since (as you and Timo discussed) non-DOKS 1.20 clusters coudl still be using docker.

I'll take a detailed look at the change as well and leave any suggestions there, but wanted to write down these high-level things first.

WyriHaximus commented 3 years ago
  1. I'd suggest that we put this check in a new group, containerd, which would be applicable to any cluster using containerd as its runtime, and also include it in the doks group. The directory layout is a little arbitrary (and doesn't actually impact which group(s) checks end up in in the tool) so feel free to move/refactor as needed.

Moved it to the containerd group.

  1. We already fetch the Nodes from the cluster, so the new check could check whether any nodes in the cluster are using containerd (via the NodeSystemInfo.ContainerRuntime field). This would be better than using the cluster/node version, since (as you and Timo discussed) non-DOKS 1.20 clusters coudl still be using docker.

Cool, but that is for current nodes rights? And not the nodes that will be spun up correct? So if you're migrating from docker to containerd this specific check wouldn't know you're about to do that as the currently ContainerRuntime it will get is docker and not containerd.

I'll take a detailed look at the change as well and leave any suggestions there, but wanted to write down these high-level things first.

Looking forward to it :+1: .

adamwg commented 3 years ago
  1. We already fetch the Nodes from the cluster, so the new check could check whether any nodes in the cluster are using containerd (via the NodeSystemInfo.ContainerRuntime field). This would be better than using the cluster/node version, since (as you and Timo discussed) non-DOKS 1.20 clusters coudl still be using docker.

Cool, but that is for current nodes rights? And not the nodes that will be spun up correct? So if you're migrating from docker to containerd this specific check wouldn't know you're about to do that as the currently ContainerRuntime it will get is docker and not containerd.

Ah, yes, good point :man_facepalming:

WyriHaximus commented 3 years ago

Overall this looks good to me. Please also add the new check to checks.md, preferably with a link to the relevant containerd issue and GitHub migration doc.

Done :+1:

WyriHaximus commented 3 years ago

One question tho, why isn't circleci running for this PR?

adamwg commented 3 years ago

That's what I was wondering :smile: @varshavaradarajan Any idea?

varshavaradarajan commented 3 years ago

The webhook payload isn't being delivered. I tried redelivering it as well. It's failing with 400. @WyriHaximus - Can you rebase and push again if you haven't?

WyriHaximus commented 3 years ago

@varshavaradarajan done :+1:

adamwg commented 3 years ago

Not sure what's going on with CircleCI here, but everything it runs is passing locally for me so I think we're fine to merge.

WyriHaximus commented 3 years ago

Might be a permissions issue as I don't have write access to this repo and the merge ran fine.

varshavaradarajan commented 3 years ago

We've had external contributors before and circleCI has run fine. The webhook payload isn't getting delivered at all for this. :( We may need to remove and add the circle ci webhook if the problem persists. But looks like all the tests passed on master after the merge.

MattIPv4 commented 3 years ago

Hey, @WyriHaximus - Thanks a ton for this sweet PR! 😄

Would you please shoot me an email when you get a chance? mcowley at digitalocean dot com 🎉

WyriHaximus commented 3 years ago

@MattIPv4 Done 👍