Silex / docker-emacs

Run Emacs in docker containers!
https://hub.docker.com/r/silex/emacs
GNU General Public License v3.0
208 stars 42 forks

Work around docker hub rate limits #103

Closed Silex closed 3 months ago

Silex commented 4 months ago

I'm getting hit by https://www.docker.com/increase-rate-limits/ (https://github.com/Silex/docker-emacs/actions/runs/9769943725)

Basically I "pull too much" when building the images.

1st option: move to another registry (https://stackoverflow.com/questions/65806330/toomanyrequests-you-have-reached-your-pull-rate-limit-you-may-increase-the-lim), which I'm not a fan of because, well, official images tend to be on docker hub.

2nd option: I wonder whether I could use some caching like https://github.com/marketplace/actions/docker-cache

3rd option: rate limit the build process so the ~200 pulls are spread over 6h... but this sounds silly.

If anyone has insights about how to tackle this, I'm all ears 😉

pataquets commented 4 months ago

If it's "just for your local", authenticating your local Docker daemon raises Docker Hub's limits for authenticated users. Not sure if you're doing that already. On GitLab, you do it like this: https://docs.gitlab.com/ee/user/packages/dependency_proxy/#docker-hub-rate-limits-and-the-dependency-proxy. GitHub might have something similar (or might not).

Optimizing the order of Dockerfile steps might improve build layer caching, which could reduce the need for pulls. Also, a search for "registry proxy" or "registry cache" on Docker Hub turns up several results, but I've used none of them. Maybe there's something official-ish around, though I'm not sure.

That's all for now, off the top of my head. I'll get back if anything else comes to mind.

Silex commented 4 months ago

Thanks. Yes, I'm authenticated, otherwise the GitHub actions would not be able to push the images to the registry. Here's how each image is built/pushed: https://github.com/Silex/docker-emacs/blob/master/.github/actions/build/action.yml

I just pushed something that sets the max jobs to 1 at a time. Maybe it'll be enough for now... but I doubt it. To stay under the 200 pulls per 6h period I'd also need to add some sleep() 😞

But yes, a registry proxy that is updated once in a while would work, not sure how you tell docker to use that proxy tho.
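For reference, the Docker daemon can be pointed at a pull-through mirror via `registry-mirrors` in `/etc/docker/daemon.json` (the mirror URL below is a placeholder):

```json
{
  "registry-mirrors": ["https://registry-mirror.example.com"]
}
```

Note this setting only applies to images pulled from Docker Hub, and the daemon needs a restart for it to take effect.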

Silex commented 4 months ago

Ah, just found this https://engineering.deptagency.com/how-to-speed-up-docker-builds-in-github-actions:

          cache-from: type=gha
          cache-to: type=gha,mode=max

Sounds like the way to go, will give it a try.
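In context, a build step using the GHA cache backend might look like this (a sketch; the action versions and image tag are assumptions):

```yaml
      - uses: docker/setup-buildx-action@v3

      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: silex/emacs:master   # placeholder tag
          cache-from: type=gha
          cache-to: type=gha,mode=max
```

The `gha` cache backend requires buildx, hence the setup step before the build.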

Silex commented 4 months ago

Meh.

GitHub Action cache has a current limit of 10 GB. Large Docker images can quickly outgrow this size limitation.

But the page mentions using a registry cache, and there's a Github Container Registry.

I guess I could build & cache my images to this, and then only push to the docker hub.
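That could use BuildKit's registry cache backend instead of `type=gha`, along these lines (a sketch; the `buildcache` ref is a placeholder):

```yaml
          cache-from: type=registry,ref=ghcr.io/silex/emacs:buildcache
          cache-to: type=registry,ref=ghcr.io/silex/emacs:buildcache,mode=max
```

Since the cache layers then live in ghcr.io rather than the Actions cache, the 10 GB limit wouldn't apply.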

That requires some refactor and more secret token tho, not something I have time for at the moment.

Will look into it beginning of august.

Silex commented 4 months ago

Actually this won't fix the problem of the FROM alpine in my Dockerfiles.

I really need a registry proxy cache. Will need to google more.

pataquets commented 4 months ago

From the result excerpts of a cursory Google search (without clicking any link), I've seen that there are some "dummy" proxies built from standard HTTP servers (squid, nginx, etc.), with no special logic involved, apparently. This might simplify your solution.

(Original reply, intended for a previous comment, before you posted the Alpine update.) Glad to hear that @Silex! If GitHub's space limits are too restrictive (and apply to both the image registry and caching storage), maybe by combining them with GitLab's Docker registry as a proxy you can create some sort of 2-tiered cache. Sounds cumbersome (and it might be), but perhaps somewhere along the path it's the only solution.

Also, give some thought to whether building a "common base image", maybe refreshed via a cronjob, might help optimize your usage quota, but that's more dependent on the flow (which I'm not familiar with). Finally, just for completeness, don't forget to check if you qualify for some freebie/grant: https://docs.github.com/en/billing/managing-the-plan-for-your-github-account/discounted-plans-for-github-accounts. Feel free to share details as you progress, and I'll be happy to help however I can.

Silex commented 4 months ago

@pataquets: thanks, you can help figure out how I should use https://github.com/renovatebot/renovate/issues/9958, which apparently allows using gitlab's dependency proxy.

The goal is not to modify the dockerfiles, but as a plan B I see we could also do FROM gitlab.example.com/groupname/dependency_proxy/containers/alpine:latest.
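One way to support plan B without forking the Dockerfiles per registry is a build-arg prefix, since Docker allows `ARG` before `FROM` (a sketch; `REGISTRY_PREFIX` is a hypothetical name):

```dockerfile
# Defaults to Docker Hub; override with e.g.
#   --build-arg REGISTRY_PREFIX=gitlab.example.com/groupname/dependency_proxy/containers/
ARG REGISTRY_PREFIX=""
FROM ${REGISTRY_PREFIX}alpine:latest
```

Local builds would keep pulling from Docker Hub, while CI could pass the proxy prefix.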

pataquets commented 4 months ago

Hi, @Silex. Not 100% sure where you're planning to use GitLab's Dependency Proxy from. Luckily, I use it from GitLab's CI; I'm guessing you'll be using it from GitHub Actions.

From inside GitLab, I just use the group- or project-specific env vars pointing to the DP as the image prefix, so when building locally you just pull from standard Docker Hub. Example: ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}alpine:latest. You'll have to find the full registry URL for the DP registry associated with your account/group/project.

Authentication is another story, since GitLab's DP requires the Docker daemon to be logged in to it. From inside GitLab's CI, this is mostly transparent. From another CI, you'll need to create an access token with DP read permission, which works as user/pass for the Docker daemon to reach the DP from outside GitLab. Check out this for more info: https://docs.gitlab.com/ee/user/packages/dependency_proxy
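From GitHub Actions, that daemon login could be done with `docker/login-action` (a sketch; the GitLab host and secret names are placeholders):

```yaml
      - uses: docker/login-action@v3
        with:
          registry: gitlab.example.com
          username: ${{ secrets.GITLAB_DP_USER }}
          password: ${{ secrets.GITLAB_DP_TOKEN }}
```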

Silex commented 3 months ago

Continuing in #106

Silex commented 3 months ago

What I did with ghcr.io is better, but I still hit docker hub limits sometimes, because of FROM alpine and FROM debian.

Using a proxy cache might help for those... but at this point I'm considering ditching docker hub.

Or switching just these images to ghcr.io, but that means I'll need to maintain ghcr.io/silex/ubuntu:latest etc. This could be a preparation step in the ci tho.
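Such a preparation step could be a small mirroring job run ahead of the matrix builds (a sketch; the image list and target repo names are assumptions):

```yaml
      - name: Mirror base images to ghcr.io
        run: |
          for img in ubuntu:latest debian:latest alpine:latest; do
            docker pull "docker.io/library/$img"
            docker tag "docker.io/library/$img" "ghcr.io/silex/$img"
            docker push "ghcr.io/silex/$img"
          done
```

With that, a single Docker Hub pull per base image serves every downstream build.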

Silex commented 3 months ago

Actually when looking at the CI errors, it becomes obvious I'll need to change every FROM to the ghcr.io equivalent.

[image: screenshot of the CI errors]

Because even tho it has the cache on ghcr.io, it still has to pull the base image from docker hub.

Silex commented 3 months ago

Made most of the images have "FROM ghcr.io", will see how this affects pull limits.

If not sufficient, I'll also move nix, alpine and debian to ghcr.io.

It's a shame there's no public mirror of docker hub in ghcr.io

pataquets commented 3 months ago

Wouldn't it also help to use your DH credentials for an increased pull quota when pulling the steps' images? https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idcontainercredentials
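Per the linked docs, that looks something like this (the image and secret names are placeholders):

```yaml
jobs:
  build:
    container:
      image: alpine:3.20
      credentials:
        username: ${{ secrets.DOCKERHUB_USERNAME }}
        password: ${{ secrets.DOCKERHUB_TOKEN }}
```

Note this only covers images pulled for job containers, not pulls done inside build steps.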

Silex commented 3 months ago

@pataquets I already use them.

Recent fix seems to be enough, closing for now.