buildpacks / registry-api

API for searching and reading the Buildpack Registry
Apache License 2.0

Log in to DockerHub while indexing to access higher rate limits #114

Closed joshwlewis closed 1 year ago

joshwlewis commented 1 year ago

The indexer is running into rate limits from DockerHub like this:

at=handleMetadata level=warn msg='failed to fetch config' entry='heroku/java-function@0.3.34' reason='GET https://index.docker.io/v2/heroku/buildpack-java-function/manifests/sha256:f0b285a99116ab20a0cdb9c6c4f60dcd98867cf4be2c416813d13148d0bdaf15: TOOMANYREQUESTS: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit'

This causes the registry API and website to be out of date, showing old information for buildpacks published to DockerHub.

This PR writes DOCKERHUB_USER and DOCKERHUB_TOKEN to ~/.docker/config.json, so that the indexer is able to log in to the DockerHub registry prior to pulls. A free authenticated user increases the pull rate limit from 100 pulls per 6-hour window to 200 pulls per 6-hour window. However, if the authenticated user is a member of a Docker Team, like https://hub.docker.com/u/buildpacksio, the rate limit is increased to 5000 pulls per day.
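For readers unfamiliar with the credentials file, here is a minimal sketch of what writing those two values to ~/.docker/config.json could look like. It is not the code from this PR. The function names `docker_config` and `write_docker_config` are hypothetical; the `auths` layout and the `https://index.docker.io/v1/` key follow the standard Docker client config format for Docker Hub.

```python
import base64
import json
from pathlib import Path


def docker_config(user: str, token: str) -> dict:
    """Build a Docker client config holding basic-auth credentials for Docker Hub.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # The "auth" field is base64("user:token"), as in a file written by `docker login`.
    auth = base64.b64encode(f"{user}:{token}".encode()).decode()
    return {"auths": {"https://index.docker.io/v1/": {"auth": auth}}}


def write_docker_config(path: Path, user: str, token: str) -> None:
    """Write the config to `path`, replacing any existing file (assumption: none is needed)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(docker_config(user, token), indent=2))
    # The file holds a secret, so restrict it to the current user.
    path.chmod(0o600)
```

In a deployment, `write_docker_config` would presumably be called once at startup with `Path.home() / ".docker" / "config.json"` and the values of the DOCKERHUB_USER and DOCKERHUB_TOKEN environment variables.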

Discussion here: https://cloud-native.slack.com/archives/C032YE21V1T/p1687338079300959.

Fixes #113.

joshwlewis commented 1 year ago

I've deployed this to staging. It was able to do a full re-index, and I didn't see any Docker pull rate limit errors. It will probably still hit rate limits later, though, since it keeps re-indexing every 5 minutes.

edmorley commented 1 year ago

@joshwlewis Thank you for working on this!

I see the indexer runs in a loop, and is triggered again 5 mins after the previous run: https://github.com/buildpacks/registry-api/blob/80f586fee0b869a6d1c59fc4ba61fc1522926a2e/bin/indexer#L24

Since there are approx. 400 non-OSS Docker Hub URIs, and presuming each run takes 5 mins, that means in 6 hours roughly 14,000 Docker Hub requests will be made, which is above the 5000 paid account limit?

edmorley commented 1 year ago

Would one option be to skip the re-run every 5 mins if there are no new git commits in the index repo?
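The suggestion above could be sketched roughly as follows. This is an illustration only, not code from the repository: `CommitGatedIndexer`, `head_sha`, and `reindex` are hypothetical names, and the sketch assumes the loop can cheaply obtain the current HEAD commit of the index repo (for example by fetching and running `git rev-parse HEAD`).

```python
from typing import Callable, Optional


class CommitGatedIndexer:
    """Runs a re-index only when the index repo's HEAD commit has changed.

    Hypothetical sketch: `head_sha` returns the current HEAD commit of the
    index repo, and `reindex` performs one full indexing run.
    """

    def __init__(self, head_sha: Callable[[], str], reindex: Callable[[], None]):
        self._head_sha = head_sha
        self._reindex = reindex
        self._last_indexed: Optional[str] = None

    def tick(self) -> bool:
        """Call once per loop iteration. Returns True if a re-index ran."""
        sha = self._head_sha()
        if sha == self._last_indexed:
            # No new commits since the last successful run: skip all pulls.
            return False
        self._reindex()
        # Record the commit only after the run completes, so a failed run is retried.
        self._last_indexed = sha
        return True
```

With this gate, an idle index repo costs no Docker Hub requests at all, whatever the loop interval is.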

joshwlewis commented 1 year ago

> Since there are approx. 400 non-OSS Docker Hub URIs, and presuming each run takes 5 mins, that means in 6 hours roughly 14,000 Docker Hub requests will be made, which is above the 5000 paid account limit?

Yeah, I agree, we're not all the way there. And the 5000 limit is measured per day, not every 6 hours.
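To make the numbers concrete, the estimate can be reproduced with a few lines of arithmetic. This is a rough sketch under the assumptions stated in the thread (about 400 URIs, a 5-minute run followed by a 5-minute wait), plus one assumption of its own: one Docker Hub request per URI per run.

```python
URIS = 400          # approx. Docker Hub URIs per run, from the thread
RUN_MINUTES = 5     # presumed duration of one indexing run
WAIT_MINUTES = 5    # pause before the next run is triggered
DAILY_LIMIT = 5000  # Docker Team pull limit, measured per day


def requests_in(hours: int) -> int:
    """Estimated Docker Hub requests over `hours`, assuming one request per URI per run."""
    runs = (hours * 60) // (RUN_MINUTES + WAIT_MINUTES)
    return runs * URIS


six_hours = requests_in(6)   # 36 runs * 400 URIs = 14400
per_day = requests_in(24)    # 144 runs * 400 URIs = 57600
```

Measured against the per-day limit, the estimate is about 57,600 requests per day, more than ten times the 5000 allowed.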

I had plans to reduce the number of pulls we're doing, but probably in a different PR.

> Would one option be to skip the re-run every 5 mins if there are no new git commits in the index repo?

Yeah, this could work. I also like the idea of not re-pulling images we've already indexed. Could possibly do both.
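The second idea, not re-pulling images that were already indexed, might look something like the sketch below. It is only an illustration: `make_cached_fetcher` and `fetch` are hypothetical names, and it assumes each entry resolves to an image reference pinned by digest (as in the `sha256:` manifest URL in the log line above), so a cached result never goes stale. The cache here lives in memory; a real indexer would likely need to persist it across restarts.

```python
from typing import Callable, Dict


def make_cached_fetcher(fetch: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap `fetch` so each digest-pinned image reference is pulled at most once.

    Hypothetical sketch: `fetch` stands in for whatever pulls image metadata
    from the registry and returns it as a dict.
    """
    cache: Dict[str, dict] = {}

    def cached(ref: str) -> dict:
        if ref not in cache:
            # First time this reference is seen: one registry request.
            cache[ref] = fetch(ref)
        # Later calls are served from memory and cost no pulls.
        return cache[ref]

    return cached
```

Combined with skipping runs when the index repo has no new commits, only newly published buildpack versions would trigger Docker Hub requests.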