docker / hub-feedback

Feedback and bug reports for the Docker Hub
https://hub.docker.com

Automation blocked from using the docker hub search API #2252

Open visit1985 opened 2 years ago

visit1985 commented 2 years ago

Problem description

We use an automation based on docker-hub-rss to monitor our base images for updates and to create issues in our ticket system so that we can verify the changes.

Since June 29th, our CI system has been blocked from using the Docker Hub search API for a limited period (less than a day) after calling it via docker-hub-rss.

As this is an essential part of our software lifecycle management, could you please unblock this type of traffic, tell us which conditions we need to meet to avoid being blocked, or help us find an alternative notification method?

Debug Information

  1. Use any browser to access https://hub.docker.com/_/alpine
  2. Run docker run -it --name "docker-hub-rss" -p 127.0.0.1:3001:3000 --rm theconnman/docker-hub-rss:latest
  3. Trying to access http://localhost:3001/_/alpine.atom?includeRegex=%5E%5B0-9%5D%2B%5C.%5B0-9%5D%2B%5C.%5B0-9%5D%2B%24 ends in ERR_EMPTY_RESPONSE
  4. Accessing https://hub.docker.com/_/alpine again shows a 404 page
milosgajdos commented 2 years ago

@visit1985 we are not aware of any blocking done by Docker Hub.

We've noticed that docker-hub-rss uses an undocumented Hub API endpoint to read tags.

Moreover, it appears that the way the tool handles pagination doesn't check the response status code, which is 404 once there are no more results to return.

I'd recommend reaching out to the tool's authors and asking them to update this piece of code https://github.com/TheConnMan/docker-hub-rss/blob/fb40dabe9e7e82f4d65f8799c76c2462acc8d8f1/api/%5Busername%5D/%5Brepository%5D.js#L44-L46 so that it stops paginating as soon as it encounters a 404.
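For illustration, the pagination fix described above could look roughly like this. This is a minimal Python sketch, not the docker-hub-rss code: `fetch_page`, `iter_tags`, `HubError` and `max_pages` are hypothetical names, and the `results`/`next` fields are an assumption about the shape of the paginated tag response. The point is that a 404 ends the loop as "no more results", while other errors such as a 429 are raised instead of being retried blindly.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical page fetcher: page number -> (HTTP status, parsed JSON body or None).
# A real version would wrap an HTTP client; injecting it keeps this sketch offline.
FetchPage = Callable[[int], Tuple[int, Optional[dict]]]


class HubError(Exception):
    """Raised for error responses other than the end-of-results 404 (e.g. 429)."""

    def __init__(self, status: int, body: Optional[dict]):
        super().__init__(f"HTTP {status}: {body}")
        self.status = status
        self.body = body


def iter_tags(fetch_page: FetchPage, max_pages: int = 100) -> Iterator[dict]:
    """Yield tag entries page by page and stop cleanly when the results run out."""
    for page in range(1, max_pages + 1):  # hypothetical safety cap on page count
        status, body = fetch_page(page)
        if status == 404:
            # Past the last page: end of results, not something to retry.
            return
        if status != 200:
            # Any other error (e.g. 429 rate limiting) is surfaced instead of looping.
            raise HubError(status, body)
        body = body or {}
        results = body.get("results") or []
        if not results:
            return
        yield from results
        # Assumed field: a missing or null "next" link also marks the last page.
        if not body.get("next"):
            return


# Usage with a fake fetcher: two pages of tags, then a 404.
pages = {
    1: (200, {"results": [{"name": "3.18.0"}], "next": "page-2"}),
    2: (200, {"results": [{"name": "3.17.4"}], "next": "page-3"}),
}
names = [t["name"] for t in iter_tags(lambda p: pages.get(p, (404, None)))]
print(names)  # ['3.18.0', '3.17.4']
```

Injecting the fetcher keeps the stop condition testable without touching the real API.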

We're also not aware of any issues related to visiting https://hub.docker.com/_/alpine.

binman-docker commented 2 years ago

I would also note that I don't see any sort of caching in there (though I'm no JavaScript programmer). If this fires requests to Hub every time someone loads the RSS feed, and that happens often, you could easily find yourself hitting the abuse rate limits.

That said, I think the evaluation window for those limits is something like ten minutes. If your IP stays locked out for long periods, that suggests it is still making a lot of queries.
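The kind of caching mentioned above can be sketched as a small time-to-live cache in front of the Hub call, so that many feed loads within the TTL trigger only one upstream request. This is a minimal Python sketch under assumptions: `TTLCache`, `get_or_fetch` and the five-minute default are hypothetical choices, not the caching code of docker-hub-api.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Reuse a fetched value per key for `ttl` seconds (hypothetical helper)."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable clock, handy for testing
        self._entries: Dict[str, Tuple[float, T]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], T]) -> T:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: no request goes to Docker Hub
        value = fetch()  # stale or missing: call upstream once and remember it
        self._entries[key] = (now, value)
        return value


# Usage: 50 feed loads in quick succession cause a single upstream call.
calls = []


def fetch_alpine_tags():
    calls.append(1)
    return ["3.18.0"]


cache = TTLCache(ttl=300)
for _ in range(50):
    cache.get_or_fetch("library/alpine", fetch_alpine_tags)
print(len(calls))  # 1
```

A monotonic clock is used so that wall-clock adjustments cannot make entries expire early or late.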

I would recommend looking at something like Diun for better notifications with cron scheduling: https://github.com/crazy-max/diun

visit1985 commented 2 years ago

I would recommend looking at something like Diun for better notifications with cron scheduling: https://github.com/crazy-max/diun

Thanks for the hint.

There is caching in the docker-hub-api package, which is used by docker-hub-rss.

After digging a bit deeper, I see an HTTP 429 {"detail": "Rate limit exceeded", "error": false} returned from the API. Maybe the number of tags for one of the repos is too large to query all pages without a delay. I will continue debugging tomorrow and try to fix the pagination handling for HTTP 4xx responses.
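One way to add the delay mentioned above is to retry a request that returns 429 with exponential backoff instead of immediately requesting the next page. This is a minimal Python sketch: `get_with_backoff` and its parameters are hypothetical, and honoring a `Retry-After` header is an assumption, since the thread does not say whether Docker Hub sends one on 429 responses.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical request function: () -> (status, headers, parsed JSON body or None).
Request = Callable[[], Tuple[int, Mapping[str, str], Optional[dict]]]


def get_with_backoff(request: Request, max_retries: int = 5, base_delay: float = 1.0,
                     max_delay: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep):
    """Retry on HTTP 429 with exponential backoff instead of hammering the API."""
    for attempt in range(max_retries + 1):
        status, headers, body = request()
        if status != 429:
            return status, headers, body
        if attempt == max_retries:
            break
        # Assumption: use Retry-After (in seconds) if present. Note that a plain dict
        # lookup is case-sensitive, unlike real HTTP header names.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * (2 ** attempt)  # 1 s, 2 s, 4 s, ...
        sleep(min(delay, max_delay))
    raise RuntimeError(f"still rate limited after {max_retries} retries: {body}")
```

The `sleep` parameter is injectable so the retry schedule can be checked without actually waiting.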

@milosgajdos Can you tell me what the rate limit for this API is?

binman-docker commented 2 years ago

There is caching in the docker-hub-api package, which is used by docker-hub-rss.

Ah, perfect. A 5-minute cache should be fine for this application, though it could probably be increased depending on your needs.

After digging a bit deeper, I see an HTTP 429 {"detail": "Rate limit exceeded", "error": false} returned from the API

Thanks for the error detail! Digging through the code, it looks like the limits for that API are currently set at 600 requests per minute for authenticated requests and 180 per minute for unauthenticated ones. On requests that are not rate-limited, you should see X-RateLimit-* headers in the response with the exact details.
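As a rough worked example of those numbers: 180 requests per minute means one request every 60 / 180 ≈ 0.33 s, and 600 per minute means one every 0.1 s. The Python sketch below spaces requests by that interval and collects any X-RateLimit-* headers for logging. `Throttle`, `min_interval` and `rate_limit_headers` are hypothetical helpers, and the exact header names beyond the X-RateLimit- prefix are not given in this thread. The limits also apply per client as seen by Hub, so other jobs behind the same CI IP would share the budget.

```python
import time
from typing import Callable, Dict, Mapping, Optional


def min_interval(requests_per_minute: int) -> float:
    """Smallest spacing (seconds) between requests that stays under a per-minute limit."""
    return 60.0 / requests_per_minute


class Throttle:
    """Space out calls so this client alone never exceeds `requests_per_minute`."""

    def __init__(self, requests_per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = min_interval(requests_per_minute)
        self.clock = clock
        self.sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


def rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Pick out X-RateLimit-* headers, matching the prefix case-insensitively."""
    return {k: v for k, v in headers.items() if k.lower().startswith("x-ratelimit-")}


print(round(min_interval(180), 3))  # 0.333 (unauthenticated)
print(round(min_interval(600), 3))  # 0.1 (authenticated)
```

Calling `wait()` before each page request keeps a single client under the limit, but it cannot account for other traffic from the same IP, so the 429 handling is still needed.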

visit1985 commented 2 years ago

Is it possible that the JSON format of error responses changed a few days ago? The way docker-hub-api detects errors no longer seems to work when I query a non-existent page: {"errinfo":{"namespace":"library","repository":"alpine"},"message":"object not found"}, while it still works for a rate-limit error: {"detail": "Rate limit exceeded", "error": false}.
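Because the error body format has changed, a more robust approach is to decide success or failure from the HTTP status code and only use the JSON body to build a readable message. The Python sketch below accepts both shapes quoted above; `hub_error_message` is a hypothetical helper, and the list of message keys it checks is an assumption based only on these two examples.

```python
import json
from typing import Optional


def hub_error_message(status: int, raw_body: str) -> Optional[str]:
    """Return an error description for a failed Hub response, or None on success.

    The HTTP status decides whether it is an error; the JSON body is only used for
    the message, because its shape has changed ({"detail": ...} vs {"message": ...}).
    """
    if 200 <= status < 300:
        return None
    try:
        body = json.loads(raw_body)
    except ValueError:  # non-JSON error pages, e.g. from a proxy
        body = None
    if isinstance(body, dict):
        # Assumed keys; "error": false in the 429 body is skipped (not a string).
        for key in ("message", "detail", "error"):
            value = body.get(key)
            if isinstance(value, str) and value:
                return f"HTTP {status}: {value}"
    return f"HTTP {status}"


print(hub_error_message(404, '{"errinfo":{"namespace":"library","repository":"alpine"},"message":"object not found"}'))
# HTTP 404: object not found
print(hub_error_message(429, '{"detail": "Rate limit exceeded", "error": false}'))
# HTTP 429: Rate limit exceeded
```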

milosgajdos commented 2 years ago

The error message has changed indeed!