cpan-testers / cpantesters-api

An API into data held by CPAN Testers: test reports and CPAN uploads

Lots of "503 Service Unavailable" #40

Closed: haukex closed this issue 3 months ago

haukex commented 4 months ago

For example, I just tried accessing https://api.cpantesters.org/v3/release/dist/File-Replace-Inplace/0.18 from my browser, but it gives the following response.

HTTP/2 503 Service Unavailable
server: Varnish
retry-after: 0
content-type: application/json
accept-ranges: bytes
date: Tue, 30 Apr 2024 08:31:14 GMT
via: 1.1 varnish
x-served-by: cache-fra-eddf8230116-FRA
x-cache: MISS
x-cache-hits: 0
x-timer: S1714465874.239451,VS0,VE36
content-length: 80
X-Firefox-Spdy: h2

{
    "message": "The API is temporarily unavailable. Please try again later."
}

I have a script that makes requests like this automatically (https://github.com/haukex/Badge-Simple/blob/master/misc/cpantesters.pl), so at first I thought it might be a rate-limiting issue (I was making 13 requests at a time, once daily). I added delays between the requests, but I am no longer sure that's the problem.

For example, right now, from my browser, I can access https://api.cpantesters.org/v3/release/dist/File-Replace/0.18 just fine, but at the same time, https://api.cpantesters.org/v3/release/dist/File-Replace-Inplace/0.18 is giving the response as shown above. Only after a bunch of retries does the latter URL work.

EDIT: This even happened just now when accessing http://api.cpantesters.org/docs/?url=/v3 - I had to reload the page a bunch of times to get the documentation to display.
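Since the failures are intermittent and the same URL eventually succeeds after a few retries, a client can tolerate them by retrying on 503. Below is a minimal Python sketch, assuming the API is fine with a handful of retries using exponential backoff (nothing in this thread says so either way). `release_url`, `fetch_with_retry`, `urllib_fetch` and the delay values are hypothetical names chosen for illustration; they are not part of cpantesters-api or of the Perl script linked above. The fetcher is passed in as a parameter so the retry logic can be tested without network access.

```python
import time
import urllib.error
import urllib.request

# Endpoint pattern taken from the URLs quoted in this issue.
BASE = "https://api.cpantesters.org/v3/release/dist"


def release_url(dist: str, version: str) -> str:
    """Build a per-release summary URL, e.g. .../File-Replace-Inplace/0.18."""
    return f"{BASE}/{dist}/{version}"


def fetch_with_retry(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body), retrying on 503 with exponential backoff.

    Delays are base_delay * 1, 2, 4, ... between attempts. The Retry-After
    header is ignored here; the responses in this issue sent `retry-after: 0`.
    Returns the last (status, body) seen.
    """
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body


def urllib_fetch(url):
    """Default fetcher using only the standard library."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; report them as (status, body).
        return err.code, err.read().decode("utf-8")
```

Usage would be something like `fetch_with_retry(urllib_fetch, release_url("File-Replace-Inplace", "0.18"))`. Retries only hide the symptom, though; the root cause still has to be fixed server-side.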

preaction commented 4 months ago

TL;DR: I'll start looking into this, and make some improvements in the area while I'm at it.

My guess is that this is Fastly's response, so I may need to adjust the Fastly settings to increase the timeout. I also recently (during the PTS last week) greatly reduced the load on one of the two CPAN Testers servers, so I might be able to start load-balancing the API requests between them. It's also possible that I'm already doing that and one of the two is down, in which case I may need to fix my monitoring and/or set up some automated health checks.

preaction commented 4 months ago

Okay, there is load balancing, and one node was down due to a bug (I upgraded Mojolicious and was using a reserved stash word). It should be working again, but let me know if you encounter a problem.
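This explains the symptom in the original report: one URL failed while another succeeded, and the failing one worked after a few reloads. That pattern is what you would expect if a balancer keeps sending part of the traffic to a dead node. Below is a toy Python model of that, assuming plain round-robin over two backends. The actual Fastly/CPAN Testers configuration is not described in this thread, and `RoundRobin`, `serve`, and the node names are hypothetical. The model shows why the health checks mentioned above help: skipping a node marked unhealthy turns alternating 503s into consistent 200s.

```python
from itertools import cycle


class RoundRobin:
    """Hypothetical round-robin balancer over a fixed list of backend names.

    With skip_unhealthy=False every backend gets a turn, so if one of two
    nodes is down roughly every other request fails. With skip_unhealthy=True,
    backends marked unhealthy (e.g. by a periodic health check) are skipped.
    """

    def __init__(self, backends, skip_unhealthy=False):
        self.backends = list(backends)
        self.healthy = {b: True for b in self.backends}
        self.skip_unhealthy = skip_unhealthy
        self._order = cycle(self.backends)

    def mark(self, backend, healthy):
        """Record the result of a health check for one backend."""
        self.healthy[backend] = healthy

    def pick(self):
        """Return the next backend to use, or None if all are unhealthy."""
        for _ in range(len(self.backends)):
            backend = next(self._order)
            if not self.skip_unhealthy or self.healthy[backend]:
                return backend
        return None


def serve(balancer, n):
    """Return the status each of n requests would get: 200 if the chosen node is up, else 503."""
    statuses = []
    for _ in range(n):
        backend = balancer.pick()
        ok = backend is not None and balancer.healthy[backend]
        statuses.append(200 if ok else 503)
    return statuses
```

With `node-b` marked down, `serve(RoundRobin(["node-a", "node-b"]), 4)` alternates 200 and 503, while the same call with `skip_unhealthy=True` returns 200 every time.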

haukex commented 4 months ago

Thank you! The most recent run looks much better, no 503s so far. I'll keep an eye on it over the next few weeks.

Is there rate limiting on the API, or a suggested rate, or is it okay if I just fire off my 13 requests one after the other?

preaction commented 4 months ago

There is, as of yet, no rate limiting on the APIs that I can think of. If accidental DoS becomes a problem, I'll likely add some at the Fastly level. That said, I would expect to be able to handle, on average, one simultaneous request for each of an author's dists (for the per-release summary API at least), which probably means around 50 concurrent requests (or perhaps a conservative ceiling of around 100 req/min). I would consider failing to handle anything less than that a problem on CPAN Testers' end to address.

TL;DR: 13 requests is just fine :) I'm honestly just happy some folks are using the "new" API ;)

Edit: Likely if we start having problems I'll begin addressing them by discussing some SLAs for each kind of API (summaries, reports, and details).

haukex commented 4 months ago

Thanks! I've reduced the delay between requests to 100ms and still no 503s. I'll keep an eye on it for the next few weeks but at the moment things look good.
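Putting the numbers from this thread side by side: a ceiling of 100 req/min averages out to one request every 0.6 s. Thirteen requests spaced 100 ms apart amount to only 13 requests in any one-minute window, which is far below that. If a script like this ever grew to many more requests, a client-side sliding-window limiter could keep it under the suggested ceiling. The Python sketch below assumes the 100 req/min figure as its default. That figure is a guideline from this thread, not a limit the server enforces. `SlidingWindowLimiter` is a hypothetical helper, not part of any CPAN Testers client. The clock and sleep functions are injectable so it can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` requests in any `window` seconds (client side).

    The 100 req/min default mirrors the "conservative ceiling" suggested in
    this thread; it is not an enforced server limit.
    """

    def __init__(self, max_requests=100, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests still inside the window

    def acquire(self):
        """Block until another request is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.max_requests:
                self.sent.append(now)
                return
            # Wait until the oldest request leaves the window.
            self.sleep(self.window - (now - self.sent[0]))
```

Calling `limiter.acquire()` before each HTTP request is enough. For a daily batch of 13 requests the limiter never sleeps, so it only matters if the request count grows substantially.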

haukex commented 3 months ago

No more 503s in the past ~4 weeks, so I think it's safe to call this fixed. Thanks again!