crflynn / pypistats.org

PyPI downloads analytics dashboard
https://pypistats.org/

502 Bad Gateway #26

Closed DanielGoldfarb closed 4 years ago

DanielGoldfarb commented 4 years ago

This is probably more of a hosting issue, but I don't know where else to report it and don't have the skills to debug it myself. For the past 3 to 5 days (I can't remember exactly when it started, but at least three days ago), the site pypistats.org has been very slow. Most requests take a very long time to come back; sometimes they time out completely, returning "502 Bad Gateway" with "nginx/1.12.1" underneath.

DanielGoldfarb commented 4 years ago

I just noticed this issue reported previously. Is there no way to throttle API requests, and/or detect a huge number of API requests from the same client and respond with a failure and a message about getting the data directly from BigQuery instead?

DanielGoldfarb commented 4 years ago

If the problem really is what was reported previously, that is, someone is hammering the API when they can and should be going directly to BigQuery instead, it is definitely possible to throttle the rate of requests and prevent such inadvertent denial-of-service attacks.

I don't know exactly how to implement it, but I do know, for example, that I sometimes use the Alphavantage API to get market data. The API requires me to provide a uniquely generated key along with my requests (and obtaining a key requires an email address). For the free version of the API, if the same key makes 5 requests in under a minute, then all further requests immediately return an error for the remainder of the minute (after which another 5 requests are permitted in the next minute). This greatly limits the load on the servers. (The paid version also throttles, but allows a faster request rate.) I'm sure most, if not all, of these APIs do some kind of throttling to prevent [even unintended] denial of service.
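For illustration only, the per-key behaviour described above (5 requests per minute, then errors until the minute is over) could be sketched as a fixed-window counter. This is a minimal sketch under stated assumptions, not how Alphavantage or pypistats.org actually implements it; the class name `FixedWindowLimiter`, its parameters, and the in-memory dictionary are all hypothetical.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each `window`-second window.

    Hypothetical helper: state lives in process memory, so it only works
    for a single server process.
    """

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self._counts = {}           # key -> (window_index, request_count)

    def allow(self, key):
        """Return True if the request may proceed, False if it is throttled."""
        window_index = int(self.clock() // self.window)
        index, count = self._counts.get(key, (window_index, 0))
        if index != window_index:
            count = 0               # a new window started, so reset the counter
        if count >= self.limit:
            self._counts[key] = (window_index, count)
            return False
        self._counts[key] = (window_index, count + 1)
        return True
```

A request handler would call `limiter.allow(api_key)` first and return an error response whenever it gets `False`. A fixed window is the simplest option; it allows short bursts around window boundaries, which a sliding window or token bucket would smooth out.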

If someone reading this knows how to implement such a throttle, it would certainly be appreciated if you could do so and provide a pull request. Or, if you can provide me with enough information/examples of similar code, I may be able to implement it myself.

On the other hand, if this really is just some sort of hosting/server issue, I'm not sure how to even begin to investigate and/or fix that.

DanielGoldfarb commented 4 years ago

This appears to have gotten better starting yesterday afternoon. So the problem lasted about a week. I still think relying on the kindness of API clients to behave well, and not hammer the API, is not a healthy software approach. There needs to be some kind of a throttle built into the API itself. Will leave this open and see if/when it happens again.

crflynn commented 4 years ago

@DanielGoldfarb thanks for bringing this to my attention. I've just deployed an update to the application which has IP-based rate limiting as you suggested. I'm hoping that this will mitigate similar situations in the future.
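The deployed code isn't shown in this thread. Purely as an illustration of what IP-based rate limiting can look like (not the actual pypistats.org implementation), here is a minimal token-bucket sketch keyed by client address. The names `TokenBucket` and `IpRateLimiter`, the default rate and capacity, and the wording of the 429 message are all assumptions.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start full so short bursts are allowed
        self.updated = clock()

    def take(self):
        """Consume one token if available; return whether the request may pass."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class IpRateLimiter:
    """Keeps one bucket per client IP (hypothetical, in-memory, single process)."""

    def __init__(self, rate=0.5, capacity=30, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets = {}

    def check(self, ip):
        """Return an (HTTP status, message) pair for a request from `ip`."""
        bucket = self._buckets.get(ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self._buckets[ip] = bucket
        if bucket.take():
            return 200, "OK"
        return 429, "Too many requests; for bulk data, query BigQuery directly."
```

Buckets are never evicted in this sketch, so a real deployment would need to expire idle entries or keep the counters in a shared store.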

Based on the application logs, there was a PHP client that was repeatedly fetching recent download counts for official Python packages associated with cloud providers, namely AWS, Azure, and Aliyun (Alibaba Cloud). There are about 100 packages for these cloud providers, and the client was requesting all of them several times per minute, resulting in server timeouts.

FWIW I built this project in about a week between jobs a while back and haven't really touched the code since then. It's admittedly in rough shape and I'd like to improve it so that others can run it locally and contribute. Improvements have been on my TODO list for a while, and I'll make a more serious effort when I have more free time.

DanielGoldfarb commented 4 years ago

@crflynn Thanks!

jbe456 commented 3 years ago

Since at least yesterday, https://pypistats.org/ has been unavailable. It returns a "502 Bad Gateway" error.

I'm not able to ping the server either.

ζ ping pypistats.org
PING pypistats.org (3.234.152.95) 56(84) bytes of data.
^C
--- pypistats.org ping statistics ---
104 packets transmitted, 0 received, 100% packet loss, time 104373ms