crflynn / pypistats.org

PyPI downloads analytics dashboard
https://pypistats.org/

Data import seems to have not run yesterday? #1

Closed by njsmith 4 years ago

njsmith commented 6 years ago

Hello! You probably already know this, but just in case: it looks like the live site doesn't have any data for 7-28, even though we're almost to the end of 7-29.

(Very cool site by the way!)

crflynn commented 6 years ago

Thanks for pointing this out. It looks like the instance is running out of memory (it's running on a t2.micro). I'm guessing the linehaul issue was quite significant: with less data loss, the Google BigQuery daily tables are now 10-12 GB. I'll have to upgrade the instance, and I'll get the missing data backfilled ASAP.
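For what it's worth, a gap like this could be detected automatically before backfilling. Here is a minimal sketch, not the site's actual code: it assumes the dates already loaded are available as a Python set, `missing_dates` is a hypothetical helper name, and the year in the sample data is arbitrary.

```python
from datetime import date, timedelta


def missing_dates(loaded, start, end):
    """Return the dates in [start, end] (inclusive) that are not in `loaded`."""
    days = (end - start).days + 1
    candidates = (start + timedelta(days=i) for i in range(days))
    return [day for day in candidates if day not in loaded]


# Illustrative data only: 7-28 was never ingested (the year is an assumption).
loaded = {date(2018, 7, 26), date(2018, 7, 27), date(2018, 7, 29)}
gaps = missing_dates(loaded, date(2018, 7, 26), date(2018, 7, 29))
print(gaps)
```

Each date returned would then be a candidate for re-running the daily import.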

njsmith commented 6 years ago

The linehaul issue was extremely significant, yeah. I think it'll be a week or two until we know for sure what the new tables look like, since the new linehaul was deployed just before the weekend, which is the low-traffic part of the week... the table for 7-27 is actually 18 GB.

If free-tier resource limits are an issue, we could ask @ewdurbin and @dstufft whether it would make sense for pypistats.org to move onto PSF infrastructure. I don't think the PyPI maintainers have spare energy to build or maintain a site like this, but it definitely fills an important need, and it sounds like the needed infrastructure is pretty modest.

crflynn commented 6 years ago

In hindsight I should probably be running the ingestion job in a Lambda rather than alongside the server...
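Roughly what I have in mind, as a sketch only: AWS Lambda calls a Python entry point with `(event, context)`, so the job could ingest the date given in the event and fall back to yesterday when a scheduled trigger sends no date. `ingest_day` is a hypothetical placeholder for the real job, and the `"date"` event key is an assumption.

```python
from datetime import date, timedelta


def ingest_day(day):
    """Hypothetical placeholder for the real ingestion job (query, then load)."""
    return {"ingested": day.isoformat()}


def handler(event, context):
    """Lambda-style entry point: ingest the date in the event, else yesterday."""
    raw = (event or {}).get("date")  # assumed key, e.g. "2018-07-28"
    day = date.fromisoformat(raw) if raw else date.today() - timedelta(days=1)
    return ingest_day(day)
```

Passing an explicit date would also make backfilling a single missed day a matter of invoking the function once by hand.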

I have another 8 months within the AWS free tier. The actual costs would be about $50/month. The aggregate data takes up only about 8 GB so far in a PostgreSQL database (20 GB cap), so altogether it's pretty lightweight.

If it makes sense to migrate to PSF at some point I would be happy to do so. I haven't touched this project in a while other than to regularly extend the retention time, but if there are any other metrics that might be interesting for you or other developers to see I can add them to my TODOs.

dstufft commented 6 years ago

Let me just say, the site seems like a super useful tool and I'm glad it exists. If you ever do decide you want it to live elsewhere, or that you might even be interested in integrating the functionality into Warehouse, let me know and we'll see what we can do.

crflynn commented 6 years ago

Glad to hear that you find it useful. I think that eventually integrating into PSF/Warehouse is probably a good idea for the long term. I could work on it myself to relieve the maintainers of the workload. Who would be the primary point of contact for implementation details?

In the meantime I'll try to get familiar with the Warehouse codebase.

leviable commented 6 years ago

There are some existing Warehouse issues for this: https://github.com/pypa/warehouse/issues/699 and https://github.com/pypa/warehouse/issues/787