aclark4life / vanity

Get package download statistics from PyPI
GNU General Public License v2.0

Stats broken since January 2016 #22

Open Themanwithoutaplan opened 8 years ago

Themanwithoutaplan commented 8 years ago

This is a minor niggle, but it looks like vanity has not been getting any updated statistics since about January. For example, vanity openpyxl is confidently telling me that the package has never been downloaded.

aclark4life commented 8 years ago

@Themanwithoutaplan Yep, I think this is a PyPI issue.

SmokinCaterpillar commented 8 years ago

Is there anything you can do about it? I like vanity. Outdated statistics, however, make it quite useless :-)

aclark4life commented 8 years ago

@SmokinCaterpillar I like it too! We need to ask @dstufft or someone from @pypa to help.

Themanwithoutaplan commented 8 years ago

I think Donald is concentrating on getting Warehouse ready to replace PyPI. Stats should be more reliable once that's done.

dstufft commented 8 years ago

As part of Warehouse I've been working on a new stats pipeline that should both be way more robust and provide a lot more insight into downloads.

aclark4life commented 8 years ago

@Themanwithoutaplan @dstufft Any ETA on Warehouse? Might be worth fixing whatever annoyance has broken stats again at least once more to get us through…

Themanwithoutaplan commented 8 years ago

I think Warehouse is pretty close to being ready. Nobody likes touching the PyPI code and, given that it's been broken since January, I don't think another few days or weeks really matter.

Warehouse has a much clearer (and better) code base that will hopefully make it easier to maintain and more reliable. And help to add features.

aclark4life commented 8 years ago

@Themanwithoutaplan Great! Nope, another few days or weeks don't really matter. Months on the other hand …

ryukinix commented 8 years ago

They were talking about disabling the stats because they are distorted (mirror counts and so on). Can anybody explain to me what Warehouse is?

aclark4life commented 8 years ago

@ryukinix Ah, thanks for the cross ref. Warehouse is: https://github.com/pypa/warehouse

ryukinix commented 8 years ago

Oh, nothing. Thank you for that nice tool! It's a little sad that it doesn't work now, but that's not your fault. xD

Warehouse looks interesting! Is there any estimate of when it will be working in production? It would be nice to have vanity working again.

aclark4life commented 8 years ago

@ryukinix According to @Themanwithoutaplan "pretty close to being ready" … and we should only have to live with broken stats "another few days or weeks". Practically speaking though, since it's a (much appreciated) volunteer effort, I would be happy if it happened sometime in 2016, period.

aclark4life commented 8 years ago

https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html

dstufft commented 8 years ago

Just to be clear. PyPI isn't using this data yet but it will be.

aclark4life commented 8 years ago

@dstufft Yeah understood, thanks! Presumably some aggressive vanity user could start consuming it then add support to vanity :-)

Themanwithoutaplan commented 8 years ago

whistles and looks at his shoes.

aclark4life commented 8 years ago

Is this fixed? I'm seeing stats again …

[Screenshot: 2016-06-20 16:24:20]

yotammanor commented 8 years ago

Did you consider moving to using the BigQuery dataset, for the moment?

(As suggested here )

aclark4life commented 8 years ago

Yep, suggested above too. Updating vanity to use the BigQuery data set is possibly a way to get old "missing" data back.

aclark4life commented 7 years ago

Is it safe yet to remove the "stats broken" message from vanity? If so, I'll close this and make a new release.

noxdafox commented 7 years ago

It seems stats are broken again.

requests-2.12.1-py2.py3-none-any.whl    2016-11-16       624953
              requests-2.12.2.tar.gz    2016-11-30            0
requests-2.12.2-py2.py3-none-any.whl    2016-11-30            0
              requests-2.12.3.tar.gz    2016-12-01            0
requests-2.12.3-py2.py3-none-any.whl    2016-12-01            0

aclark4life commented 7 years ago

@noxdafox I think they've been broken since January, or at least not working consistently…

dstufft commented 7 years ago

Sorry, I've had a lot of higher-priority items. I would suggest using the BigQuery database instead of the API, although that doesn't (and can't, since some of that data simply doesn't exist anymore) provide a cumulative count of downloads going back past a certain date. Currently that cutoff is early 2016, but once I am able to backfill data it will be a date in Jan 2014.
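
For anyone wanting to try the BigQuery route suggested here, the following is a minimal sketch of a "downloads in the last N days" query, not a description of how vanity works. It assumes the `google-cloud-bigquery` client library is installed and that the data lives in the public table `bigquery-public-data.pypi.file_downloads`; the thread does not name a table, and the one in use when this was written may have differed. The function names are hypothetical. Queries run against, and are billed to, whichever Google account the client is authenticated as.

```python
# Assumption: the public PyPI downloads table; not named anywhere in this thread.
DOWNLOADS_TABLE = "bigquery-public-data.pypi.file_downloads"


def build_recent_downloads_query(days: int = 30) -> str:
    """Return SQL that counts downloads of one project over the last `days` days."""
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT COUNT(*) AS downloads\n"
        f"FROM `{DOWNLOADS_TABLE}`\n"
        # The project name is passed as a query parameter, not pasted into the SQL.
        "WHERE file.project = @project\n"
        f"  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)"
    )


def recent_downloads(project: str, days: int = 30) -> int:
    """Run the query with the caller's own Google credentials."""
    # Imported lazily so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("project", "STRING", project),
        ]
    )
    rows = client.query(
        build_recent_downloads_query(days), job_config=job_config
    ).result()
    return next(iter(rows)).downloads
```

Restricting the query to a recent date range also keeps the amount of data scanned down, which matters because usage is metered per account.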

Themanwithoutaplan commented 7 years ago

@dstufft that would work for me. From a library developer's perspective I'm mainly interested in what's been happening recently: are people updating so I can kill old stuff?

dstufft commented 7 years ago

This may also be helpful: https://langui.sh/2016/12/09/data-driven-decisions/

nschloe commented 7 years ago

@dstufft I'm reading there:

> Queries are charged against your account, but you get 1TB free per month and cached queries won't count against it.

Does this mean vanity will either have to ship with someone's personal credentials or ask the user to fill in their own credentials in a local config?

dstufft commented 7 years ago

@nschloe Yes.
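
The second option raised in the question, where each user supplies their own credentials in a local config, could be sketched roughly as follows. This is only an illustration: the config path `~/.vanity.ini`, the `[bigquery]` section, and the `credentials` key are all hypothetical and are not part of vanity. The only real name is `GOOGLE_APPLICATION_CREDENTIALS`, the environment variable that Google's client libraries read to locate a service-account key file.

```python
import configparser
import os
from pathlib import Path

# Hypothetical config location and key names, chosen for illustration only.
DEFAULT_CONFIG = Path.home() / ".vanity.ini"


def load_credentials_path(config_file=DEFAULT_CONFIG):
    """Return the key-file path from the user's config, or None if not set."""
    parser = configparser.ConfigParser()
    parser.read(config_file)  # a missing file is silently skipped
    path = parser.get("bigquery", "credentials", fallback=None)
    return os.path.expanduser(path) if path else None


def configure_credentials(config_file=DEFAULT_CONFIG) -> bool:
    """Export the key path so Google client libraries can find it."""
    path = load_credentials_path(config_file)
    if path is None:
        return False
    # Do not override a value the user has already exported themselves.
    os.environ.setdefault("GOOGLE_APPLICATION_CREDENTIALS", path)
    return True
```

A tool built this way ships with no credentials of its own; a user without a config file would simply get `False` back and could be shown setup instructions.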

nschloe commented 7 years ago

Sounds like this is the end of easy-to-get stats on Python projects then. Too bad.

Is there a download stats section planned for Warehouse?

dstufft commented 7 years ago

I don't believe possessing a Google account to be a significant barrier to entry for accessing statistics. It is certainly more of a barrier than completely unauthenticated access, but not much IMO.

Warehouse will not get anything as powerful as raw access to the BigQuery table but I would like to add some "high value" metrics for projects that they can view.

nschloe commented 7 years ago

> Warehouse will not get anything as powerful as raw access to the BigQuery table but I would like to add some "high value" metrics for projects that they can view.

Yes, that's what I meant; just a simple "download count in the last 30 days" or something along those lines. Something to brag about. :wink:

dstufft commented 7 years ago

Yeah, something like that, though it is fairly low on my list of priorities since (A) it's non-trivial to implement and (B) BigQuery is available.

nschloe commented 7 years ago

I'm getting fairly reasonable numbers out of vanity again. Has something been silently fixed?

piem commented 7 years ago

hi there,

it seems not everything was fixed:

[Image: aubio_vanity]

at least one person downloaded aubio 0.4.4 (me :-) ), some time ago already.

cheers, piem

MartinPyka commented 7 years ago

Is there any alternative to vanity?

aclark4life commented 7 years ago

@MartinPyka Not that I know of…

nschloe commented 7 years ago

Again, I'm getting reasonable numbers for various projects. Has this been silently fixed?

Themanwithoutaplan commented 7 years ago

Could be related to PyPI having switched to Warehouse, even though Warehouse is still not quite finished.

aclark4life commented 7 years ago

Going to try and tackle this one on Aug 5 at this event:

If anyone has any tips, please feel free to post them here (I know nothing about BigQuery going in.)

Themanwithoutaplan commented 7 years ago

Hi Alex, I haven't worked with it myself but it's essentially a JSON API. httparchive is switching to it, so you might be able to get some idea of how it works from that code, though it's all JS. One example is here http://jsfiddle.net/rviscomi/1r6dpctd/ if you look at the source.

I think the biggest problem will be whether you need to use credentials to access the data. If so, you'll need to implement some kind of proxy somewhere. Based on the above example, this may no longer be the case for public data sets: wget https://storage.googleapis.com/http-archive-beta.appspot.com/bytesJsTimeseries.json

Best of luck!

ofek commented 7 years ago

@MartinPyka This is what people are using now https://github.com/ofek/pypinfo if you still need an alternative

aclark4life commented 7 years ago

@ofek Nice! Good to know this project exists. (Although I do take some offense to your statement "this is what people are using now …" srsly?)

ofek commented 7 years ago

@aclark4life Sorry about that, I meant no offense! It was regarding BigQuery usage, not download stats in general.

aclark4life commented 7 years ago

@ofek No prob! Just finished installing and testing pypinfo, very nice …