aclark4life / vanity

Get package download statistics from PyPI
GNU General Public License v2.0

Removing "bot download" from counts #19

Closed kootenpv closed 8 years ago

kootenpv commented 8 years ago

What does this do about the fact that PyPI is crawled very often? It would be great to get a better estimate of "actual" download counts. Did you do anything to account for the number of bots out there?

aclark4life commented 8 years ago

Nothing and nope; what do you suggest we do to account for "bot download"?

kootenpv commented 8 years ago

One possibility is to script the upload of a package with a nonsensical, unguessable name to PyPI, publishing a new release every day.

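Since no real user should ever find or install such a package, its downloads should be almost entirely bot traffic, so I guess from that we can estimate the number of bots active in, say, a given week.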

After some weeks or months of data, you could fit a model to it, extrapolate backwards, and use that to correct the historical numbers.

Perhaps someone can think of a better approach.
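
A minimal sketch of the correction step described above, in Python. It assumes each daily release of the canary package is downloaded only by bots, and that every release of a real package attracts roughly the same number of bot downloads; both are assumptions, not measured facts. The function names and the sample numbers are hypothetical and are not part of vanity.

```python
from statistics import median


def bot_downloads_per_release(canary_release_downloads):
    """Estimate bot downloads per release from the canary package.

    Each entry is the download count of one canary release. The median is
    used so a single unusually busy day does not skew the baseline.
    """
    return median(canary_release_downloads)


def corrected_total(total_downloads, n_releases, per_release_bots):
    """Subtract the estimated bot traffic from a package's total downloads.

    Assumes bots fetch every release about `per_release_bots` times.
    The result is clamped at zero so it never goes negative.
    """
    estimated_bots = per_release_bots * n_releases
    return max(0, total_downloads - estimated_bots)


if __name__ == "__main__":
    # Made-up canary data: one week of daily releases.
    canary = [48, 52, 50, 47, 55, 51, 49]
    baseline = bot_downloads_per_release(canary)
    # A hypothetical package with 10,000 total downloads over 20 releases.
    print(corrected_total(10_000, 20, baseline))
```

A flat per-release baseline is the simplest possible model. Modelling how bot activity changes over time, as suggested above, would need the canary data broken down by date.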

aclark4life commented 8 years ago

Let's assume the fix for this is for vanity to query the dataset mentioned here instead of relying on PyPI: https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html. Now tracking that in #22.