blekhmanlab / rxivist

API providing access to papers and authors scraped from biorxiv.org
https://rxivist.org
GNU Affero General Public License v3.0
60 stars 11 forks

Add mechanism for detecting suspicious download spikes #238

Open rabdill opened 5 years ago

rabdill commented 5 years ago

The motivating case is https://rxivist.org/papers/8472, which had 33,000+ downloads added by a bot. A sample size of 1 is a terrible basis for detecting these things going forward, but can we develop some kind of rule that will flag suspicious patterns? Could the rule simply be "an unreasonable increase in the download count of a single month, compared to the months on either side"? Or is there a tight enough correlation between tweets and downloads that we could use that instead?
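
A rough sketch of what the "single month vs. the months on either side" rule might look like. Everything here is an assumption for illustration: the function name `flag_spikes`, the `ratio` and `min_downloads` thresholds, and the sample counts are made up, and none of it is taken from the rxivist codebase.

```python
from typing import List


def flag_spikes(monthly: List[int], ratio: float = 10.0,
                min_downloads: int = 1000) -> List[int]:
    """Return the indexes of months that look like isolated download spikes.

    `monthly` is one paper's download counts in chronological order.
    `ratio` and `min_downloads` are placeholder thresholds, not tuned values.
    """
    flagged = []
    # The first and last months are skipped: they have no neighbor on both sides.
    for i in range(1, len(monthly) - 1):
        current = monthly[i]
        # Compare against the larger neighbor; the floor of 1 avoids dividing by zero.
        neighbors = max(monthly[i - 1], monthly[i + 1], 1)
        if current >= min_downloads and current / neighbors >= ratio:
            flagged.append(i)
    return flagged


# Hypothetical counts shaped like the case above: one huge month between quiet ones.
counts = [120, 150, 33400, 140, 110]
print(flag_spikes(counts))  # [2]
```

Comparing against the larger of the two neighbors means a sustained rise in downloads is not flagged, only a month that towers over both sides. The `min_downloads` floor is there so that a jump from 2 to 40 downloads does not count as suspicious. Both thresholds would need checking against real data, which is hard with a single known example.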