OmnesRes / prepub

Production code for PrePubMed
http://www.prepubmed.org/
MIT License

bioRxiv indexing #12

Closed by OmnesRes 5 years ago

OmnesRes commented 6 years ago

My indexing code stops if it sees a duplicated article, which was meant to prevent articles from being indexed more than once.

Essentially I was trying to anticipate the unfortunate situation of bioRxiv adding new articles while I'm indexing.

It appears that this situation has been happening a lot lately, probably because of the increase in bioRxiv submissions. Because of this, and because bioRxiv sometimes actually posts a duplicated article, which brings my indexing to a halt, I changed the code to recognize and remove any duplicated articles.
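For illustration, here is a minimal sketch of what "recognize and remove" could look like instead of halting. It is not the actual PrePubMed code: the function name `dedupe_articles`, the dict layout with `title` and `url` keys, and the example URLs are all hypothetical. It assumes each scraped article carries a URL that identifies it, and it keeps the first copy of each article rather than dropping every copy.

```python
def dedupe_articles(articles):
    """Return articles with repeats removed, keeping the first occurrence.

    Assumption: each article is a dict with a "url" key that uniquely
    identifies it (hypothetical layout, not the real indexer's format).
    """
    seen = set()
    unique = []
    for article in articles:
        key = article["url"]
        if key in seen:
            # Duplicate: skip it instead of stopping the whole run.
            continue
        seen.add(key)
        unique.append(article)
    return unique


# Two listing pages where article B shows up twice, as can happen when
# new articles are posted while the pages are being walked.
pages = [
    [{"title": "A", "url": "https://example.org/a"},
     {"title": "B", "url": "https://example.org/b"}],
    [{"title": "B", "url": "https://example.org/b"},
     {"title": "C", "url": "https://example.org/c"}],
]
scraped = [article for page in pages for article in page]
unique = dedupe_articles(scraped)
print([a["title"] for a in unique])  # ['A', 'B', 'C']
```

Keeping the first occurrence matters here: a variant that discards every article it has seen more than once would silently drop the article from the index entirely.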

This seems to have been working great, except for this article, which didn't get indexed: "Stochastic character mapping of state-dependent diversification reveals the tempo of evolutionary decline in self-compatible Onagraceae lineages" https://www.biorxiv.org/content/early/2018/06/22/210484

I don't know what happened here.