blekhmanlab / rxivist

API providing access to papers and authors scraped from biorxiv.org
https://rxivist.org
GNU Affero General Public License v3.0
59 stars, 11 forks

Preprint missing from search results #244

Closed (agitter closed 5 years ago)

agitter commented 5 years ago

After the award-winning GLBIO talk from @rabdill, I checked my https://rxivist.org author profile and found one preprint is missing. It doesn't appear in the author profile, nor does it appear when searching for the title.

The direct DOI link https://rxivist.org/papers/10.1101/337956 works but is missing authors and download stats.

rabdill commented 5 years ago

Should be fixed now, sorry about that.

It looks like the issue pops up when a preprint gets associated with an invalid URL. In this case (and 41 others), the bioRxiv listings for some reason included a link to a new version of the paper that didn't actually exist, so the spider was trying to grab the authors for "v2" of the paper when there was only ever a "v1."

This has to be related to the transition to the full-text stuff; it looks like some of the papers legitimately don't have authors listed anymore: https://www.biorxiv.org/content/10.1101/418806v1

New code in https://github.com/blekhmanlab/rxivist/pull/245 uses the DOI to resolve more accurate URLs.
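As a rough illustration of the failure mode described above (this is not the code from the pull request), the sketch below builds versioned bioRxiv URLs from a DOI and falls back to an earlier version when the listed one does not exist. The function names `candidate_urls` and `resolve_paper_url` and the injected `page_exists` check are hypothetical; only the `https://www.biorxiv.org/content/<DOI>v<N>` URL pattern comes from the links in this thread.

```python
from typing import Callable, List, Optional

# URL pattern taken from the bioRxiv links quoted in this thread.
BIORXIV_BASE = "https://www.biorxiv.org/content"


def candidate_urls(doi: str, listed_version: int) -> List[str]:
    """Return versioned URLs for a DOI, newest version first (hypothetical helper)."""
    return [f"{BIORXIV_BASE}/{doi}v{v}" for v in range(listed_version, 0, -1)]


def resolve_paper_url(
    doi: str,
    listed_version: int,
    page_exists: Callable[[str], bool],
) -> Optional[str]:
    """Pick the newest versioned URL that actually exists.

    `page_exists` is an assumption: any callable that reports whether a URL
    serves a real paper page (for example, a wrapper around an HTTP request).
    Returns None when no version can be found.
    """
    for url in candidate_urls(doi, listed_version):
        if page_exists(url):
            return url
    return None
```

With a listing that advertises "v2" while only "v1" exists, `resolve_paper_url("10.1101/337956", 2, page_exists)` would return the "v1" URL instead of pointing the spider at a missing page. Passing the existence check in as a parameter keeps the sketch free of network calls and easy to test.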

agitter commented 5 years ago

That's interesting. This paper did have a "v2", but https://www.biorxiv.org/content/10.1101/337956v2 now shows "This paper is still processing; please check again shortly."

agitter commented 5 years ago

This issue is resolved, and bioRxiv also fixed the v2 version of the preprint.