PathwayCommons / factoid

A project to capture biological pathway data from academic papers
https://biofactoid.org
MIT License

Improve process of identifying and updating article using the author-provided information (e.g. title) #1281

Closed jvwong closed 5 days ago

jvwong commented 6 days ago

This is part documentation, part update.

How Biofactoid matches articles to author-provided information

Authors start using Biofactoid by entering their paper title, but we also accept a PubMed identifier (PMID) or Digital Object Identifier (DOI). The goal is to identify the paper, which in practice means retrieving a matching record from an index, in this case PubMed or Crossref. For a PMID or DOI, the process is trivial. For a title, the process is more complex, but can be summarized in the following table:

*Match item in PubMed | *Match preprint in Crossref | Interpretation
Publication
Preprint
**Preprint
Ambiguous

*Match: The author-provided information is (1) a substring of the retrieved article title or (2) equal to the article DOI or PMID.

**If the DOIs are equal, use the PubMed record; otherwise use the most recent record. Example: a bioRxiv preprint in PubMed that was forwarded to eLife as a reviewed preprint. We want the latter.
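The matching rule and the interpretation table can be sketched roughly as follows. This is a minimal illustration, not Biofactoid's actual code: the `Record` class, the `is_match` and `interpret` functions, and the case-insensitive comparison are all hypothetical. The mapping from match combinations to interpretations (PubMed only gives Publication, Crossref only gives Preprint, both give **Preprint, neither gives Ambiguous) is also an assumption inferred from the footnotes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Record:
    """Hypothetical, simplified index record (not Biofactoid's real schema)."""
    title: str
    doi: Optional[str] = None
    pmid: Optional[str] = None
    published: Optional[date] = None


def is_match(provided: str, record: Record) -> bool:
    """Footnote *: substring of the title, or equal to the DOI or PMID.

    Comparing case-insensitively is an assumption of this sketch.
    """
    text = provided.strip().lower()
    if not text:
        return False
    ids = {i.lower() for i in (record.doi, record.pmid) if i}
    return text in record.title.lower() or text in ids


def interpret(
    provided: str,
    pubmed: Optional[Record],
    crossref_preprint: Optional[Record],
) -> Tuple[str, Optional[Record]]:
    """Return (interpretation, chosen record) given one candidate per index.

    ASSUMPTION: the row-to-combination mapping below is inferred, not confirmed.
    """
    in_pubmed = pubmed is not None and is_match(provided, pubmed)
    in_crossref = (
        crossref_preprint is not None and is_match(provided, crossref_preprint)
    )
    if in_pubmed and in_crossref:
        # Footnote **: equal DOIs -> PubMed record; otherwise the most recent.
        same_doi = (
            pubmed.doi is not None
            and crossref_preprint.doi is not None
            and pubmed.doi.lower() == crossref_preprint.doi.lower()
        )
        if same_doi:
            return "Preprint", pubmed
        newest = max(
            (pubmed, crossref_preprint),
            key=lambda r: r.published or date.min,
        )
        return "Preprint", newest
    if in_pubmed:
        return "Publication", pubmed
    if in_crossref:
        return "Preprint", crossref_preprint
    return "Ambiguous", None
```

With this shape, the bioRxiv-to-eLife example falls into the "both match, DOIs differ" branch, so the more recently dated reviewed preprint is the record that gets selected.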

When author articles cannot be matched

An author's paper may fail to match for trivial reasons (incorrect information provided, spam), but this is rare. One important case is an accepted manuscript that is yet to be published; depending on the journal, publication can take months (see #1280).

CRON: Trying again

Our CRON currently runs once a week and is tasked with updating article information. The update works nearly identically to the process described above for finding an author's paper.
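As a rough illustration of what one update pass might look like, the sketch below re-runs the lookup for every stored document and records which ones changed. The `refresh_articles` function, the `provided` and `article` dictionary keys, and the `find_record` callback are hypothetical names chosen for this example; they do not describe Biofactoid's actual data model or job code.

```python
from typing import Callable, Iterable, List, Optional


def refresh_articles(
    documents: Iterable[dict],
    find_record: Callable[[str], Optional[dict]],
) -> List[dict]:
    """Re-run matching for each document; return the ones whose record changed.

    `find_record` stands in for the same lookup used when an author first
    enters a title, PMID, or DOI (hypothetical signature).
    """
    updated = []
    for doc in documents:
        record = find_record(doc["provided"])
        # Keep the existing record when nothing is found, so a failed lookup
        # never erases information gathered on an earlier run.
        if record is not None and record != doc.get("article"):
            doc["article"] = record
            updated.append(doc)
    return updated
```

Under this design, a document that could not be matched at submission time (for example, an accepted manuscript that was not yet indexed) is simply retried on each run and picks up its record once the index contains it. A weekly schedule corresponds to a cron expression such as `0 0 * * 0` (midnight every Sunday); the specific day and time are an assumption.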

Minor updates

Refs #1211, #1201