PathwayCommons / factoid

A project to capture biological pathway data from academic papers
https://biofactoid.org
MIT License

How to update article metadata #1201

Closed (jvwong closed 11 months ago)

jvwong commented 1 year ago

The purpose of the article update module is to add and/or refresh metadata for an existing paper. This must be done carefully, taking into account:

Would an overwrite delete all the previous ‘pubmed’ data, or would it be a merge? Do you see a use case for merging? Deletion would be simpler.

Another alternative is to store the original, raw data from each source so you could always recover or merge in future, i.e.

  • pubmed stores the processed data the app expects
  • rawPubmed stores the original, raw PubMed data
  • rawCrossRef stores the original, raw CrossRef data
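A minimal sketch of that layout, assuming a hypothetical `ArticleRecord` container and a hypothetical `to_app_fields` normalizer (neither is taken from the factoid codebase). Only the field names `pubmed`, `rawPubmed` and `rawCrossRef` come from the list above. A refresh overwrites the processed `pubmed` data outright, while the raw payload is kept so a later recovery or merge stays possible.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


def to_app_fields(raw_pubmed: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical normalizer: keep only the fields the app expects."""
    return {
        "pmid": raw_pubmed.get("pmid"),
        "doi": raw_pubmed.get("doi"),
        "title": raw_pubmed.get("title"),
    }


@dataclass
class ArticleRecord:
    # Processed data the app reads.
    pubmed: Optional[Dict[str, Any]] = None
    # Untouched source payloads, kept so a later merge or recovery is possible.
    rawPubmed: Optional[Dict[str, Any]] = None
    rawCrossRef: Optional[Dict[str, Any]] = None

    def refresh_from_pubmed(self, raw: Dict[str, Any]) -> None:
        # Straightforward overwrite: old processed data is dropped, not merged.
        self.rawPubmed = dict(raw)
        self.pubmed = to_app_fields(raw)
```

With this shape, choosing deletion now does not rule out merging later, since the raw record for each source is still on hand.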

I think straightforward deletion is the simplest approach, but let me know if you see a use case for the other approaches.

This (CRON updates/disambiguation of papers) deserves a separate issue, apart from CrossRef (todo).

One latent bug that has emerged: PubMed now indexes preprints from bioRxiv/medRxiv written by NIH-funded authors. So, for example, someone could add a bioRxiv paper that PubMed has picked up, and if it was later published elsewhere under the same title (i.e. the author's input), it would get overwritten.
You can always enter the exact PMID or DOI, but I suspect 90% of people have no idea what those are.

Originally posted by @jvwong in https://github.com/PathwayCommons/factoid/issues/961#issuecomment-1708773205

jvwong commented 1 year ago

Below are the possible cases for an article that needs to be updated (i.e. not initiated, not empty and not stale).

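Table 1. Article update cases (in order of preference)

| paperId (provided) | PMID | DOI | Status           | Action                              |
|--------------------|------|-----|------------------|-------------------------------------|
| Y                  | Y    | Y/N | In PubMed        | GET - via PMID                      |
| Y                  | N    | Y   | Only in CrossRef | GET - via DOI                       |
| Y                  | N    | N   | Not found        | SEARCH - query with paperId (text)  |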
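The order of preference in Table 1 can be sketched as a small decision function. This is an illustration only: the function name `choose_update_action` and the action labels are hypothetical, and the actual PubMed and CrossRef requests are left out.

```python
from typing import Optional, Tuple


def choose_update_action(paper_id: str,
                         pmid: Optional[str],
                         doi: Optional[str]) -> Tuple[str, str]:
    """Return (action, argument) following the order of preference in Table 1."""
    if pmid:
        # In PubMed: fetch by PMID, whether or not a DOI is also present.
        return ("GET_BY_PMID", pmid)
    if doi:
        # Only in CrossRef: fetch by DOI.
        return ("GET_BY_DOI", doi)
    # Not found in either source: fall back to a text search on paperId.
    return ("SEARCH", paper_id)
```

Checking the PMID first means a record with both identifiers is always refreshed from PubMed, which matches the first row of the table.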