PathwayCommons / factoid

A project to capture biological pathway data from academic papers
https://biofactoid.org
MIT License

Consider flexibility in rules for matching an article by title #1297

Open jvwong opened 1 month ago

jvwong commented 1 month ago

Background

The article matching has been iterated on many times for different edge cases (#1074, #1124, #848), and there are services aimed at resolving this information, e.g. #1295. From my observations, this works pretty well, but there are cases where no article is matched due to ambiguity.

Currently, an author's input title must be an exact subset of the record retrieved from either PubMed or CrossRef after 'sanitization':
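A minimal sketch of a rule of this kind, written in Python for illustration. The function names and the specific sanitization steps (accent stripping, lowercasing, collapsing punctuation) are assumptions and are not taken from the project's code:

```python
import re
import unicodedata


def sanitize(title: str) -> str:
    """Hypothetical sanitization: strip accents, lowercase, collapse non-alphanumerics."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def title_matches(input_title: str, record_title: str) -> bool:
    """True if the sanitized input appears as a whole-word run in the sanitized record."""
    needle = sanitize(input_title)
    if not needle:
        return False  # an empty title should never match
    # Pad with spaces so a partial word ("cell" vs "cells") does not count as a match.
    return f" {needle} " in f" {sanitize(record_title)} "
```

Under these assumptions, differences in case, punctuation, or accents are tolerated, but a single changed word means no match.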

Problems observed

There remain cases where we might reasonably want to relax the conditions. For example:

Details

There are potential pitfalls to increasing flexibility. Notably, the title of a manuscript can change between preprints, versions, and the final version of record.
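One possible way to relax the rule while limiting the risk of false matches is a word-level similarity score with a conservative threshold. The sketch below uses Python's standard `difflib.SequenceMatcher`. The function names and the 0.85 threshold are illustrative assumptions, not a proposal tested against real records:

```python
import re
from difflib import SequenceMatcher


def tokens(title: str) -> list[str]:
    """Lowercase the title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


def title_similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means identical word sequences."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def is_probable_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical relaxed rule: accept titles whose similarity meets the threshold."""
    return title_similarity(a, b) >= threshold
```

With this scoring, two eight-word titles that differ in one word score 0.875, so they pass at 0.85 but fail at 0.9. The threshold therefore controls directly how much title drift between versions is tolerated.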

Tasks

jvwong commented 4 days ago

Not too flexible: https://github.com/PathwayCommons/factoid/issues/1299#issuecomment-2416867445