blekhmanlab / rxivist

API providing access to papers and authors scraped from biorxiv.org
https://rxivist.org
GNU Affero General Public License v3.0

Add table for additional names of authors #181

Open rabdill opened 6 years ago

rabdill commented 6 years ago

If we find an author based on their ORCID but they used a different name for a paper, we correctly link the author to that paper but then immediately discard that name. This can cause a missed link in the future, though only in an edge case that depends on inconsistent data entry by the author:

  1. Author posts paper 1 under Name A with their ORCID.
  2. Author posts paper 2 under Name B with their ORCID. We correctly link paper 2 to the author, but discard Name B and leave Name A untouched.
  3. Author posts paper 3 without their ORCID, using Name A. We correctly link this paper to the author.
  4. Author posts paper 4 without their ORCID, using Name B. We fail to link this paper to the author, and instead create a second author under Name B without an ORCID.
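
The four steps can be reproduced with a small in-memory model. This is only a sketch of the matching behavior as described above (ORCID first, then exact name), not the actual rxivist code; `Author`, `link_paper`, the `authors` list, and the ORCID value are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Author:
    id: int
    name: str
    orcid: Optional[str] = None
    papers: list = field(default_factory=list)


authors = []  # hypothetical stand-in for the authors table


def link_paper(paper, name, orcid=None):
    """Link a paper to an author: try the ORCID first, then an exact name match."""
    match = None
    if orcid is not None:
        match = next((a for a in authors if a.orcid == orcid), None)
    if match is None:
        match = next((a for a in authors if a.name == name), None)
    if match is None:
        # Nothing matched, so a brand-new author record is created.
        match = Author(id=len(authors) + 1, name=name, orcid=orcid)
        authors.append(match)
    # Note: when the ORCID matched but the name differs, the name is dropped here.
    match.papers.append(paper)
    return match


ORCID = "0000-0000-0000-0001"  # made-up identifier for illustration

a1 = link_paper("paper 1", "Name A", ORCID)  # step 1: creates the author
a2 = link_paper("paper 2", "Name B", ORCID)  # step 2: matched by ORCID
a3 = link_paper("paper 3", "Name A")         # step 3: matched by name
a4 = link_paper("paper 4", "Name B")         # step 4: no match, duplicate author
```

After these calls, `a1`, `a2`, and `a3` are the same record, while `a4` is a separate author with no ORCID.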

If we create a new table for author "aliases," we can keep the existing entries the same. Then, if an author pops up with a known ORCID and a different name, we can record that name and use it in the future to link them to papers where they forgot to include the ORCID.
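
A rough sketch of what that could look like, using Python's built-in `sqlite3` so it runs standalone. The table and column names (`authors`, `author_aliases`, `author_id`), the `find_or_create_author` helper, and the ORCID value are assumptions made for illustration; the real schema and database engine may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        orcid TEXT UNIQUE
    );
    -- Hypothetical alias table: one row per extra name seen for an author.
    CREATE TABLE author_aliases (
        author_id INTEGER NOT NULL REFERENCES authors(id),
        name      TEXT NOT NULL,
        PRIMARY KEY (author_id, name)
    );
""")


def find_or_create_author(name, orcid=None):
    """Return an author id, recording any new name seen alongside a known ORCID."""
    if orcid is not None:
        row = conn.execute(
            "SELECT id, name FROM authors WHERE orcid = ?", (orcid,)
        ).fetchone()
        if row is not None:
            if row[1] != name:
                # Known ORCID, different name: remember it as an alias.
                conn.execute(
                    "INSERT OR IGNORE INTO author_aliases (author_id, name) "
                    "VALUES (?, ?)",
                    (row[0], name),
                )
            return row[0]
    # No ORCID match: try the primary name, then the recorded aliases.
    row = conn.execute("SELECT id FROM authors WHERE name = ?", (name,)).fetchone()
    if row is None:
        row = conn.execute(
            "SELECT author_id FROM author_aliases WHERE name = ?", (name,)
        ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO authors (name, orcid) VALUES (?, ?)", (name, orcid)
    )
    return cur.lastrowid


ORCID = "0000-0000-0000-0001"  # made-up identifier for illustration

first = find_or_create_author("Name A", ORCID)   # creates the author
second = find_or_create_author("Name B", ORCID)  # same author, alias recorded
third = find_or_create_author("Name A")          # matched by primary name
fourth = find_or_create_author("Name B")         # matched by alias
```

With the alias recorded in the second call, the fourth call resolves to the existing author instead of creating a duplicate. The sketch does not try to handle two different people who share a name.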