allenai / s2orc

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/

Filter out titles in author names #46

Open pwrose opened 9 months ago

pwrose commented 9 months ago

Example

The author list includes titles and degrees (MD, PhD, MS, ...) as part of the names: MD Parag N. Jain, MD Sebastian Acosta, PhD Ananth Annapragada, M. P. A. Paul A. Checchia, MD Axel Moreira, MD MS Eyal Muscal, MD Sarah E. Sartain, S. Kristen, MD Sexson Tejtel, MD Tiphanie P. Vogel, MD Lara Shekerdemian, PhD Craig G. Rusin, Online Clinical Investigation

API call

Returns:

    {"paperId": "b821f1883605cf85fb6ef38d6c18641f744d01b7",
     "authors": [
      {"authorId": "2233154289", "name": "MD Parag N. Jain"},
      {"authorId": "2233152973", "name": "MD Sebastian Acosta"},
      {"authorId": "2233152981", "name": "PhD Ananth Annapragada"},
      {"authorId": "2231840661", "name": "M. P. A. Paul A. Checchia"},
      {"authorId": "2233155287", "name": "MD Axel Moreira"},
      {"authorId": "2233154284", "name": "MD MS Eyal Muscal"},
      {"authorId": "2233152983", "name": "MD Sarah E. Sartain"},
      {"authorId": "1412100811", "name": "S. Kristen"},
      {"authorId": "2233152527", "name": "MD Sexson Tejtel"},
      {"authorId": "2233152150", "name": "MD Tiphanie P. Vogel"},
      {"authorId": "2233154251", "name": "MD Lara Shekerdemian"},
      {"authorId": "2226316557", "name": "PhD Craig G. Rusin"},
      {"authorId": "2233155909", "name": "Online Clinical Investigation"}
     ]}
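To spot affected records in a response like this, a client could check each name for leading degree tokens. The following is a minimal sketch, assuming a small hand-picked token set (`DEGREE_TOKENS`, hypothetical, not an official list); the helper names are also hypothetical, and the payload is trimmed to three of the authors returned above. It does not catch dotted forms such as "M. P. A.".

```python
import json

# Hypothetical set of degree/title tokens (uppercased, dots removed); an assumption.
DEGREE_TOKENS = {"MD", "PHD", "MS", "MPH"}


def leading_titles(name: str) -> list:
    """Return the degree tokens that prefix a name, e.g. 'MD MS Eyal Muscal' -> ['MD', 'MS']."""
    found = []
    for token in name.split():
        if token.upper().replace(".", "") in DEGREE_TOKENS:
            found.append(token)
        else:
            break  # stop at the first token that is not a title
    return found


def flag_titled_authors(payload: str) -> list:
    """Return (authorId, titles) pairs for authors whose name starts with a title."""
    data = json.loads(payload)
    flagged = []
    for author in data["authors"]:
        titles = leading_titles(author["name"])
        if titles:
            flagged.append((author["authorId"], titles))
    return flagged


# Trimmed copy of the API response from the issue.
response = '''{"paperId": "b821f1883605cf85fb6ef38d6c18641f744d01b7",
 "authors": [{"authorId": "2233154289", "name": "MD Parag N. Jain"},
             {"authorId": "2233154284", "name": "MD MS Eyal Muscal"},
             {"authorId": "1412100811", "name": "S. Kristen"}]}'''

print(flag_titled_authors(response))
# [('2233154289', ['MD']), ('2233154284', ['MD', 'MS'])]
```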

The title is treated as part of the name, so a new author ID is created instead of the name being mapped to the existing author.
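One possible direction, sketched under stated assumptions: strip leading degree tokens before looking the name up in an index of existing authors. `strip_titles`, `normalize`, `resolve_author`, the token set, and the `existing` index with its placeholder IDs are all hypothetical; this is not based on Semantic Scholar's actual author-disambiguation pipeline.

```python
from typing import Dict, Optional

# Hypothetical title tokens (uppercased, dots removed); an assumption.
DEGREE_TOKENS = {"MD", "PHD", "MS", "MPH"}


def strip_titles(name: str) -> str:
    """Drop leading degree tokens: 'MD MS Eyal Muscal' -> 'Eyal Muscal'."""
    tokens = name.split()
    i = 0
    while i < len(tokens) and tokens[i].upper().replace(".", "") in DEGREE_TOKENS:
        i += 1
    return " ".join(tokens[i:])


def normalize(name: str) -> str:
    """Strip titles, case-fold, and collapse whitespace to build a lookup key."""
    return " ".join(strip_titles(name).casefold().split())


def resolve_author(name: str, existing: Dict[str, str]) -> Optional[str]:
    """Return an existing authorId for the cleaned name, or None if there is no match."""
    return existing.get(normalize(name))


# Hypothetical index of existing authors, keyed by normalized name (placeholder IDs).
existing = {normalize("Eyal Muscal"): "existing-id-muscal"}

print(resolve_author("MD MS Eyal Muscal", existing))  # existing-id-muscal
print(resolve_author("MD Axel Moreira", existing))    # None
```

Exact-name lookup is only a stand-in here: different people can share a name, so a real fix would likely need more evidence (coauthors, affiliations) before merging a titled name into an existing author ID.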