emorynlp / nlprankings

Ranking of Top Institutes for Natural Language Processing (NLP)

Deduplication needed #3

Open hickst opened 4 years ago

hickst commented 4 years ago

Interesting project, but you all need some way to merge author entities. I'm at the U of Arizona and see myself listed twice: once as 'Tom Hicks' and once as 'Thomas Hicks'. My co-author 'Marco A. Valenzuela-Escárcega' has this issue as well, and that's just at our institution, so I suspect there are several others.

jdchoi77 commented 4 years ago

@hickst thanks for pointing this out. We are hoping the community can help us fix these issues, since they are hard to detect automatically. We will look into this and let you know soon.

hickst commented 4 years ago

@jdchoi77 Is there a way we can fix this by ourselves or do you have to make the changes? If the latter, is there a procedure to ask for a change (e.g. file a GitHub issue, fill out a form, make a pull request)?

chloelee1230 commented 4 years ago

@hickst Thank you for bringing this to our attention. After some digging, I believe this issue is the result of having two separate author IDs on ACL Anthology. For instance, you have two separate author pages, one for Tom Hicks and one for Thomas Hicks. For your co-author Marco A. Valenzuela-Escárcega, the two pages are Marco A. Valenzuela-Escárcega and Marco Valenzuela-Escárcega.

To fix this, please make a pull request and update the fields authors and author_id in the respective JSON file located under the directory dat/acl_anthology/json/.

Your author_id is essentially the last part of the URL of your author page: tom-hicks for Tom Hicks and thomas-hicks for Thomas Hicks. Your publication under the name Thomas Hicks is P15-4022. If you would like to go with the name Tom Hicks instead of Thomas Hicks, open the corresponding venue JSON file, P15-4.json, and for the publication P15-4022 update your name to Hicks, Tom and your author_id to tom-hicks.
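
If editing the file by hand is inconvenient, the change could be scripted roughly as below. This is only a sketch under assumptions, not the repository's own tooling: it assumes each venue file is a JSON array of publication records, that each record has an "id" field, and that "authors" and "author_id" are parallel lists. Only the field names authors and author_id come from this thread; the "id" key, the list layout, and the helper names merge_author and update_file are hypothetical, so check them against the actual file before running anything.

```python
import json
from pathlib import Path


def merge_author(records, pub_id, old_id, new_id, new_name):
    """Rewrite one author entry of one publication, in place.

    Assumes (hypothetically) that `records` is a list of dicts, each with an
    "id" key and parallel "authors" / "author_id" lists.
    Returns the number of author entries that were changed.
    """
    changed = 0
    for rec in records:
        if rec.get("id") != pub_id:
            continue
        ids = rec.get("author_id", [])
        names = rec.get("authors", [])
        for i, author_id in enumerate(ids):
            if author_id == old_id:
                ids[i] = new_id
                # Only touch the name if the two lists really are parallel.
                if i < len(names):
                    names[i] = new_name
                changed += 1
    return changed


def update_file(path, pub_id, old_id, new_id, new_name):
    """Load a venue JSON file, apply merge_author, and write it back if changed."""
    path = Path(path)
    records = json.loads(path.read_text(encoding="utf-8"))
    changed = merge_author(records, pub_id, old_id, new_id, new_name)
    if changed:
        # ensure_ascii=False keeps names such as "Escárcega" readable in the file.
        # The indent value is a guess; match the file's existing formatting.
        path.write_text(
            json.dumps(records, ensure_ascii=False, indent=2) + "\n",
            encoding="utf-8",
        )
    return changed


if __name__ == "__main__":
    # In-memory demo with a made-up minimal record; real records have more fields.
    demo = [{"id": "P15-4022", "authors": ["Hicks, Thomas"], "author_id": ["thomas-hicks"]}]
    merge_author(demo, "P15-4022", "thomas-hicks", "tom-hicks", "Hicks, Tom")
    print(demo[0]["authors"], demo[0]["author_id"])
```

For the real file, the call would look something like update_file("dat/acl_anthology/json/P15-4.json", "P15-4022", "thomas-hicks", "tom-hicks", "Hicks, Tom"). Review the resulting diff before opening the pull request, since rewriting the whole file may change its formatting.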

Your co-author 'Marco A. Valenzuela-Escárcega' can follow the same approach to resolve his duplication problem as well!

Thanks again for your feedback.