allenai / s2orc

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/

same acl_id for different papers #13

Open ahlesen opened 4 years ago

ahlesen commented 4 years ago

Good day. Some different papers have the same acl_id (screenshots attached).

kyleclo commented 4 years ago

Hey @ahlesen, thanks for catching this! I looked into it, and it seems like a somewhat unusual case:

Anyway, can I get a sense of how serious this issue is for you? Given the scope of the corpus, there will always be errors such as this, so I'm trying to understand how much this is impacting your use case.
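One way to gauge how widespread the collisions are is to group metadata records by acl_id and list the ids shared by more than one paper. The sketch below is only an illustration, not part of the S2ORC tooling: it assumes the metadata is stored as JSON Lines with `paper_id`, `title`, and `acl_id` fields, the helper name `find_shared_acl_ids` is hypothetical, and the sample records are made up.

```python
import json
from collections import defaultdict


def find_shared_acl_ids(lines):
    """Return {acl_id: [(paper_id, title), ...]} for ids used by 2+ distinct papers.

    `lines` is any iterable of JSON strings (assumed layout: one record per line).
    """
    papers_by_acl_id = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        acl_id = record.get("acl_id")
        if acl_id is None:
            # Records without an acl_id cannot collide, so skip them.
            continue
        # Key by paper_id so a repeated record is not counted as a collision.
        papers_by_acl_id[acl_id][record.get("paper_id")] = record.get("title")
    return {
        acl_id: list(papers.items())
        for acl_id, papers in papers_by_acl_id.items()
        if len(papers) > 1
    }


# Made-up records for illustration only (not real S2ORC entries).
sample = [
    '{"paper_id": "1", "title": "Paper A", "acl_id": "X00-0001"}',
    '{"paper_id": "2", "title": "Paper B", "acl_id": "X00-0001"}',
    '{"paper_id": "3", "title": "Paper C", "acl_id": "X00-0002"}',
    '{"paper_id": "4", "title": "Paper D", "acl_id": null}',
]
shared = find_shared_acl_ids(sample)
for acl_id, papers in shared.items():
    print(acl_id, papers)
```

For a real metadata shard, an open file handle can be passed in place of `sample`, since the function only iterates over lines.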