Open ahlesen opened 4 years ago
Hey @ahlesen, thanks for catching this! I looked into it, and it seems like a somewhat unusual case:
37387939 has the wrong ACL ID (off by one). The correct ACL ID is D17-1280. Something must've happened during our data crawl.
28752386 has the correct ACL ID.
10822819 is a tricky case & I need to think about how to handle it. It looks like our crawler found a different version of the 28752386 paper from a department website, so the clustering decided to treat them as separate papers.
Anyways, can I get a sense of how serious this issue is for you? Given the scope of the corpus, there will always be errors like this, so I'm trying to understand how much it is impacting your use case.
Good day, some different papers have the same acl_id.
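For anyone who wants to gauge how widespread this is in their own copy of the data, here is a minimal sketch of one way to list every acl_id claimed by more than one paper. It assumes each record is a dict with an `acl_id` field and a `paper_id` field; `paper_id`, the function name `find_shared_acl_ids`, and the sample values are hypothetical and not taken from the corpus.

```python
from collections import defaultdict


def find_shared_acl_ids(records):
    """Return {acl_id: [paper_id, ...]} for every acl_id used by 2+ papers.

    `records` is any iterable of dicts; the field names "paper_id" and
    "acl_id" are assumptions made for this sketch.
    """
    papers_by_acl = defaultdict(set)
    for record in records:
        acl_id = record.get("acl_id")
        if acl_id:  # skip papers that have no ACL ID at all
            papers_by_acl[acl_id].add(record["paper_id"])
    # Keep only the ACL IDs that more than one distinct paper claims.
    return {
        acl_id: sorted(paper_ids)
        for acl_id, paper_ids in papers_by_acl.items()
        if len(paper_ids) > 1
    }


# Made-up records for illustration only; these are not real corpus entries.
sample = [
    {"paper_id": "p1", "acl_id": "X00-0001"},
    {"paper_id": "p2", "acl_id": "X00-0001"},
    {"paper_id": "p3", "acl_id": "X00-0002"},
    {"paper_id": "p4", "acl_id": None},
]

print(find_shared_acl_ids(sample))
```

Counting the keys of the returned dict gives a rough number of affected ACL IDs, which may help answer how serious the problem is for a given use case.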