allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

Bug: `/references` giving duplicate papers (preprint and non-preprint) #193

Closed jamesbraza closed 6 days ago

jamesbraza commented 2 months ago

Describe the Bug

On 4/12/2024, from `/graph/v1/paper/1db1bde653658ec9b30858ae14650b8f9c9d438b/references`, the following DOIs show up twice:

In this case, what's happening is that the preprint and the non-preprint versions have distinct Semantic Scholar IDs but share the same DOI. My issue is that the endpoint returns both the preprint and the non-preprint paper for a single reference.
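A minimal sketch of how the duplication could be checked on the client side, assuming each item in the response's `data` list has a `citedPaper` object with `paperId` and `externalIds.DOI` (returned when `fields=externalIds` is requested). The helper names `fetch_references`, `group_by_doi`, and `duplicated_dois` are hypothetical, not part of any Semantic Scholar client:

```python
import json
import urllib.request
from collections import defaultdict

# Assumed base URL of the public Graph API references endpoint.
API_URL = "https://api.semanticscholar.org/graph/v1/paper/{paper_id}/references"


def fetch_references(paper_id, limit=100):
    """Fetch one page of references with their external IDs (no pagination here)."""
    url = API_URL.format(paper_id=paper_id) + f"?fields=externalIds,title&limit={limit}"
    with urllib.request.urlopen(url) as response:
        return json.load(response).get("data", [])


def group_by_doi(references):
    """Map each lowercased DOI to the Semantic Scholar paper IDs that carry it."""
    groups = defaultdict(list)
    for ref in references:
        paper = ref.get("citedPaper") or {}
        doi = (paper.get("externalIds") or {}).get("DOI")
        if doi:  # references without a DOI are skipped
            groups[doi.lower()].append(paper.get("paperId"))
    return dict(groups)


def duplicated_dois(references):
    """Return only the DOIs that appear under more than one paper ID."""
    return {doi: ids for doi, ids in group_by_doi(references).items() if len(ids) > 1}
```

Calling `duplicated_dois(fetch_references("1db1bde653658ec9b30858ae14650b8f9c9d438b"))` would then list any DOI that maps to more than one paper ID. DOIs are lowercased before grouping because they are case-insensitive.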

Expected Behavior

Either:

Imo this is not a one-off data correction issue. I think it is a logic problem that will apply to this endpoint in general.

cfiorelli commented 1 month ago

@jamesbraza Thanks for sharing this. I'm forwarding it to our data team in case, as you said, this is a larger-scale issue. Typically we find that issues like this self-resolve, since our logic and pipelines are under ongoing development.

Today at this test URL I only found:

- 9c3e49e6fb981b8c88bc19f62df6e7276eb38399
- 2c0e0440882a42be752268d0b64243243d752a74

Maybe this has already self-resolved?

Thanks so much for reporting this, and for your patience. We're working to improve turnaround on inquiries across platforms.

cfiorelli commented 6 days ago

Closing this as the issue appears to be resolved.