allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

Bug: Papers without Pubmed ID #183

Status: Closed (PedroSena closed this issue 2 months ago)

PedroSena commented 4 months ago

Describe the Bug: I found a subset of Semantic Scholar papers where I expected to find a PubMed ID but found none.

To Reproduce: Here is the list of paperIds and the expected PubMed IDs:

6fe15f736b92c295fdf4934e295990b916cbc0c7 -> 38221462
0b2fcbbc34c5e16d3147f9fb2754a947154add90 -> 37250580
9f1d6fd983892ce9adfd286f3954a17b631e17cb -> 37605768
df9fbf64106b367d329271110c67a1dbabdaff12 -> 36157372
3f856b3e060fd02c37ccbb2ad52d1d01fa507a95 -> 38292851
d9cafb2b20a99724adb0ff8548b570d9e6558f3f -> 38348021

For the first case, for instance, when we run:

curl -v -X POST https://api.semanticscholar.org/graph/v1/paper/batch\?fields\=externalIds -d '{"ids": ["PMID:38221462"]}'

we get:

[{"paperId": "6fe15f736b92c295fdf4934e295990b916cbc0c7", "externalIds": {"DOI": "10.11646/zootaxa.5351.5.8", "CorpusId": 263278886}}]

This does not happen for every instance mentioned above, though.

I also have the following paperIds that belong to errata. Each has a PubMed counterpart, so I'd expect them to have a PubMed ID as well:

dfdecce605c3245fc11bd8f96c639ca61175d4a0 -> 38348281
7de1e49301cd43e3a7580a9503d002daf5b51ac7 -> 38347973
519c7ca3cdc9c88bb619ff87bafcc5bfba4022c1 -> 38348226
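
Both lists above could be checked in one batch request instead of one curl call per paper. Below is a minimal Python sketch, not a definitive script. It reuses the endpoint and the `externalIds` field from the curl example. The helper names `fetch_batch` and `find_pubmed_mismatches` are hypothetical, and the `"PubMed"` key inside `externalIds` is an assumption about how the ID would appear when present.

```python
import json
import urllib.request

# Same endpoint and field selection as the curl example above.
BATCH_URL = "https://api.semanticscholar.org/graph/v1/paper/batch?fields=externalIds"

# paperId -> expected PubMed ID, copied from the two lists in this report.
EXPECTED = {
    "6fe15f736b92c295fdf4934e295990b916cbc0c7": "38221462",
    "0b2fcbbc34c5e16d3147f9fb2754a947154add90": "37250580",
    "9f1d6fd983892ce9adfd286f3954a17b631e17cb": "37605768",
    "df9fbf64106b367d329271110c67a1dbabdaff12": "36157372",
    "3f856b3e060fd02c37ccbb2ad52d1d01fa507a95": "38292851",
    "d9cafb2b20a99724adb0ff8548b570d9e6558f3f": "38348021",
    "dfdecce605c3245fc11bd8f96c639ca61175d4a0": "38348281",
    "7de1e49301cd43e3a7580a9503d002daf5b51ac7": "38347973",
    "519c7ca3cdc9c88bb619ff87bafcc5bfba4022c1": "38348226",
}


def fetch_batch(ids):
    """POST the given ids to the batch endpoint and return the decoded JSON list."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    request = urllib.request.Request(
        BATCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_pubmed_mismatches(records, expected, pubmed_key="PubMed"):
    """Return {paperId: (expected_id, actual_id)} where the PubMed ID is missing or differs.

    `pubmed_key` is an assumption about the key name used inside `externalIds`.
    """
    mismatches = {}
    for record in records:
        if record is None:  # skip entries the API could not resolve
            continue
        paper_id = record.get("paperId")
        if paper_id not in expected:
            continue
        actual = (record.get("externalIds") or {}).get(pubmed_key)
        if actual is None or str(actual) != expected[paper_id]:
            mismatches[paper_id] = (expected[paper_id], actual)
    return mismatches
```

Calling `find_pubmed_mismatches(fetch_batch(EXPECTED), EXPECTED)` would list every paper from this report that still lacks the expected ID. The comparison is kept separate from the network call so it can also be run against a saved response.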

Expected Behavior: I expected the paperIds listed above to have PubMed IDs associated with them.

Actual Behavior: They return a CorpusId and sometimes a DOI, but no PubMed ID.

Thanks

cfiorelli commented 2 months ago

Logging this for triage by the data / pipeline team. Thank you for the report, @PedroSena.