allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

ExternalIDs formatting broken in JSON response #118

Closed erkannt closed 1 year ago

erkannt commented 1 year ago

Describe the bug

Some of the recommended papers contain badly formatted externalIds, for example:

{
  "DOI": "10.31857/S0026898423030163, EDN: CHWIKZ",
  "CorpusId": 259173589,
  "PubMed": "37326060"
}

To Reproduce

curl 'https://api.semanticscholar.org/recommendations/v1/papers/forpaper/DOI:10.1101/2023.03.10.532141?fields=externalIds,authors,title'

Expected behavior

The DOI string should end just before the comma.

Additional context

This seems to be a new defect introduced on 17 June 2023.

cfiorelli commented 1 year ago

Noting that I'm still seeing this behavior:

curl -s 'https://api.semanticscholar.org/recommendations/v1/papers/forpaper/DOI:10.1101/2023.03.10.532141?fields=externalIds&limit=5' | jq -r '.recommendedPapers[] | select(.externalIds.DOI) | .externalIds.DOI'

cfiorelli commented 1 year ago

To triage on the backend: external IDs are mixed due to legacy architecture. We are evaluating how to update this and what the impact on developers would be.

cfiorelli commented 1 year ago

@erkannt Thanks for reporting this. We found that the extra data in the seemingly malformed DOI is likely coming from upstream providers and is outside the control of our systems. Hopefully this at least helps you understand and manage the issue on the user side.
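
For anyone handling this on the client side, here is a minimal sketch in Python of one possible workaround. It is not an official fix. It assumes the unwanted suffix always starts with a comma or semicolon followed by whitespace (as in ", EDN: CHWIKZ" above) and that a genuine DOI never contains that pattern. The function names `clean_doi` and `clean_external_ids` are hypothetical helpers, not part of any Semantic Scholar client.

```python
import re

# Assumption: a trailing annotation begins with "," or ";" followed by
# whitespace. Everything from that point to the end of the string is dropped.
_TRAILING_ANNOTATION = re.compile(r"[,;]\s+.*$")


def clean_doi(raw: str) -> str:
    """Return the DOI with any trailing ', EDN: ...' style annotation removed."""
    return _TRAILING_ANNOTATION.sub("", raw.strip())


def clean_external_ids(external_ids: dict) -> dict:
    """Return a copy of an externalIds object with its DOI field cleaned."""
    cleaned = dict(external_ids)
    doi = cleaned.get("DOI")
    if isinstance(doi, str):
        cleaned["DOI"] = clean_doi(doi)
    return cleaned


if __name__ == "__main__":
    # The externalIds object quoted in the original report.
    example = {
        "DOI": "10.31857/S0026898423030163, EDN: CHWIKZ",
        "CorpusId": 259173589,
        "PubMed": "37326060",
    }
    print(clean_external_ids(example))
```

DOIs without such a suffix pass through unchanged, so this can be applied to every item in `recommendedPapers`. If upstream data turns out to use other separators, the pattern would need adjusting.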