allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

ExternalIDs formatting broken in JSON response #118

Closed erkannt closed 7 months ago

erkannt commented 1 year ago

Describe the bug

Some of the recommended papers contain badly formatted externalIds, for example:

{
  "DOI": "10.31857/S0026898423030163, EDN: CHWIKZ",
  "CorpusId": 259173589,
  "PubMed": "37326060"
}

To Reproduce

curl 'https://api.semanticscholar.org/recommendations/v1/papers/forpaper/DOI:10.1101/2023.03.10.532141?fields=externalIds,authors,title'

Expected behavior

The DOI string should end just before the comma.

Additional context

Seems to be a new defect introduced on the 17th of June 2023.

cfiorelli commented 9 months ago

Noting that I'm still seeing this behavior:

curl -s 'https://api.semanticscholar.org/recommendations/v1/papers/forpaper/DOI:10.1101/2023.03.10.532141?fields=externalIds&limit=5' | jq -r '.recommendedPapers[] | select(.externalIds.DOI) | .externalIds.DOI'
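
For anyone who would rather flag the affected entries in a script than eyeball the jq output, here is a minimal Python sketch of the same filter. It works on an already-parsed response and relies only on the `recommendedPapers[].externalIds.DOI` shape used in the command above. The function name `find_suspect_dois` and the "contains whitespace" heuristic are assumptions made for illustration, not part of the API.

```python
def find_suspect_dois(response):
    """Return DOI strings that contain whitespace.

    Heuristic (an assumption): whitespace inside a DOI signals that
    extra text has been appended to it.
    """
    suspects = []
    for paper in response.get("recommendedPapers", []):
        # externalIds may be missing or null, so fall back to an empty dict.
        doi = (paper.get("externalIds") or {}).get("DOI")
        if doi and any(ch.isspace() for ch in doi):
            suspects.append(doi)
    return suspects


# Hand-built sample mirroring the externalIds object from this report.
sample = {
    "recommendedPapers": [
        {"externalIds": {"DOI": "10.31857/S0026898423030163, EDN: CHWIKZ",
                         "CorpusId": 259173589, "PubMed": "37326060"}},
        {"externalIds": {"DOI": "10.1101/2023.03.10.532141"}},
        {"externalIds": None},
    ]
}
print(find_suspect_dois(sample))
```

With the sample above, only the first DOI is reported, since it is the only one containing whitespace.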

cfiorelli commented 8 months ago

To triage on the backend: external IDs are mixed due to legacy architecture. We are evaluating how to update this and what the impact on developers would be.

cfiorelli commented 7 months ago

@erkannt Thanks for reporting this. We found that the data in the seemingly incorrectly formatted DOI likely comes from upstream providers and is outside the control of our systems. Hopefully this at least helps you understand and manage the issue on the user side.
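
Since the extra text appears to originate upstream, one way to handle it on the user side is to trim the DOI before using it. The sketch below is a hedged example, not an official fix: `clean_doi` and `clean_external_ids` are hypothetical helper names, and the rule "cut at the first whitespace, along with a comma or semicolon directly before it" is an assumption based solely on the single example in this issue. It would mangle a DOI that legitimately contains whitespace.

```python
import re

# Assumption: anything from the first whitespace onward (plus a comma or
# semicolon immediately before it) is appended junk, not part of the DOI.
_TRAILING_JUNK = re.compile(r"[,;]?\s.*$", re.DOTALL)


def clean_doi(raw):
    """Strip text appended after the DOI, e.g. ', EDN: CHWIKZ'."""
    return _TRAILING_JUNK.sub("", raw.strip())


def clean_external_ids(external_ids):
    """Return a copy of an externalIds dict with the DOI trimmed, if present."""
    cleaned = dict(external_ids)
    if isinstance(cleaned.get("DOI"), str):
        cleaned["DOI"] = clean_doi(cleaned["DOI"])
    return cleaned


# The externalIds object from the original report.
reported = {
    "DOI": "10.31857/S0026898423030163, EDN: CHWIKZ",
    "CorpusId": 259173589,
    "PubMed": "37326060",
}
print(clean_external_ids(reported)["DOI"])  # 10.31857/S0026898423030163
```

Well-formed DOIs pass through unchanged because the pattern only matches when whitespace is present.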