Open timwoelfle opened 1 month ago
Here's reproducible code in python:
import requests
api_url = 'https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,venue,year,externalIds,abstract,referenceCount,citationCount,publicationTypes,publicationDate,journal,authors.externalIds,authors.name,authors.affiliations,references.paperId,tldr,citations.paperId'
payload = {"ids": ["5c6a907a418896b8aee17663e8c87895c1622fd3", "f7014c1b0b2e820ba82a017924590f3098b49910", "0e56e9006d1a992de243e129025a000f3bc791a4", "434ca529b68aabfb4835ac2cb8a8a3da6f83efe1", "2dba24d0ae646a9562d1bdef3b2605325e65dc0f", "14a62330576422c5e984be619299206110bacefb", "ac2f7ce4fd521c11d3654b85839b96ef41a0f287", "88b80d9466a4fb941c2b5b463dba1e2a4f23ebf4", "505e022f19daaf96a59040e72c7194599c219af7", "296bc78c86d17481e9b8983632773f3c5666b2af", "dcb70f058a5db720462641b5090235b66cbb18ae", "c2dae083b5d082978b1994dc79c19d32a0b3274a", "a4b6e12005d58e512712405b351ae128b5f9300f", "0b999eb051fdefda6ec5efae076dad7d138287c8", "d3d718f3f0e4e6d91b3b13524b3e90496e76f841", "67ad40cb40d7784d5543bf6166b55ef2dfb37ea1", "c9b832926aef3e37c81fc5f1ced7e853a6cae6a1", "12334afe89c06c07a1409d4442a1d51c26e10d93", "d4a22bb96196c2e2df704a162522c53678091bb3", "c706f5b8184a145e4f9d6ffbd62f6757c3badc3e", "5cdc695ab97a720e468d28868528c785fbd8a114", "9e2b5146d43268cde0a223c4ebafead8b63d7528", "ed8713ca0d4e263cbb12c0da16fe56d6abc732fd", "6c69a425959a4e98df944c10300258f18119c3b7", "e2e16f3c123850dffbb38765ef8fd71ebaecdaca", "e43256238dbfcf1fe37aac918a6d2d033e22d380", "922cd02a5e4f1298384cb5b9f6d13df5daf64b70", "467f0fdc420f5cd8996c0b2b1eb33a3dcda93c5e", "5595d6e87417ba69831cd6da96e063b0a7ea373b", "234590c3c737fe38ab3632f4a86a195462c547a7", "963c95a977e4ce253791a7683ee19d91514a2002", "04f4c68fa7bc5c9ea550076bb911b68b052d28a7", "9758a5cb826ed7199ee8822f08108fd6bbf7a106", "a010b76e1a809da5a128def075a310b1b1511593", "045555ec4342da07074949f540bc615cb8c453cf", "0de477d496b226525e56d2e6591a7721697dc2a8", "c5e8862bfd224b8078a77655602c910963df75d3", "be3eda717b99731f93de80d75031f38e40f84cee", "4e16328e599e9d3169f40b6dbfbd039b4ca673a2", "4d41108590a7823ea9b943bd4c614534edba3b8d", "df6ade47d3bbab757e8fcf6b3f026b7d3d44ed01", "37a09c5884e85fb6481e8bbd06724fa5ab293a39", "389ec712b590cba24a184aa9704bcfad0970f1b0", "706a40b0d9e7c046fa206124b78f25117f3e86af", "70fae121f412c19612115eff06c13134b8cb2060", 
"2b0dd59254ed9d1255f817e427ede2c9f53e5e5f", "7411530aa26843b62f9174fa9d004bab72e476dc", "8766710c66d2a93541f61003d2d2562573636f2d", "fedd542a6c24f5dc2fc3b5cf8391326a605ccf85", "df45f6e2d2e3a4abac857b914cae703f225957a0", "5c2ae3bf77fbca2feb457e60861232af41b44403", "53564c45fe0af4889c92f05b04626f7ac739a97a", "d5237678e6d12e95bf989f7972fc065cc3800d55", "ccb5d68fc4aef32b84fcaf409b0b672c46a2bd51", "585bf445ec84c1d9621b2726bdcce9f544b515c8"]}
# Make the API request
response = requests.post(api_url, json=payload)
# Check the response
if response.status_code == 200:
    data = response.json()
    # Only compare the counts if the response contains entries
    if len(data):
        print("len(citations):", [len(item.get('citations', [])) for item in data])
        print("citationCount:", [item.get('citationCount', 0) for item in data])
    else:
        print("No data found in the response.")
else:
    print(f"Error: {response.status_code} - {response.text}")
Output:
len(citations): [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8675, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1324, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
citationCount: [148, 376, 86, 19, 473, 142, 56, 73, 148, 280, 289, 61, 361, 144, 24, 62, 101, 185, 163, 207, 179, 31, 3411, 28, 508, 867, 609, 8675, 411, 49, 405, 272, 699, 982, 69, 80, 3599, 190, 671, 821, 1214, 391, 3121, 1628, 247, 1205, 9033, 5, 3935, 3488, 521, 332, 2645, 664, 2308]
I would expect these numbers to match, right?
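To make the discrepancy easier to spot than by comparing two long lists by eye, the comparison can be wrapped in a small helper. This is only a sketch: `find_citation_mismatches` is a hypothetical name, and it assumes each entry is either a dict with the `citationCount` and `citations` fields requested above or null (which the batch endpoint may return for ids it cannot resolve).

```python
from typing import Any, Optional


def find_citation_mismatches(
    papers: list[Optional[dict[str, Any]]],
) -> list[tuple[int, int, int]]:
    """Return (index, citationCount, len(citations)) for every entry where the two differ."""
    mismatches = []
    for index, paper in enumerate(papers):
        if paper is None:
            # Assumption: unresolved ids come back as null entries; skip them.
            continue
        reported = paper.get("citationCount") or 0
        # Treat a missing or null "citations" field like an empty list.
        returned = len(paper.get("citations") or [])
        if reported != returned:
            mismatches.append((index, reported, returned))
    return mismatches
```

With the script above, `find_citation_mismatches(data)` would list every index whose `citations` array does not match its `citationCount`, together with both numbers.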
Thank you @timwoelfle for the detailed report. I've tested and escalated this to the appropriate team.
Dear Semantic Scholar team
I'm an academic using Semantic Scholar personally and for my free and open source tool "Local Citation Network" (https://localcitationnetwork.github.io/). I've noticed some weird behaviour with the API batch endpoint which I believe may be a bug.
Now here's the issue: Nearly all articles in the response have an empty array in the "citations" field, even though their "citationCount" numbers are positive.
Only 2 entries (indices 27 and 42) actually had citations, and interestingly quite a lot of them: 8667 and 1332, respectively (far more than the "up to 1000 will be returned" limit mentioned in the documentation at https://api.semanticscholar.org/api-docs/graph#tag/Paper-Data/operation/post_graph_get_papers).
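Until the batch behaviour is clarified, one possible way to collect citing paper ids is to page through a per-paper citations endpoint instead. The sketch below is not a confirmed fix. It assumes that `GET /graph/v1/paper/{paper_id}/citations` accepts `offset` and `limit` query parameters and returns a JSON object with a `data` list of `{"citingPaper": {...}}` entries plus a `next` offset that is absent on the last page; please check these against the current API documentation. The names `fetch_page_http` and `iter_citing_paper_ids` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator

# Assumed endpoint shape; verify against the current API documentation.
CITATIONS_URL = "https://api.semanticscholar.org/graph/v1/paper/{paper_id}/citations"

Page = dict[str, Any]


def fetch_page_http(paper_id: str, offset: int, limit: int) -> Page:
    """Fetch one page of citations over HTTP (no API key, no retry handling)."""
    query = urllib.parse.urlencode({"offset": offset, "limit": limit})
    url = CITATIONS_URL.format(paper_id=paper_id) + "?" + query
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_citing_paper_ids(
    paper_id: str,
    fetch_page: Callable[[str, int, int], Page] = fetch_page_http,
    page_size: int = 1000,
) -> Iterator[str]:
    """Yield the paperId of each citing paper, following the assumed `next` offsets."""
    offset = 0
    while offset is not None:
        page = fetch_page(paper_id, offset, page_size)
        for entry in page.get("data") or []:
            citing = entry.get("citingPaper") or {}
            if citing.get("paperId"):
                yield citing["paperId"]
        # Assumption: the last page carries no "next" key, which ends the loop.
        offset = page.get("next")
```

The page fetcher is passed in as a parameter so that the paging logic can be exercised with canned pages, and so that a fetcher adding an API key or rate limiting can be swapped in.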
I believe this is a bug. I've attached two response JSON files (I tried it twice and it was reproducible). The request details of the API call are below.
I hope you can reproduce, track down, and fix this bug. Let me know if you have any questions! Thanks and best,
Tim Woelfle
Request header:
POST https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,venue,year,externalIds,abstract,referenceCount,citationCount,publicationTypes,publicationDate,journal,authors.externalIds,authors.name,authors.affiliations,references.paperId,tldr,citations.paperId
Request body (list of 55 ids):
{"ids":["5c6a907a418896b8aee17663e8c87895c1622fd3","f7014c1b0b2e820ba82a017924590f3098b49910","0e56e9006d1a992de243e129025a000f3bc791a4","434ca529b68aabfb4835ac2cb8a8a3da6f83efe1","2dba24d0ae646a9562d1bdef3b2605325e65dc0f","14a62330576422c5e984be619299206110bacefb","ac2f7ce4fd521c11d3654b85839b96ef41a0f287","88b80d9466a4fb941c2b5b463dba1e2a4f23ebf4","505e022f19daaf96a59040e72c7194599c219af7","296bc78c86d17481e9b8983632773f3c5666b2af","dcb70f058a5db720462641b5090235b66cbb18ae","c2dae083b5d082978b1994dc79c19d32a0b3274a","a4b6e12005d58e512712405b351ae128b5f9300f","0b999eb051fdefda6ec5efae076dad7d138287c8","d3d718f3f0e4e6d91b3b13524b3e90496e76f841","67ad40cb40d7784d5543bf6166b55ef2dfb37ea1","c9b832926aef3e37c81fc5f1ced7e853a6cae6a1","12334afe89c06c07a1409d4442a1d51c26e10d93","d4a22bb96196c2e2df704a162522c53678091bb3","c706f5b8184a145e4f9d6ffbd62f6757c3badc3e","5cdc695ab97a720e468d28868528c785fbd8a114","9e2b5146d43268cde0a223c4ebafead8b63d7528","ed8713ca0d4e263cbb12c0da16fe56d6abc732fd","6c69a425959a4e98df944c10300258f18119c3b7","e2e16f3c123850dffbb38765ef8fd71ebaecdaca","e43256238dbfcf1fe37aac918a6d2d033e22d380","922cd02a5e4f1298384cb5b9f6d13df5daf64b70","467f0fdc420f5cd8996c0b2b1eb33a3dcda93c5e","5595d6e87417ba69831cd6da96e063b0a7ea373b","234590c3c737fe38ab3632f4a86a195462c547a7","963c95a977e4ce253791a7683ee19d91514a2002","04f4c68fa7bc5c9ea550076bb911b68b052d28a7","9758a5cb826ed7199ee8822f08108fd6bbf7a106","a010b76e1a809da5a128def075a310b1b1511593","045555ec4342da07074949f540bc615cb8c453cf","0de477d496b226525e56d2e6591a7721697dc2a8","c5e8862bfd224b8078a77655602c910963df75d3","be3eda717b99731f93de80d75031f38e40f84cee","4e16328e599e9d3169f40b6dbfbd039b4ca673a2","4d41108590a7823ea9b943bd4c614534edba3b8d","df6ade47d3bbab757e8fcf6b3f026b7d3d44ed01","37a09c5884e85fb6481e8bbd06724fa5ab293a39","389ec712b590cba24a184aa9704bcfad0970f1b0","706a40b0d9e7c046fa206124b78f25117f3e86af","70fae121f412c19612115eff06c13134b8cb2060","2b0dd59254ed9d1255f817e427ede2c9f53e5e5f","7411530aa26843b62f9174fa9d004bab72e476dc","8766710c66d2a93541f61003d2d2562573636f2d","fedd542a6c24f5dc2fc3b5cf8391326a605ccf85","df45f6e2d2e3a4abac857b914cae703f225957a0","5c2ae3bf77fbca2feb457e60861232af41b44403","53564c45fe0af4889c92f05b04626f7ac739a97a","d5237678e6d12e95bf989f7972fc065cc3800d55","ccb5d68fc4aef32b84fcaf409b0b672c46a2bd51","585bf445ec84c1d9621b2726bdcce9f544b515c8"]}
API calls were performed via fetch on Firefox 126 on Ubuntu.
Attached files: 24-06-01-S2-response-Boulton-2021.json, 24-06-01-S2-response-Boulton-2021-rerun.json