Closed timwoelfle closed 4 months ago
Here's reproducible code in Python:
import requests
api_url = 'https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,venue,year,externalIds,abstract,referenceCount,citationCount,publicationTypes,publicationDate,journal,authors.externalIds,authors.name,authors.affiliations,references.paperId,tldr,citations.paperId'
payload = {"ids": ["5c6a907a418896b8aee17663e8c87895c1622fd3", "f7014c1b0b2e820ba82a017924590f3098b49910", "0e56e9006d1a992de243e129025a000f3bc791a4", "434ca529b68aabfb4835ac2cb8a8a3da6f83efe1", "2dba24d0ae646a9562d1bdef3b2605325e65dc0f", "14a62330576422c5e984be619299206110bacefb", "ac2f7ce4fd521c11d3654b85839b96ef41a0f287", "88b80d9466a4fb941c2b5b463dba1e2a4f23ebf4", "505e022f19daaf96a59040e72c7194599c219af7", "296bc78c86d17481e9b8983632773f3c5666b2af", "dcb70f058a5db720462641b5090235b66cbb18ae", "c2dae083b5d082978b1994dc79c19d32a0b3274a", "a4b6e12005d58e512712405b351ae128b5f9300f", "0b999eb051fdefda6ec5efae076dad7d138287c8", "d3d718f3f0e4e6d91b3b13524b3e90496e76f841", "67ad40cb40d7784d5543bf6166b55ef2dfb37ea1", "c9b832926aef3e37c81fc5f1ced7e853a6cae6a1", "12334afe89c06c07a1409d4442a1d51c26e10d93", "d4a22bb96196c2e2df704a162522c53678091bb3", "c706f5b8184a145e4f9d6ffbd62f6757c3badc3e", "5cdc695ab97a720e468d28868528c785fbd8a114", "9e2b5146d43268cde0a223c4ebafead8b63d7528", "ed8713ca0d4e263cbb12c0da16fe56d6abc732fd", "6c69a425959a4e98df944c10300258f18119c3b7", "e2e16f3c123850dffbb38765ef8fd71ebaecdaca", "e43256238dbfcf1fe37aac918a6d2d033e22d380", "922cd02a5e4f1298384cb5b9f6d13df5daf64b70", "467f0fdc420f5cd8996c0b2b1eb33a3dcda93c5e", "5595d6e87417ba69831cd6da96e063b0a7ea373b", "234590c3c737fe38ab3632f4a86a195462c547a7", "963c95a977e4ce253791a7683ee19d91514a2002", "04f4c68fa7bc5c9ea550076bb911b68b052d28a7", "9758a5cb826ed7199ee8822f08108fd6bbf7a106", "a010b76e1a809da5a128def075a310b1b1511593", "045555ec4342da07074949f540bc615cb8c453cf", "0de477d496b226525e56d2e6591a7721697dc2a8", "c5e8862bfd224b8078a77655602c910963df75d3", "be3eda717b99731f93de80d75031f38e40f84cee", "4e16328e599e9d3169f40b6dbfbd039b4ca673a2", "4d41108590a7823ea9b943bd4c614534edba3b8d", "df6ade47d3bbab757e8fcf6b3f026b7d3d44ed01", "37a09c5884e85fb6481e8bbd06724fa5ab293a39", "389ec712b590cba24a184aa9704bcfad0970f1b0", "706a40b0d9e7c046fa206124b78f25117f3e86af", "70fae121f412c19612115eff06c13134b8cb2060", 
"2b0dd59254ed9d1255f817e427ede2c9f53e5e5f", "7411530aa26843b62f9174fa9d004bab72e476dc", "8766710c66d2a93541f61003d2d2562573636f2d", "fedd542a6c24f5dc2fc3b5cf8391326a605ccf85", "df45f6e2d2e3a4abac857b914cae703f225957a0", "5c2ae3bf77fbca2feb457e60861232af41b44403", "53564c45fe0af4889c92f05b04626f7ac739a97a", "d5237678e6d12e95bf989f7972fc065cc3800d55", "ccb5d68fc4aef32b84fcaf409b0b672c46a2bd51", "585bf445ec84c1d9621b2726bdcce9f544b515c8"]}
# Make the API request
response = requests.post(api_url, json=payload)
# Check the response
if response.status_code == 200:
    data = response.json()
    # Ensure correct parsing of the API response
    if len(data):
        print("len(citations):", [len(item.get('citations', [])) for item in data])
        print("citationCount:", [item.get('citationCount', 0) for item in data])
    else:
        print("No data found in the response.")
else:
    print(f"Error: {response.status_code} - {response.text}")
Output:
len(citations): [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8675, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1324, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
citationCount: [148, 376, 86, 19, 473, 142, 56, 73, 148, 280, 289, 61, 361, 144, 24, 62, 101, 185, 163, 207, 179, 31, 3411, 28, 508, 867, 609, 8675, 411, 49, 405, 272, 699, 982, 69, 80, 3599, 190, 671, 821, 1214, 391, 3121, 1628, 247, 1205, 9033, 5, 3935, 3488, 521, 332, 2645, 664, 2308]
I would expect these numbers to match, right?
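For anyone who wants to check their own responses for the same mismatch, here is a minimal sketch. The helper name `find_truncated` is hypothetical, and it assumes each batch entry is either a dict with `citations` and `citationCount` keys (as in the output above) or null for an ID the API could not resolve.

```python
def find_truncated(items):
    """Return (index, returned, expected) for entries whose 'citations'
    list is shorter than their 'citationCount'."""
    mismatches = []
    for index, item in enumerate(items):
        if item is None:  # assumption: unresolved IDs come back as null
            continue
        returned = len(item.get("citations") or [])
        expected = item.get("citationCount") or 0
        if returned < expected:
            mismatches.append((index, returned, expected))
    return mismatches


if __name__ == "__main__":
    sample = [
        {"citationCount": 2, "citations": [{"paperId": "x"}, {"paperId": "y"}]},
        {"citationCount": 5, "citations": []},
    ]
    print(find_truncated(sample))
```

Calling it on the parsed `data` list from the script above would list every index where fewer citations came back than `citationCount` reports.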
Thank you @timwoelfle for the detailed report. I've tested and escalated this to the appropriate team.
@timwoelfle I've found that we have an undocumented limit of 9999 citation results per request. In your example, the system returned all 8675 citations for one paper and 1324 citations for the other paper with a non-empty list, which together reach the 9999 limit. The remaining citations could not be returned. This limit cannot currently be increased. The documentation is being updated.
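Given that limit, one possible client-side workaround is to request only `citationCount` first and then split the IDs into batches whose combined citation counts stay within 9999. The sketch below is an assumption-laden illustration, not official guidance: `plan_batches` and `CITATION_LIMIT` are hypothetical names, and it assumes the limit applies to the total number of citation entries across one request and that `citationCount` is a good estimate of how many entries each paper contributes.

```python
CITATION_LIMIT = 9999  # per-request limit quoted in the reply above


def plan_batches(citation_counts, limit=CITATION_LIMIT):
    """Group paper IDs so each batch's summed citationCount fits the limit.

    citation_counts: dict mapping paper ID -> citationCount.
    Returns (batches, oversized); oversized IDs exceed the limit on their own.
    """
    batches, oversized = [], []
    current, total = [], 0
    for paper_id, count in citation_counts.items():
        if count > limit:
            oversized.append(paper_id)
            continue
        # Start a new batch when adding this paper would exceed the limit.
        if current and total + count > limit:
            batches.append(current)
            current, total = [], 0
        current.append(paper_id)
        total += count
    if current:
        batches.append(current)
    return batches, oversized


if __name__ == "__main__":
    counts = {"a": 5000, "b": 4999, "c": 1, "d": 12000}
    print(plan_batches(counts))
```

Each returned batch could then be posted separately as the `ids` list. Papers reported as oversized cannot fit in any single request under this assumption and would need to be fetched some other way, for example through a paginated per-paper request.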
Thanks so much for catching this and bringing it to our attention
Dear Semantic Scholar team
I'm an academic using Semantic Scholar personally and for my free and open source tool "Local Citation Network" (https://localcitationnetwork.github.io/). I've noticed some weird behaviour with the API batch endpoint which I believe may be a bug.
Now here's the issue: Nearly all articles in the response have an empty array in the "citations" field, even though their "citationCount" numbers are positive:
Only 2 entries (indices 27 and 42) actually had citations, and surprisingly many: 8667 and 1332, respectively (far more than the "up to 1000 will be returned" mentioned in the documentation at https://api.semanticscholar.org/api-docs/graph#tag/Paper-Data/operation/post_graph_get_papers).
I believe this is a bug. I've attached two response jsons (I tried it twice and it was reproducible). The request details of the API call are below.
I hope you can reproduce, track down, and fix this bug. Let me know if you have any questions! Thanks and best,
Tim Woelfle
Request header:
POST https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,venue,year,externalIds,abstract,referenceCount,citationCount,publicationTypes,publicationDate,journal,authors.externalIds,authors.name,authors.affiliations,references.paperId,tldr,citations.paperId
Request body (list of 55 ids):
{"ids":["5c6a907a418896b8aee17663e8c87895c1622fd3","f7014c1b0b2e820ba82a017924590f3098b49910","0e56e9006d1a992de243e129025a000f3bc791a4","434ca529b68aabfb4835ac2cb8a8a3da6f83efe1","2dba24d0ae646a9562d1bdef3b2605325e65dc0f","14a62330576422c5e984be619299206110bacefb","ac2f7ce4fd521c11d3654b85839b96ef41a0f287","88b80d9466a4fb941c2b5b463dba1e2a4f23ebf4","505e022f19daaf96a59040e72c7194599c219af7","296bc78c86d17481e9b8983632773f3c5666b2af","dcb70f058a5db720462641b5090235b66cbb18ae","c2dae083b5d082978b1994dc79c19d32a0b3274a","a4b6e12005d58e512712405b351ae128b5f9300f","0b999eb051fdefda6ec5efae076dad7d138287c8","d3d718f3f0e4e6d91b3b13524b3e90496e76f841","67ad40cb40d7784d5543bf6166b55ef2dfb37ea1","c9b832926aef3e37c81fc5f1ced7e853a6cae6a1","12334afe89c06c07a1409d4442a1d51c26e10d93","d4a22bb96196c2e2df704a162522c53678091bb3","c706f5b8184a145e4f9d6ffbd62f6757c3badc3e","5cdc695ab97a720e468d28868528c785fbd8a114","9e2b5146d43268cde0a223c4ebafead8b63d7528","ed8713ca0d4e263cbb12c0da16fe56d6abc732fd","6c69a425959a4e98df944c10300258f18119c3b7","e2e16f3c123850dffbb38765ef8fd71ebaecdaca","e43256238dbfcf1fe37aac918a6d2d033e22d380","922cd02a5e4f1298384cb5b9f6d13df5daf64b70","467f0fdc420f5cd8996c0b2b1eb33a3dcda93c5e","5595d6e87417ba69831cd6da96e063b0a7ea373b","234590c3c737fe38ab3632f4a86a195462c547a7","963c95a977e4ce253791a7683ee19d91514a2002","04f4c68fa7bc5c9ea550076bb911b68b052d28a7","9758a5cb826ed7199ee8822f08108fd6bbf7a106","a010b76e1a809da5a128def075a310b1b1511593","045555ec4342da07074949f540bc615cb8c453cf","0de477d496b226525e56d2e6591a7721697dc2a8","c5e8862bfd224b8078a77655602c910963df75d3","be3eda717b99731f93de80d75031f38e40f84cee","4e16328e599e9d3169f40b6dbfbd039b4ca673a2","4d41108590a7823ea9b943bd4c614534edba3b8d","df6ade47d3bbab757e8fcf6b3f026b7d3d44ed01","37a09c5884e85fb6481e8bbd06724fa5ab293a39","389ec712b590cba24a184aa9704bcfad0970f1b0","706a40b0d9e7c046fa206124b78f25117f3e86af","70fae121f412c19612115eff06c13134b8cb2060","2b0dd59254ed9d1255f817e427ede2c9f53e5e5f","7411530aa26843b62f9174fa9d004bab72e476dc","8766710c66d2a93541f61003d2d2562573636f2d","fedd542a6c24f5dc2fc3b5cf8391326a605ccf85","df45f6e2d2e3a4abac857b914cae703f225957a0","5c2ae3bf77fbca2feb457e60861232af41b44403","53564c45fe0af4889c92f05b04626f7ac739a97a","d5237678e6d12e95bf989f7972fc065cc3800d55","ccb5d68fc4aef32b84fcaf409b0b672c46a2bd51","585bf445ec84c1d9621b2726bdcce9f544b515c8"]}
API calls were performed via fetch on Firefox 126 on Ubuntu.
Attachments: 24-06-01-S2-response-Boulton-2021.json, 24-06-01-S2-response-Boulton-2021-rerun.json