cambialens / lens-api-doc


Issue on large-list scholar search #69

Closed. shang594 closed this issue 2 months ago.

shang594 commented 2 months ago

I'm using the Scholar code sample from the API docs, "Python - Cursor Based Pagination - List of Identifiers", but I can't get any results. The response is:

    {'reference': '1152c8c9-1ba9-448e-896d-b4c4d445e122', 'message': '[terms] query does not support [lens_id]', 'code': 400}

rosharma9 commented 2 months ago

@shang594 The example was kept short for brevity. You need to make sure the ids in "lens_id":'''+(json.dumps(ids))+''' are passed as a list. For example, you can use something like itertools.batched to split the list of ids into chunks of 5000. I have pushed an updated sample that should work for you.
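A minimal sketch of that suggestion, assuming the request body shape used later in this thread (a terms query on lens_id plus an include field list). `itertools.batched` only exists on Python 3.12+, so the sketch falls back to an `islice`-based equivalent on older versions. `build_request_bodies` and the field names below are hypothetical, not part of the published sample. Building the whole body with `json.dumps`, rather than splicing strings together, guarantees that lens_id is serialized as a JSON array.

```python
import json
from itertools import islice

try:
    from itertools import batched  # available on Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Fallback: yield tuples of up to n items, like itertools.batched."""
        it = iter(iterable)
        while chunk := tuple(islice(it, n)):
            yield chunk


def build_request_bodies(lens_ids, include, chunk_size=5000):
    """Return one JSON request body string per chunk of Lens IDs (assumed body shape)."""
    bodies = []
    for chunk in batched(lens_ids, chunk_size):
        body = {
            # the terms query gets a JSON array of ids, not a single string
            "query": {"terms": {"lens_id": list(chunk)}},
            "include": include,
        }
        # serialize the whole body at once instead of concatenating strings
        bodies.append(json.dumps(body))
    return bodies
```

For example, `build_request_bodies(ids, ["lens_id", "title"])` on a list of almost 50k ids yields at most 10 bodies, each of which would presumably be sent as the initial request of its own scroll.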

shang594 commented 2 months ago

Thanks for the quick answer. Yes, I had tried this, splitting the ids into slices of 20, but 2k ids took 5 minutes. I'll try your method right away.
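Part of that slowdown is likely the sheer number of requests. A rough comparison using only the numbers in this thread (2k ids in slices of 20, about 50k ids in chunks of 5000); `requests_needed` is a hypothetical helper that just does the ceiling division:

```python
import math


def requests_needed(n_ids: int, chunk_size: int) -> int:
    """Number of terms queries needed when each query carries at most chunk_size ids."""
    return math.ceil(n_ids / chunk_size)


print(requests_needed(2_000, 20))     # 100 separate queries
print(requests_needed(50_000, 5_000))  # 10 queries
```

At 5 minutes for 100 queries, that is roughly 3 seconds per query, so if per-request overhead dominates, larger chunks should help far more than anything else.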

shang594 commented 2 months ago

I tried your method, but it still fails. I passed my ids as lists of 5k. Here is my code and the error message:

    identifiers = main.read_csv_en('D:\\下载\\lens-export-readcsv.csv')  # list with almost 50k ids
    ids = [identifiers[i:i+5000] for i in range(0, len(identifiers), 5000)]
    for i in range(0, len(ids)):
        request_body = '''{
            "query": {
                "terms": {
                    "lens_id":''' + (json.dumps(ids[i])) + '''
                }
            },
            "include": %s
        }''' % include
        scroll(scroll_id=None, request_body=request_body)

    Traceback (most recent call last):
      File "D:\project\ThesesAndPatents\test-list.py", line 52, in <module>
        scroll(scroll_id,request_body=request_body)
      File "D:\project\ThesesAndPatents\test-list.py", line 31, in scroll
        scroll_id = json['scroll_id'] # Extract the new scroll id from response
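The traceback is cut off before the exception line, so the exact error is not shown. One plausible reading is that the API returned an error body like the one in the first post, which has no scroll_id key, so `json['scroll_id']` fails. A minimal guard, assuming the response has already been parsed into a dict; `extract_scroll_id` is a hypothetical helper, not part of the sample:

```python
def extract_scroll_id(response_json: dict) -> str:
    """Return the cursor from a parsed response, or raise with the API's own error message."""
    if "scroll_id" in response_json:
        return response_json["scroll_id"]
    # error bodies such as {'reference': ..., 'message': ..., 'code': 400} carry no scroll_id
    message = response_json.get("message", "no scroll_id in response")
    code = response_json.get("code")
    raise RuntimeError(f"Lens API error {code}: {message}")
```

Surfacing the API's message this way turns an opaque lookup failure into something like "Lens API error 400: [terms] query does not support [lens_id]".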

shang594 commented 2 months ago

And what does "scroll id" mean? Why should I use it, and will something go wrong if I don't?
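For context, the sample's title refers to cursor-based pagination: the first request sends the query, each response carries a cursor (here the scroll id), and the next request sends that cursor back to get the following page, until a page comes back empty. In a typical cursor-based API, a request without the cursor is treated as a fresh query, so you would only ever see the first page. The sketch below shows the general loop with a fake in-memory server; `scroll_all`, `make_fake_fetch`, and the response keys `data` and `scroll_id` are assumptions for illustration, not the Lens API's exact request format.

```python
from typing import Callable, Iterator, Optional


def scroll_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record, passing each page's cursor back until a page is empty.

    fetch_page(None) sends the initial query; fetch_page(cursor) asks for the next page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        records = page.get("data", [])
        if not records:
            break
        yield from records
        cursor = page["scroll_id"]  # the server's bookmark for "where you left off"


def make_fake_fetch(records, page_size):
    """Stand-in for the API: hands out records page by page, keyed by an opaque cursor."""
    def fetch_page(cursor):
        start = 0 if cursor is None else int(cursor)
        page = records[start:start + page_size]
        return {"data": page, "scroll_id": str(start + page_size)}
    return fetch_page


records = [{"lens_id": f"id-{i}"} for i in range(7)]
print(len(list(scroll_all(make_fake_fetch(records, 3)))))  # 7
```

With a page size of 3, the loop makes four calls (3 + 3 + 1 records, then an empty page) and collects all 7 records; calling `fetch_page(None)` every time instead would keep returning the first 3.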