cambialens / lens-api-doc


query/pagination problem #29

Closed tkalant closed 3 years ago

tkalant commented 3 years ago

I am trying to pull the results of a search that returns ~23k hits in the browser, page through them with cursor-based pagination, and build a pandas DataFrame. For some reason, my query keeps returning a bad request error. Could you help me with that?


import time

import pandas as pd
import requests

url = 'https://api.lens.org/patent/search'
token = 'my-token'  # placeholder for the real API token
data = '''{
"query":{
    "bool":{
    "should":[
    {"match": {"inventor.name": "roche"}}, {"match":{"owner_all.name": "roche"}}
    ]
    }
    }, "size":1000, "scroll":"1m"
}'''

headers = {'Authorization': token, 'Content-Type': 'application/json'}
def scroll(scroll_id):
    global data
    if scroll_id is not None:
        # Follow-up requests send only the scroll_id from the previous page
        data = '''{"scroll_id": "%s"}''' % scroll_id
    response = requests.post(url, data=data, headers=headers)
    if response.status_code == requests.codes.too_many_requests:
        # Rate limited: wait, then retry the same request
        time.sleep(8)
        scroll(scroll_id)
    elif response.status_code != requests.codes.ok:
        # Any other failure: print the status and stop
        print(response.status_code)
    else:
        response_json = response.json()
        print(response_json["scroll_id"])
        scroll_id = response_json["scroll_id"]
        df = pd.read_json(response.text)
        print(df.shape)
        scroll(scroll_id)

scroll(scroll_id=None)

rosharma9 commented 3 years ago

I believe you should be getting a 404 response with the body "Parameter 'size' shouldn't be greater than 100." The current Trial API allows 100 records per request, and you are requesting 1000.
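
For reference, a minimal sketch of the same scroll loop with the page size capped at 100, written as a plain loop instead of recursion. The function name `fetch_all`, the injectable `post` parameter, and the `retry_wait` parameter are hypothetical names for illustration. It also assumes the records arrive under a `data` key and that an empty page marks the end of the results; only `scroll_id` is confirmed by the code above, so check both assumptions against an actual response.

```python
import json
import time

URL = "https://api.lens.org/patent/search"
MAX_PAGE_SIZE = 100  # per-request limit quoted above for the Trial API


def fetch_all(query, token, post=None, page_size=MAX_PAGE_SIZE, retry_wait=8):
    """Collect every record for `query` by following scroll_id cursors."""
    if post is None:
        import requests  # imported lazily so a stub can be passed instead
        post = requests.post
    headers = {"Authorization": token, "Content-Type": "application/json"}
    # The first request carries the query; later ones carry only the cursor.
    body = {"query": query, "size": min(page_size, MAX_PAGE_SIZE), "scroll": "1m"}
    records = []
    while True:
        response = post(URL, data=json.dumps(body), headers=headers)
        if response.status_code == 429:
            time.sleep(retry_wait)  # rate limited: wait, then resend the same body
            continue
        if response.status_code != 200:
            # Surface the response body, which carries the API's error message
            raise RuntimeError(f"{response.status_code}: {response.text}")
        page = response.json()
        batch = page.get("data", [])  # ASSUMPTION: records live under "data"
        if not batch:
            break  # ASSUMPTION: an empty page means the results are exhausted
        records.extend(batch)
        scroll_id = page.get("scroll_id")
        if not scroll_id:
            break
        body = {"scroll_id": scroll_id}
    return records
```

Called with the `bool`/`should` query from the question and a real token, the returned list could then be passed to `pd.DataFrame(records)` or `pd.json_normalize(records)` to build a single DataFrame, rather than creating one DataFrame per page.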