cambialens / lens-api-doc


Pagination question #5

Closed zilch42 closed 5 years ago

zilch42 commented 5 years ago

Hi,

I’m wondering if you can help me with correct usage of the API. I have a successful request which is returning 49k results and I am trying to page through them but I can’t seem to get either the offset or scroll methods to work.

{
    "query": {
        "bool": {
            "must": [
                {"match": {"author.affiliation.country_code": "AU"}},
                {"match": {"has_patent_citations": true}},
                {"terms": {"publication_type": ["journal article","conference proceedings article"]}}
            ]
        }
    },
    "size": 1000,
    "include": [
        "patent_citations",
        "patent_citations_count",
        "lens_id",
        "publication_type",
        "publication_supplementary_type",
        "title",
        "external_ids",
        "keywords"
    ]
}

The query above works and returns the first 1000 results.

If I try to get the next thousand by starting from 1001, I get an error saying “The result window is too large for allowed maximum of 1000.”

                "from": 1001,
                "size": 1000,

Alternatively, I have tried the following to get a scroll, but I don’t seem to get a scroll_id back. I may not be using scroll correctly. How does this feature work?

                "size": 1000,
                "scroll": "1m",

Cheers

rosharma9 commented 5 years ago

Thank you for reaching out, @zilch42. Your query looks fine. To page through a result set this large you need to use a scroll context, because from and size are only intended for small result sets. Please try the following request:

{
   "query": { ... },
   "size": 1000,
   "scroll": "1m",
   "include": [ ... ]
}

The response body will contain a scroll_id. For each subsequent request, you only need to send the scroll context. The scroll_id may change between responses, so always use the most recent one to fetch the next page.

{
  "scroll_id": "...",
  "scroll": "1m"
}

You can keep scrolling until the server has returned all results. Because we use rate limiting to protect the server from high load and to serve all users fairly, I recommend adding a delay between requests if you are accessing the API programmatically.
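The scroll steps above can be sketched in Python roughly as follows. This is a minimal sketch, not an official client: `scroll_all` and its `post` argument are hypothetical names, `post` stands in for whatever function sends a JSON body to the search endpoint and returns the decoded JSON response, and the code assumes the records sit under a top-level "data" key next to "scroll_id". The one-second default delay is an arbitrary choice, not a documented rate limit.

```python
import time
from typing import Any, Callable, Dict, Iterator, List

Json = Dict[str, Any]


def scroll_all(
    post: Callable[[Json], Json],
    first_body: Json,
    scroll: str = "1m",
    delay_seconds: float = 1.0,
) -> Iterator[List[Json]]:
    """Yield one page of records at a time until the server returns no more.

    `post` is a hypothetical helper: it sends a JSON body to the search
    endpoint and returns the parsed JSON response as a dict.
    """
    # First request: the full query plus the scroll keep-alive.
    response = post({**first_body, "scroll": scroll})
    while True:
        # Assumption: records are returned under the top-level "data" key.
        records = response.get("data") or []
        if not records:
            break
        yield records

        # The scroll_id may change, so always read it from the latest response.
        scroll_id = response.get("scroll_id")
        if not scroll_id:
            break

        # Pause between requests to stay within the rate limit.
        time.sleep(delay_seconds)

        # Follow-up requests carry only the scroll context.
        response = post({"scroll_id": scroll_id, "scroll": scroll})
```

To use it, `post` could wrap an HTTP client call that sends the body with your access token and returns the decoded JSON; iterating over `scroll_all(post, query_body)` then gives successive pages of up to "size" records each.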

Thank you

zilch42 commented 5 years ago

Thanks @rosharma9, I found the scroll_id. I'd been looking in the data table, but it's at the top level of the response.