IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

How to query/fetch massive results sets? #419

Closed leapoli closed 2 years ago

leapoli commented 2 years ago

Hi. I'm trying to query a result list with a total of over 100k terms. The problem is that with the combination of offset + limit I cannot get past 10k terms. For instance, if I want to get the 10000th term (or any later one), the server rejects the request because offset + limit > 10000 (the 10000th is unreachable because offset can be 10000, but limit cannot be 0). The response is:

BAD_REQUEST Maximum unsorted offset + page size is 10,000.

kaicode commented 2 years ago

Hi @leapoli,

Please use the searchAfter pagination feature for this. The 10K offset limit comes from Elasticsearch. We can work around it by using searchAfter, which tells Elasticsearch to fetch the results that come after the last result on the previous page.

For example if I want to fetch a very large results set (this example has over 117K results) I would start by fetching the first 1K results: https://snowstorm.ihtsdotools.org/snowstorm/snomed-ct/MAIN/2022-05-31/concepts?ecl=%3C%20404684003&limit=1000

Then I would take the searchAfter value from the bottom of the first page and use it to load the next page, instead of using offset, like this: https://snowstorm.ihtsdotools.org/snowstorm/snomed-ct/MAIN/2022-05-31/concepts?ecl=%3C%20404684003&limit=1000&searchAfter=WzE2MDA4NzUxMDAwMTE5MTA4XQ==

Then I would keep iterating, taking the new searchAfter value from the bottom of each page and using it to load the next page. I would continue doing this until I reach a page with fewer than 1K items, or a blank page.

This technique allows results sets of any size to be retrieved, without hitting the 10K limit.
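
As an illustration of the loop described above, here is a minimal Python sketch. It is not part of Snowstorm or its documentation. It assumes the JSON response holds the page of results under an `items` key and the cursor under a `searchAfter` key; the helper names `fetch_page` and `iter_concepts` are hypothetical, and the base URL is the public instance from the example links.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the example links in this thread.
BASE_URL = (
    "https://snowstorm.ihtsdotools.org/snowstorm/snomed-ct/"
    "MAIN/2022-05-31/concepts"
)


def fetch_page(params):
    """GET one page of results and return the decoded JSON body."""
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_concepts(ecl, limit=1000, fetch=fetch_page):
    """Yield every result for an ECL query, paging with searchAfter, not offset.

    Assumption: each response looks like {"items": [...], "searchAfter": "..."}.
    """
    search_after = None
    while True:
        params = {"ecl": ecl, "limit": limit}
        if search_after is not None:
            # Cursor copied from the previous page; replaces offset entirely.
            params["searchAfter"] = search_after
        page = fetch(params)
        items = page.get("items", [])
        yield from items
        search_after = page.get("searchAfter")
        # A short (or blank) page, or a missing cursor, means we are done.
        if len(items) < limit or not search_after:
            break
```

Calling `list(iter_concepts("< 404684003"))` would then walk the whole results set one 1K page at a time. The `fetch` parameter exists only so that the HTTP call can be swapped out, for example to add a delay between requests on the rate-limited public instance.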

I hope that helps!

Kai Kewley


The public Snowstorm instance is for reference purposes and has rate limiting.

SNOMED International Snowstorm instance includes SNOMED Clinical Terms® (SNOMED CT®) which is used by permission of the SNOMED International. All rights reserved. SNOMED CT® was originally created by the College of American Pathologists. “SNOMED”, “SNOMED CT” and “SNOMED Clinical Terms” are registered trademarks of the SNOMED International (www.snomed.org)

leapoli commented 2 years ago

@kaicode

It definitely works! It's odd, but it does its job.

I think this should be officially documented somewhere, rather than just as an issue.

Thanks a lot!!