epam / Indigo

Universal cheminformatics toolkit, utilities and database search tools
http://lifescience.opensource.epam.com
Apache License 2.0

Add sorting and pagination to bingo-elastic #1826

Open khyurri opened 3 months ago

khyurri commented 3 months ago

Background

Currently, users can retrieve only a limited number of compounds from Elasticsearch (up to 10,000 by default). We need to add sorting and pagination to the bingo-elastic driver so that users can retrieve more than 10,000 compounds from the index.

Solution

Elasticsearch supports paginating past this limit with a point in time (PIT) combined with search_after; see https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html for details. We need to implement this in both the Java and Python drivers so that PIT is used transparently and users do not have to manage it themselves. If a user sets a sorting parameter, the bingo-elastic driver will automatically keep fetching documents until the limit is reached. After this change, the limit option should also become optional, so that users can download the entire index if they choose to.
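For illustration, the fetch loop described above could look roughly like the sketch below. It is not the bingo-elastic implementation: `iter_sorted_hits` is a hypothetical helper, and the keyword arguments assume the 8.x `elasticsearch` Python client (`open_point_in_time`, `search` with `pit` / `search_after`, `close_point_in_time`); other client versions may expect a different call shape.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_sorted_hits(
    client: Any,
    index: str,
    query: Dict[str, Any],
    sort: List[Dict[str, str]],
    page_size: int = 1000,
    limit: Optional[int] = None,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield hits page by page using a PIT plus search_after.

    Hypothetical helper: `client` is assumed to behave like the 8.x
    elasticsearch-py client. With limit=None the whole index is walked.
    """
    # Open a point in time so every page sees the same snapshot of the index.
    pit_id = client.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    yielded = 0
    search_after = None
    try:
        while limit is None or yielded < limit:
            # Never request more documents than the caller still wants.
            size = page_size if limit is None else min(page_size, limit - yielded)
            kwargs: Dict[str, Any] = {
                "query": query,
                "sort": sort,
                "size": size,
                # The index is implied by the PIT, so it is not passed here.
                "pit": {"id": pit_id, "keep_alive": keep_alive},
            }
            if search_after is not None:
                kwargs["search_after"] = search_after
            resp = client.search(**kwargs)
            # The server may hand back a refreshed PIT id; keep the latest one.
            if "pit_id" in resp:
                pit_id = resp["pit_id"]
            hits = resp["hits"]["hits"]
            if not hits:
                break  # index exhausted
            for hit in hits:
                yield hit
            yielded += len(hits)
            # Sort values of the last hit mark where the next page starts.
            search_after = hits[-1]["sort"]
    finally:
        # Release the PIT even if the caller stops iterating early.
        client.close_point_in_time(id=pit_id)
```

Called with `limit=None`, the generator walks the whole index; with a limit it stops as soon as that many hits have been yielded, which matches the "limit should be optional" requirement above.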

khyurri commented 3 months ago

In-progress PR for Python: https://github.com/epam/Indigo/pull/1712

uladkaminski commented 3 months ago

Java changes PR: #1827

uladkaminski commented 3 months ago

Additional Java changes PR: #1880