NASA-PDS / registry-mgr

Standalone Registry Manager application responsible for managing the PDS Registry (https://github.com/NASA-PDS/registry) schemas and indexes.
https://nasa-pds.github.io/registry

issue 53: increase the window limit #54

Closed al-niessner closed 1 year ago

al-niessner commented 1 year ago

🗒️ Summary

Add it to the index like the documentation says.

⚙️ Test Data and/or Report

None

♻️ Related Issues

fixes #53

al-niessner commented 1 year ago

@jordanpadams @jimmie

You were correct. It just has to be added when the index is created. Made change.
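
A minimal sketch of what "added when the index is created" could look like as a request body. `index.max_result_window` is the setting named in this thread. The helper name and the value of 1,000,000 are assumptions for illustration, not the values used in this PR.

```python
import json


def index_creation_body(max_result_window: int) -> dict:
    """Build the JSON body for an index-creation request (PUT /<index>)."""
    if max_result_window <= 0:
        raise ValueError("max_result_window must be positive")
    # The setting lives under settings.index in the creation body.
    return {"settings": {"index": {"max_result_window": max_result_window}}}


# Hypothetical value: the thread talks about paging through a million entries.
body = index_creation_body(1_000_000)
print(json.dumps(body, indent=2))
```

The body only takes effect at creation time in this sketch. Changing the limit on an existing index is a separate settings update and is not shown here.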

jimmie commented 1 year ago

I noticed that the documentation for index.max_result_window and the original error message refer to the scroll API, and the scroll API refers to the search_after parameter (discussed very briefly here: https://opensearch.org/docs/1.0/opensearch/ux/). Maybe we should give this a look, since it appears to have less of a worst-case performance impact?
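
For reference, a rough sketch of how `search_after` paging is usually shaped. It only builds request bodies and reads a cursor from a response, and makes no network calls. The sort field `lidvid` and both helper names are assumptions for illustration, not taken from the registry code.

```python
from typing import Optional


def search_after_body(page_size: int, last_sort: Optional[list] = None) -> dict:
    """Build a search body for one page, optionally continuing after a cursor."""
    body = {
        "size": page_size,
        "query": {"match_all": {}},
        # Paging needs a deterministic sort; "lidvid" is a hypothetical unique field.
        "sort": [{"lidvid": "asc"}],
    }
    if last_sort is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = last_sort
    return body


def next_cursor(response: dict) -> Optional[list]:
    """Return the sort values of the last hit, or None when the page is empty."""
    hits = response["hits"]["hits"]
    return hits[-1]["sort"] if hits else None
```

In this sketch the caller holds the cursor between requests. Whether that fits the registry API is the open question in this thread.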

jordanpadams commented 1 year ago

@jimmie good call. let's maybe take a look at this first.

@al-niessner ☝️

al-niessner commented 1 year ago

> I noticed that the documentation for index.max_result_window and the original error message refer to the scroll api and the scroll api refers to the search_after parameter (discussed very briefly here). Maybe we should give this a look since it appears to have less of a worst-case performance impact?

@jordanpadams @jimmie @tloubrieu-jpl

The search window requires state, as stated in #53, which means abandoning a RESTful API because the API would no longer be stateless. It is an approach, but it would mean overhauling the API (maybe for the better, but most likely not) to use state to paginate through a million entries.

Let me try once more to point out that OpenSearch is not the technology you want if you need a million records. The idea of OpenSearch is to search, not to query. In a query, you request all matching records and then post-process them. In a search, you enter terms and look at the top N records (usually fewer than 10 and never more than 50, because who goes past page 2 on Google anymore). If what you want is not in the first 10, you adjust the search criteria and repeat until it is, then select it and post-process that single record, or maybe 2 or 3 if the first is not what you wanted, in which case you probably go back to adjusting the criteria. OpenSearch computes relevance scores and limits return sizes, which makes it clear that it is not an SQL query returning a million results. In other words, search finds a needle in a haystack quickly, while query is for bulk processing.

So, if the need to return or page through a million records is real, then you should probably rethink OpenSearch, or search in general. If the need is to show the top 10 relevant records out of a million, then we need to rethink how we search and return records.

jordanpadams commented 1 year ago

@al-niessner going to merge this one.

In the future, when a PR will fix a ticket, please use the GitHub "keywords" that automatically close the ticket when the PR is merged. I think there are a bunch, but fixes, resolves, and closes are a few.

al-niessner commented 1 year ago

As stated previously in this thread, such a change requires us to abandon REST and maintain the scroll state.

On Wed, Aug 17, 2022, 15:19 Jimmie Young @.***> wrote:

I noticed that the documentation for index.max_result_window and the original error message refer to the scroll api and the scroll api refers to the search_after parameter (discussed very briefly here https://opensearch.org/docs/1.0/opensearch/ux/). Maybe we should give this a look since it appears to have less of a worst-case performance impact?
