CDLUC3 / ezid

MIT License

Sorting in OpenSearch #627

Open sfisher opened 1 month ago

sfisher commented 1 month ago

I implemented an initial version of the sort feature that used to exist in the database.

I get the following error:

RequestError(400, 'search_phase_execution_exception', 'Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [resource.title] in order to load field data by uninverting the inverted index. Note that this can use significant memory.')

Probably the most efficient way to solve this is to also store a "sort" version of each field we want to sort on, with a limited length. This field would be of keyword type rather than text (sorting on a text field requires enabling fielddata, which, as the error above notes, can use significant memory).
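A minimal sketch of what such a mapping could look like, written as a plain Python dict. The field name `title_sort` and the helper name are hypothetical and not taken from the EZID code. Note that a keyword field's `ignore_above` setting skips over-long values rather than truncating them, so the length limit would have to be applied by the application before indexing.

```python
def sortable_title_mapping():
    """Return a mapping fragment with a full-text field and a keyword sort field.

    Field names here are hypothetical, for illustration only.
    """
    return {
        "properties": {
            "resource": {
                "properties": {
                    # Analyzed field used for full-text search.
                    "title": {"type": "text"},
                    # Keyword fields are backed by doc values, so sorting
                    # on this field does not need fielddata=true.
                    "title_sort": {"type": "keyword"},
                }
            }
        }
    }


mapping = sortable_title_mapping()
resource_props = mapping["properties"]["resource"]["properties"]
print(resource_props["title_sort"]["type"])
```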

My initial proposal is that we only support sorting on the 6 or so columns that the search actually displays as sortable column headings. (Right now the backend code for the DB allows sorting on a huge number of fields, something like 15-20, which I think is overkill.)

This would mean adding some special fields for these with limited lengths (how many characters of precision do we really need for sorting? 10-50 maybe?), which would keep the index from growing too much.
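One way to produce such a length-limited value at index time is sketched below. The function name, the 50-character default, and the normalization steps (Unicode normalization, whitespace collapsing, case folding) are all assumptions for illustration; the proposal above only specifies a short keyword value.

```python
import unicodedata

# Assumed cap; the proposal above only suggests "10-50 maybe".
SORT_KEY_MAX_LEN = 50


def make_sort_key(value, max_len=SORT_KEY_MAX_LEN):
    """Build a short, case-insensitive sort key from a display string."""
    if value is None:
        return ""
    # Normalize Unicode so visually identical strings compare equal.
    text = unicodedata.normalize("NFKC", str(value))
    # Collapse runs of whitespace and fold case so sorting ignores both.
    text = " ".join(text.split()).casefold()
    # Truncate: only the leading characters matter for ordering.
    return text[:max_len]


print(make_sort_key("  The  Quick Brown Fox ", 9))
```

Truncating in the application keeps every document sortable, at the cost that two titles sharing the same first `max_len` characters sort in an unspecified relative order.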

It seems like the "manage IDs" page presents a couple more options for columns that could be sortable. I believe these are the extra 4.

(I also believe the date-style fields do not need additional indexing.)
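The restriction to a small set of sortable columns could be sketched as an allow-list that maps a UI column name to the index field used for sorting. Every column and field name below is hypothetical (only `resource.title` appears in this issue, in the error message). The date-style entry points straight at its own field, since date fields can be sorted without an extra keyword copy.

```python
# Hypothetical allow-list: UI column name -> index field used for sorting.
SORTABLE_COLUMNS = {
    "title": "resource.title_sort",      # assumed keyword copy of the title
    "creator": "resource.creator_sort",  # assumed keyword copy of the creator
    "update_time": "update_time",        # assumed date field, sorted directly
}


def build_sort_clause(column, descending=False):
    """Return a search-body `sort` list for an allowed column.

    Raises ValueError for columns outside the allow-list, so arbitrary
    fields cannot be sorted on.
    """
    try:
        field = SORTABLE_COLUMNS[column]
    except KeyError:
        raise ValueError(f"sorting on {column!r} is not supported") from None
    order = "desc" if descending else "asc"
    return [{field: {"order": order}}]


print(build_sort_clause("title"))
print(build_sort_clause("update_time", descending=True))
```

The returned list is meant to be placed under the `sort` key of a search request body.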

sfisher commented 1 month ago

OpenSearch has some hidden keyword fields that work for sorting. This all seems to be working now for the two main forms (search, manage).
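The comment does not name these fields. Presumably they are the keyword sub-fields that dynamic mapping adds under string fields by default (for example `resource.title.keyword`), but that is an assumption. Under that assumption, a small helper (hypothetical, not from the EZID code) could look up the sortable sub-field from an index mapping:

```python
def keyword_subfield(properties, field_path):
    """Find a keyword-typed field or sub-field for a dotted field path.

    `properties` is the "properties" dict of an index mapping.
    Returns the dotted name to sort on, or None if there is none.
    """
    node = None
    props = properties
    for part in field_path.split("."):
        node = props.get(part)
        if node is None:
            return None
        # Descend into object fields via their nested "properties".
        props = node.get("properties", {})
    if node.get("type") == "keyword":
        return field_path
    # Multi-fields live under "fields"; pick the first keyword one.
    for name, sub in node.get("fields", {}).items():
        if sub.get("type") == "keyword":
            return f"{field_path}.{name}"
    return None


# Example mapping shaped like what default dynamic mapping produces for strings.
example_properties = {
    "resource": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        }
    }
}

print(keyword_subfield(example_properties, "resource.title"))
```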