HEPData / hepdata

Repository for main HEPData web application
https://hepdata.net
GNU General Public License v2.0

search: allow searching in a range of INSPIRE IDs or publication record IDs #791

Open GraemeWatt opened 4 months ago

GraemeWatt commented 4 months ago

As a workaround for #390, where search results are limited to 10k publications, a search for all records could be split into multiple searches over ranges of an identifier such as the inspire_id or the recid, as suggested for INSPIRE-HEP.
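A minimal sketch of how such a split could be generated on the client side. The helper name `id_range_queries` and its parameters are hypothetical and only the `field:[lo TO hi]` range syntax is taken from this issue. A range of at most 10000 consecutive integer identifiers can contain at most 10000 publications, so each chunk should stay within the limit.

```python
def id_range_queries(field, start, stop, step=10000):
    """Yield range query strings that cover [start, stop] without overlap.

    `field`, `start`, `stop` and `step` are illustrative parameters,
    not part of any existing HEPData API.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    lo = start
    while lo <= stop:
        # Ranges are inclusive at both ends, so the next chunk starts at hi + 1.
        hi = min(lo + step - 1, stop)
        yield f"{field}:[{lo} TO {hi}]"
        lo = hi + 1


queries = list(id_range_queries("inspire_id", 1, 25000))
```

Because the chunks are inclusive and do not overlap, each identifier falls in exactly one query, which is what makes it possible to combine the results without duplicates.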

However, searching for a range like inspire_id:[1 TO 10000] does not currently work for HEPData because the inspire_id is stored as text in the index (see also #301). A custom query should be implemented that converts the inspire_id to an integer.
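To illustrate why a text-typed inspire_id breaks range searches, and what converting to an integer would achieve, here is a small in-memory sketch. It is not the HEPData search code: the function names, the regular expression, and the record layout (a dict with a string `inspire_id`) are assumptions made for the example.

```python
import re

# Matches a query of the form used in this issue, e.g. "inspire_id:[1 TO 10000]".
RANGE_RE = re.compile(r"inspire_id:\[(\d+) TO (\d+)\]")


def parse_inspire_range(query):
    """Return (lo, hi) as integers, or None if the query is not a range."""
    match = RANGE_RE.fullmatch(query.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def filter_by_inspire_range(records, query):
    """Keep records whose text inspire_id lies in the range when read as an int."""
    bounds = parse_inspire_range(query)
    if bounds is None:
        raise ValueError(f"not an inspire_id range query: {query!r}")
    lo, hi = bounds
    return [r for r in records if lo <= int(r["inspire_id"]) <= hi]


# Hypothetical records with inspire_id stored as text, as in the index.
records = [{"inspire_id": s} for s in ("9", "123", "10000", "20000")]
matched = [r["inspire_id"] for r in filter_by_inspire_range(records, "inspire_id:[1 TO 10000]")]

# Comparing as text is lexicographic, so "9" sorts after "10000".
text_comparison_matches_9 = "1" <= "9" <= "10000"
```

With integer conversion the range matches "9", "123" and "10000". A plain text comparison would exclude "9", since strings are ordered character by character.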

Moreover, searching for a range like recid:[100 TO 200] returns records where the table recid is within the range even if the publication recid is not. This means it would not be possible to get all records (avoiding duplicates) from a combination of recid searches. Implementing a new search query like publication_recid:[100 TO 200] that does not match the table recid would be useful to avoid this problem.
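The difference between the two behaviours can be sketched with a toy model of the index. The document layout below (a `doc_type` field and a `recid` on both publications and tables) is an assumption for illustration and may not match the real index mapping; the function names are hypothetical.

```python
def match_recid(docs, lo, hi):
    """Mimic the current behaviour: any document with recid in range matches."""
    return [d for d in docs if lo <= d["recid"] <= hi]


def match_publication_recid(docs, lo, hi):
    """Mimic the proposed query: only publication documents are considered."""
    return [
        d for d in docs
        if d["doc_type"] == "publication" and lo <= d["recid"] <= hi
    ]


# Hypothetical index contents: publication 50 owns table 150.
docs = [
    {"doc_type": "publication", "recid": 50},
    {"doc_type": "table", "recid": 150, "publication_recid": 50},
    {"doc_type": "publication", "recid": 120},
]

current = match_recid(docs, 100, 200)
proposed = match_publication_recid(docs, 100, 200)
```

In this toy model the current behaviour matches table 150, which would bring in publication 50 even though 50 lies outside the range, while the proposed query matches only publication 120.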