JULIELab / gepi

GePI (GEne - Protein Interactions) is a web portal for quick and convenient access to gene - protein interaction mentions automatically extracted from the biomedical literature, i.e. PubMed and PubMed Central (Open Access Subset).
GNU General Public License v3.0

Show actual number of table pages #243

khituras closed 1 year ago

khituras commented 1 year ago

And do not cut at 10k. The cutoff is meant to save time, but in my tests with GePI it makes little difference, and the more specific information is better.

khituras commented 1 year ago

I tried simply setting the tracking number to Integer.MAX_VALUE, which seemed to work at first (after also setting this on the ES index, which has its own default cutoff at 10k and otherwise returns an error). However, the count returned in GePI does not match the count for the same query when I send it to ES via cerebro (direct JSON API usage); it is roughly twice as large. As a result, jumping to the last page produces errors. Also, we'd like accurate numbers. Investigate.
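
For reference, a minimal sketch of the two JSON bodies this comment appears to describe. It assumes that "the tracking number" is the `track_total_hits` search parameter and that the index-side cutoff is the `index.max_result_window` setting (both default to 10,000 in Elasticsearch). The class and method names are hypothetical and not taken from the GePI code base.

```java
import java.util.Locale;

public class TotalHitsSketch {

    // Search body that asks Elasticsearch to count hits accurately up to the given limit
    // instead of stopping at the default of 10,000.
    static String searchBody(int from, int size, int trackTotalHitsUpTo) {
        return String.format(Locale.ROOT,
                "{\"from\": %d, \"size\": %d, \"track_total_hits\": %d, \"query\": {\"match_all\": {}}}",
                from, size, trackTotalHitsUpTo);
    }

    // Body for PUT /<index>/_settings that raises the from+size paging limit of one index.
    static String settingsBody(int maxResultWindow) {
        return String.format(Locale.ROOT,
                "{\"index\": {\"max_result_window\": %d}}", maxResultWindow);
    }

    public static void main(String[] args) {
        System.out.println(searchBody(0, 10, Integer.MAX_VALUE));
        System.out.println(settingsBody(Integer.MAX_VALUE));
    }
}
```

Sending these bodies is left out on purpose; how GePI actually issues its requests is not shown in this thread.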

khituras commented 1 year ago

The issue was that the Elasticsearch query components failed to set the index on the search request, which led to a search across all indices. Since a second, clean index had been built at that time, there were nearly twice as many results.
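
To illustrate the failure mode: in the Elasticsearch REST API, `/_search` without a target searches every index, while `/<index>/_search` restricts the search to the named one. The sketch below is not the GePI fix itself; the class, the method names, and the index name `gepi_current` are hypothetical. It shows the lenient path construction that silently widens the search, next to a strict variant that refuses to build a request without an index.

```java
public class SearchEndpointSketch {

    // Lenient variant: with no index given, the path targets ALL indices.
    static String searchPath(String... indices) {
        if (indices == null || indices.length == 0) {
            return "/_search";
        }
        return "/" + String.join(",", indices) + "/_search";
    }

    // Strict variant: fail fast instead of silently searching every index.
    static String strictSearchPath(String... indices) {
        if (indices == null || indices.length == 0) {
            throw new IllegalArgumentException("No index set on the search request");
        }
        return searchPath(indices);
    }

    public static void main(String[] args) {
        System.out.println(searchPath());                     // searches all indices
        System.out.println(strictSearchPath("gepi_current")); // searches one index only
    }
}
```

With two indices holding nearly the same documents, the first path would report roughly double the hit count of the second, which matches the discrepancy described above.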

khituras commented 1 year ago

Decided to leave the 10k restriction on the table. The total number of events is given in the statistics, and anyone who needs the complete table can download it. There is no reason why ending the table at the 1000th page is worse than ending it at the 2000th.
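
A minimal sketch of how the table's page count could be derived under this decision. It assumes 10 rows per page (which would put the 10k cutoff at page 1000); the class, constant, and method names are hypothetical and not taken from the GePI code base.

```java
public class TablePagingSketch {

    // Maximum number of rows the table exposes, regardless of the true total.
    static final long MAX_TABLE_ROWS = 10_000;

    // Number of table pages for a given total hit count, capped at MAX_TABLE_ROWS rows.
    static long pageCount(long totalHits, int rowsPerPage) {
        if (rowsPerPage <= 0) {
            throw new IllegalArgumentException("rowsPerPage must be positive");
        }
        long shownRows = Math.min(Math.max(totalHits, 0), MAX_TABLE_ROWS);
        // Ceiling division so that a partially filled last page still counts.
        return (shownRows + rowsPerPage - 1) / rowsPerPage;
    }

    public static void main(String[] args) {
        System.out.println(pageCount(25, 10));
        System.out.println(pageCount(50_000, 10));
    }
}
```

The uncapped total would still be reported separately in the statistics, so the cap only affects how far the table can be paged.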