ICIJ / datashare

A self-hosted search engine for documents.
https://datashare.icij.org
GNU Affero General Public License v3.0

Can't see results after the 9,999th one #688

Closed annelhote closed 1 year ago

annelhote commented 3 years ago

Describe the bug: On a project with more than 10,000 documents, I can't see search results after the 9,999th.

To Reproduce: Steps to reproduce the behavior:

  1. On our Datashare staging, open this page.
  2. Click on the next page.
  3. You will see this error message: "The server encountered a problem"

Expected behavior: I would like to see the next results.

Additional context: The HTTP request to the API gets a 500 error with the message "java.lang.IllegalStateException: Unable to apply route". While trying to reproduce the ES query, I get this error message:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "INDEX",
        "node" : "NODE",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
      }
    }
  },
  "status" : 400
}

We should use the scroll API instead of this from/size query.
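For reference, the numbers in the error above come down to simple arithmetic: Elasticsearch rejects any request where from + size exceeds index.max_result_window (10000 here). Below is a minimal sketch of that check. It assumes a page size of 100, inferred from 10100 = 10000 + 100, and the helper names are hypothetical, not part of Datashare.

```python
# Value of index.max_result_window quoted in the error message above.
MAX_RESULT_WINDOW = 10_000


def within_result_window(page, size, max_window=MAX_RESULT_WINDOW):
    """Return True if the given 1-based page can be fetched with from/size."""
    offset = (page - 1) * size  # this is the "from" parameter of the query
    return offset + size <= max_window


def last_reachable_page(size, max_window=MAX_RESULT_WINDOW):
    """Return the last 1-based page that fits entirely in the result window."""
    return max_window // size


if __name__ == "__main__":
    # Page size of 100 is an assumption inferred from the error message.
    print(within_result_window(100, 100))  # from=9900, from+size=10000
    print(within_result_window(101, 100))  # from=10000, from+size=10100
    print(last_reachable_page(100))
```

With these assumptions, page 100 is the last one that loads and page 101 triggers the "Result window is too large" error.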

annelhote commented 3 years ago

The current best option seems to be search_after.
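To illustrate the idea, here is a rough sketch of how request bodies using search_after could be built. It is not Datashare's code: the helper name and the sort fields ("indexed_at", "doc_id") are hypothetical. What it relies on is that search_after requires a deterministic sort, ideally with a unique tiebreaker field, and takes the sort values of the last hit of the previous page in place of a from offset.

```python
def build_search_after_body(query, size, sort, last_sort_values=None):
    """Build an Elasticsearch search body that paginates with search_after.

    last_sort_values is the "sort" array of the last hit from the previous
    page; leave it as None to request the first page.
    """
    body = {"query": query, "size": size, "sort": sort}
    if last_sort_values is not None:
        # No "from" offset is used, so the from + size limit does not apply.
        body["search_after"] = list(last_sort_values)
    return body


if __name__ == "__main__":
    # Hypothetical sort fields; the second one acts as a unique tiebreaker.
    sort = [{"indexed_at": "desc"}, {"doc_id": "asc"}]

    first_page = build_search_after_body({"match_all": {}}, 100, sort)
    print(first_page)

    # In a real loop these values would be read from the last hit's "sort".
    next_page = build_search_after_body(
        {"match_all": {}}, 100, sort, last_sort_values=[1609459200000, "abc123"]
    )
    print(next_page)
```

Each page depends on the previous page's last hit, so this approach only walks forward through the results and cannot jump straight to an arbitrary page number.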

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 40 days with no activity.

pirhoo commented 1 year ago

After careful consideration, we decided not to implement the search_after parameter.

Even though it would allow searching beyond 10,000 results, it would have forced us to remove important features, such as: