freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Multi-search API Error: ApiError #3660

Open sentry-io[bot] opened 8 months ago

sentry-io[bot] commented 8 months ago

Ok, here are the first APIErrors that are not related to query parsing errors.

I believe these errors were also present before moving to the Multi-search API, but they were not caught because RequestError was too generic.

Sentry Issue: COURTLISTENER-6F6

Multi-search API Error: ApiError('N/A', meta=ApiResponseMeta(status=200, http_version='1.1', headers={'X-elastic-product': 'Elasticsearch', 'content-type': 'application/vnd.elasticsearch+json;compatible-with=8', 'content-length': '14820'}, duration=21.439457893371582, node=NodeConfig(scheme='https', host='internal-ac39c32f9087245d9b0442d9f04ed408-1221527753.us-west-2.elb.amazonaws.com', port=9200, path_prefix='', headers={'user-agent': 'elasticsearch-py/8.11.1 (Python/3.12.1; elastic-transport/8.11.0)'}, connections_per_node=10, request_timeout=10.0, http_compress=False, verify_certs=False, ca_certs='/opt/courtlistener/docker/elastic/ca.crt', client_cert=None, client_key=None, ssl_assert_hostname=None, ssl_assert_fingerprint=None, ssl_version=None, ssl_context=None, ssl_show_warn=True, _extras={})), body={'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'script score function must not produce negative scores, but got: [-7.871904E11]'}, {'type': 'illegal_argument_exception', 'reason': 'script score functio...
mlissner commented 8 months ago

We've got this error about 600 times, so let's try to fix it.

albertisfu commented 8 months ago

Since this is a generic error that can be raised under different circumstances, it was necessary to review the error message of each event to determine all the underlying problems. Therefore, I analyzed all the events associated with this issue, and here are my findings.

From the 775 events analyzed:

The following issues are related to query syntax and parsing errors:

I will continue reviewing these issues and determine which fix to apply, whether that means cleaning up the query or just adding more details about these failing queries to the error messages.

The following errors seem directly related to a shard failure or cluster overload at the moment the request occurred. I'm attaching the date and time they occurred in case it's possible to check in Kibana whether something unusual was happening in the cluster at those moments:

mlissner commented 8 months ago

Thanks for the analysis. Sounds like we can table the rest of these as enhancements we can plan to get to later.

Some are just the system being overloaded occasionally or otherwise having a blip; the remainder seem like good first bugs, since we just need code to detect bad queries and either correct them or throw better errors.

albertisfu commented 8 months ago

I went through the Lexical error, failed to create query, and parse_exception: Encountered errors to confirm that all of them are directly related to the user's query and not to something wrong in the indexed content that would need to be fixed before we start a new re-index.

I tested the queries on an empty index so I could confirm none of the errors are associated with the indexed content.
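That kind of check can be reproduced against a throwaway empty index; a minimal sketch in Python, assuming a local dev cluster (the index name, URL, and example query below are illustrative, not the exact setup used for this analysis):

from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ApiError

client = Elasticsearch("http://localhost:9200")
# An empty index is enough: the query is parsed per shard regardless of content.
client.options(ignore_status=400).indices.create(index="parse_check")

try:
    client.search(
        index="parse_check",
        query={"query_string": {"query": '"71543"/"2023"'}},
    )
except ApiError as e:
    # With no documents indexed, any failure here comes from the query
    # itself, not from the stored content.
    print(e.body["error"]["root_cause"][0]["reason"])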

For Lexical error:

lexical_events_only_queries.txt

The queries causing this error primarily involve the use of \ or /. For example:

Additionally, unbalanced [] can trigger this error:

These errors should be caught by the current ApiError try/except that matches the message Failed to parse query, which avoids throwing an error, since that's the exception I received locally for these queries. But this is not the message we're getting in some cases in prod.

Running the same query directly against ES throws:

{
    "error": {
        "root_cause": [
            {
                "type": "query_shard_exception",
                "reason": "Failed to parse query [\"71543\"/\"2023\"]",
                "index_uuid": "kH5lY9uJQ5iQ4KwTi9CCmw",
                "index": "recap_vectors"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "recap_vectors",
                "node": "gkSX2-k-RqqVJRwoWEWnnQ",
                "reason": {
                    "type": "query_shard_exception",
                    "reason": "Failed to parse query [\"71543\"/\"2023\"]",
                    "index_uuid": "kH5lY9uJQ5iQ4KwTi9CCmw",
                    "index": "recap_vectors",
                    "caused_by": {
                        "type": "parse_exception",
                        "reason": "Cannot parse '\"71543\"/\"2023\"': Lexical error at line 1, column 15.  Encountered: <EOF> after prefix \"/\\\"2023\\\"\" (in lexical state 2)",
                        "caused_by": {
                            "type": "token_mgr_error",
                            "reason": "Lexical error at line 1, column 15.  Encountered: <EOF> after prefix \"/\\\"2023\\\"\" (in lexical state 2)"
                        }
                    }
                }
            }
        ]
    },
    "status": 400
}

The root cause is Failed to parse query. However, within failed_shards, we can see the exception we're encountering in production: Lexical error at line 1...

So the error triggered by the client depends on the response returned by ES, and sometimes this response's structure can vary. Thus, the Failed to parse query message may not be directly shown, but the underlying error remains the same. To catch these errors and prevent them from being sent to Sentry, we need to parse the error message for the specific error type token_mgr_error.
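A sketch of what that check could look like, assuming ApiError.body carries the same JSON structure as the raw ES response above (the helper names are illustrative, not existing CourtListener code):

def collect_error_types(node, found=None):
    # Walk the nested ES error body and gather every "type" value, so the
    # check works whether the exception shows up in root_cause, inside
    # failed_shards, or several levels down in caused_by.
    if found is None:
        found = set()
    if isinstance(node, dict):
        if isinstance(node.get("type"), str):
            found.add(node["type"])
        for value in node.values():
            collect_error_types(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_error_types(item, found)
    return found


# Error types that point to a bad user query rather than a cluster problem.
QUERY_PARSING_ERROR_TYPES = {"token_mgr_error", "parse_exception"}


def is_query_parsing_error(error_body) -> bool:
    # error_body is the dict from ApiError.body (or the per-item "error"
    # dict in a multi-search response).
    return bool(collect_error_types(error_body) & QUERY_PARSING_ERROR_TYPES)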

For failed to create query:

failed_query_only_queries.txt

These errors are related to queries with values incompatible with the document field type:

"document_number": "452564/2022"
"attachment_number": "12/31/2009"
"q": "docket_id:v"

The fields in the previous queries expect an integer; passing a different kind of string triggers the failed to create query error.

In this scenario, we could capture the failed to create query: error message and provide a more descriptive error message, indicating that some fields only accept integer values. Alternatively, we could implement a validator at the form level to ensure these fields only accept integers.
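For the form-level option, a minimal sketch assuming a Django form (the class and field names are illustrative, not the actual CourtListener search form):

from django import forms

class RecapSearchFormSketch(forms.Form):
    # IntegerField rejects values like "452564/2022" or "12/31/2009" during
    # validation, before any query is ever sent to Elasticsearch.
    document_number = forms.IntegerField(required=False)
    attachment_number = forms.IntegerField(required=False)

# Usage in a view:
# form = RecapSearchFormSketch(request.GET)
# if not form.is_valid():
#     # form.errors explains which fields only accept integers, so we can
#     # show that to the user instead of hitting ES and failing there.
#     ...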

Also, the incorrect use of some query characters like ~ triggers this error, for instance:

Robert N. ~Opel II -> failed to create query: fuzziness cannot be [Opel]
James Norris”~3 -> failed to create query: Valid edit distances are [0, 1, 2] but was [3]

For parse_exception:

parse_exception_only_queries.txt

Errors here are also related to the input query, mostly to the incorrect use of :, like:

:Horn Equipment
: 24-cv-00210
In Re: FTX Cryptocurrency Exchange Collapse Litigation
In re: Chicago Board Options Exchange Volatility Index Manipulation Antitrust

Incorrect usage of advanced syntax clauses like:

OR MANIA
"Stolen gun" AND

And parsing exceptions caused by disallowed characters:

"q": "{}"
arbitra!
Clark v Brown--- F.Supp.3d ----, 2021

Similar to the Lexical error, the error message should typically be Failed to parse query, but it appears necessary to parse the entire error message to identify parse_exception. This will enable us to capture these errors and prevent them from being sent to Sentry, while still sending errors unrelated to parsing issues, such as cluster overload, failed shards, and others.
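Putting that together with the is_query_parsing_error() helper sketched above, the call that executes the search could filter what reaches Sentry roughly like this (the function and exception names are hypothetical, not existing CourtListener code):

from elasticsearch.exceptions import ApiError


class BadSearchQueryError(Exception):
    # Hypothetical exception a view could turn into a user-facing message.
    pass


def execute_multi_search(multi_search):
    # multi_search would be the elasticsearch-dsl MultiSearch object built
    # for the request; only the error handling matters here.
    try:
        return multi_search.execute()
    except ApiError as e:
        if is_query_parsing_error(e.body):
            # Bad user input: show a friendly message and skip Sentry.
            raise BadSearchQueryError(
                "The query could not be parsed; please check the search syntax."
            ) from e
        # Cluster overload, failed shards, etc. keep propagating so Sentry
        # still captures them.
        raise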

mlissner commented 7 months ago

Cool, sounds like these can wait to be updated down the road. Thanks for the analysis.