Cannot be filtered out of '## xx' using the Query syntax

wangchao732 commented 1 month ago

Elasticsearch Version

8.14

Installed Plugins

No response

Java Version

1.8.0_191

OS Version

Linux CentOS 7.9

Problem Description

[screenshot attached]

Steps to Reproduce

null

Logs (if relevant)

No response

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-search (Team:Search)

benwtrent commented 1 month ago

Could you provide text about what's happening, what you expect to happen, and how to reproduce the problem?

wangchao732 commented 1 month ago

Could you provide text about what's happening, what you expect to happen, and how to reproduce the problem?

For example, when I query today's error log entries, I can find them in Kibana, but my query only matches one of them.

wangchao732 commented 1 month ago

"2024-07-17 10:48:00.908 ERROR 1 --- [io-8080-exec-71] u.t.b.d.config.GlobalExceptionHandler : JSON parse error: Cannot deserialize value of type java.lang.Double from String \"未上报\": not a valid Double value; nested exception is com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type java.lang.Double from String \"未上报\": not a valid Double value"

"2024-07-17 08:19:37.157 ERROR 1 --- [TB-Scheduling-1] o.t.server.dao.service.DataValidator : Asset object is invalid: [Asset is referencing to non-existent tenant!]"

Why does excluding "object is invalid" not filter out only the message that contains it? The result is that the other message is excluded as well.

benwtrent commented 1 month ago

@wangchao732 You can see why a particular document matches with a detailed explanation if you use the explain API.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html#:~:text=The%20explain%20API%20computes%20a,t%20match%20a%20specific%20query.

This should help you debug.

Additionally, to see the completely rewritten Lucene query that is being executed, you can pass your query to the validate API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-validate.html

With the parameter rewrite: true it will provide you with the fully rewritten query.

These two APIs should get you much more information about why a query matches a particular document.
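A minimal sketch of what such a validate call could look like, written in Python only to keep it self-contained. The host `https://localhost:9200` is a placeholder (an assumption, not from this issue), the index name is the one from this issue, and the request is built but not sent; authentication and TLS settings are left out.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Placeholder host (assumption); the index name is taken from this issue.
ES_URL = "https://localhost:9200"
INDEX = "k8slog-2024.07.17"

def build_validate_request(query: dict) -> Request:
    """Build (but do not send) a _validate/query request with rewrite=true."""
    url = f"{ES_URL}/{quote(INDEX)}/_validate/query?rewrite=true"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="GET",
        headers={"Content-Type": "application/json"},
    )

# The clause under suspicion, validated on its own.
req = build_validate_request({"match": {"message": "object is invalid"}})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Validating one clause at a time keeps the rewritten query short enough to read. Passing `req` to `urllib.request.urlopen` would send it, given credentials and certificates appropriate for the cluster.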

wangchao732 commented 1 month ago

Asset object

curl -XGET -H 'Content-Type: application/json' -u xxx "https://xxx:9200/k8slog-2024.07.17/_explain/AF8hvpABVEXVlDRloT9m?pretty" --insecure -d '{"query": { "bool" : { "must": [{"match": {"fields.namespace": "dapr-application" }},{"match": { "message": "ERROR" }}], "must_not": [ {"match": {"message": "WARN"}},{"match": {"message": "DEBUG"}},{"match": {"message": "INFO"}},{"match": {"message": "object is invalid"}},{"match": {"message": "adThread"}}]} }}'

{ "_index" : "k8slog-2024.07.17", "_id" : "AF8hvpABVEXVlDRloT9m", "matched" : false, "explanation" : { "value" : 0.0, "description" : "Failure to meet condition(s) of required/prohibited clause(s)", "details" : [ { "value" : 0.25241855, "description" : "sum of:", "details" : [ { "value" : 0.1262083, "description" : "weight(fields.namespace:dapr in 3921793) [PerFieldSimilarity], result of:", "details" : [ { "value" : 0.1262083, "description" : "score(freq=1.0), computed as boost idf tf from:", "details" : [ { "value" : 2.2, "description" : "boost", "details" : [ ] }, { "value" : 0.12774244, "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", "details" : [ { "value" : 11779941, "description" : "n, number of documents containing term", "details" : [ ] }, { "value" : 13385079, "description" : "N, total number of documents with field", "details" : [ ] } ] }, { "value" : 0.44908655, "description" : "tf, computed as freq / (freq + k1 (1 - b + b dl / avgdl)) from:", "details" : [ { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] }, { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] }, { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] }, { "value" : 2.0, "description" : "dl, length of field", "details" : [ ] }, { "value" : 1.9422878, "description" : "avgdl, average length of field", "details" : [ ] } ] } ] } ] }, { "value" : 0.12621024, "description" : "weight(fields.namespace:application in 3921793) [PerFieldSimilarity], result of:", "details" : [ { "value" : 0.12621024, "description" : "score(freq=1.0), computed as boost idf tf from:", "details" : [ { "value" : 2.2, "description" : "boost", "details" : [ ] }, { "value" : 0.12774439, "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", "details" : [ { "value" : 11779918, "description" : "n, number of documents containing term", "details" : [ ] }, { "value" : 
13385079, "description" : "N, total number of documents with field", "details" : [ ] } ] }, { "value" : 0.44908655, "description" : "tf, computed as freq / (freq + k1 (1 - b + b dl / avgdl)) from:", "details" : [ { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] }, { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] }, { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] }, { "value" : 2.0, "description" : "dl, length of field", "details" : [ ] }, { "value" : 1.9422878, "description" : "avgdl, average length of field", "details" : [ ] } ] } ] } ] } ] }, { "value" : 7.267964, "description" : "weight(message:error in 3921793) [PerFieldSimilarity], result of:", "details" : [ { "value" : 7.267964, "description" : "score(freq=1.0), computed as boost idf tf from:", "details" : [ { "value" : 2.2, "description" : "boost", "details" : [ ] }, { "value" : 5.1235533, "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", "details" : [ { "value" : 78753, "description" : "n, number of documents containing term", "details" : [ ] }, { "value" : 13225155, "description" : "N, total number of documents with field", "details" : [ ] } ] }, { "value" : 0.64479077, "description" : "tf, computed as freq / (freq + k1 (1 - b + b dl / avgdl)) from:", "details" : [ { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] }, { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] }, { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] }, { "value" : 60.0, "description" : "dl, length of field (approximate)", "details" : [ ] }, { "value" : 215.23318, "description" : "avgdl, average length of field", "details" : [ ] } ] } ] } ] }, { "value" : 0.0, "description" : "match on prohibited clause (message:object message:is message:invalid)", "details" : [ { "value" : 1.0, 
"description" : "message:object message:is message:invalid", "details" : [ ] } ] } ] } }

curl -XGET -H 'Content-Type: application/json' -u xxx "https://xxx/k8slog-2024.07.17/_explain/AF8hvpABVEXVlDRloT9m?pretty" --insecure -d '{"query": { "bool" : { "must": [{"match": {"fields.namespace": "dapr-application" }},{"match": { "message": "ERROR" }}], "must_not": [ {"match": {"message": "WARN"}},{"match": {"message": "DEBUG"}},{"match": {"message": "INFO"}},{"match": {"message": "Asset object"}},{"match": {"message": "adThread"}}]} }}'

{ "_index" : "k8slog-2024.07.17", "_id" : "AF8hvpABVEXVlDRloT9m", "matched" : true, "explanation" : { "value" : 7.5192084, "description" : "sum of:", "details" : [ { "value" : 0.25261974, "description" : "sum of:", "details" : [ { "value" : 0.12630892, "description" : "weight(fields.namespace:dapr in 3921793) [PerFieldSimilarity], result of:", "details" : [ { "value" : 0.12630892, "description" : "score(freq=1.0), computed as boost idf tf from:", "details" : [ { "value" : 2.2, "description" : "boost", "details" : [ ] }, { "value" : 0.12784426, "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", "details" : [ { "value" : 11791710, "description" : "n, number of documents containing term", "details" : [ ] }, { "value" : 13399816, "description" : "N, total number of documents with field", "details" : [ ] } ] }, { "value" : 0.44908655, "description" : "tf, computed as freq / (freq + k1 (1 - b + b dl / avgdl)) from:", "details" : [ { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] }, { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] }, { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] }, { "value" : 2.0, "description" : "dl, length of field", "details" : [ ] }, { "value" : 1.9422879, "description" : "avgdl, average length of field", "details" : [ ] } ] } ] } ] }, { "value" : 0.12631084, "description" : "weight(fields.namespace:application in 3921793) [PerFieldSimilarity], result of:", "details" : [ { "value" : 0.12631084, "description" : "score(freq=1.0), computed as boost idf tf from:", "details" : [ { "value" : 2.2, "description" : "boost", "details" : [ ] }, { "value" : 0.12784621, "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", "details" : [ { "value" : 11791687, "description" : "n, number of documents containing term", "details" : [ ] }, { "value" : 13399816, "description" : "N, total number of 
documents with field", "details" : [ ] } ] }, { "value" : 0.44908655, "description" : "tf, computed as freq / (freq + k1 (1 - b + b dl / avgdl)) from:", "details" : [ { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] }, { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] }, { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] }, { "value" : 2.0, "description" : "dl, length of field", "details" : [ ] }, { "value" : 1.9422879, "description" : "avgdl, average length of field", "details" : [ ] } ] } ] } ] } ] }, { "value" : 7.2665887, "description" : "weight(message:error in 3921793) [PerFieldSimilarity], result of:", "details" : [ { "value" : 7.2665887, "description" : "score(freq=1.0), computed as boost idf tf from:", "details" : [ { "value" : 2.2, "description" : "boost", "details" : [ ] }, { "value" : 5.122982, "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", "details" : [ { "value" : 78885, "description" : "n, number of documents containing term", "details" : [ ] }, { "value" : 13239758, "description" : "N, total number of documents with field", "details" : [ ] } ] }, { "value" : 0.6447407, "description" : "tf, computed as freq / (freq + k1 (1 - b + b dl / avgdl)) from:", "details" : [ { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] }, { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] }, { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] }, { "value" : 60.0, "description" : "dl, length of field (approximate)", "details" : [ ] }, { "value" : 215.12988, "description" : "avgdl, average length of field", "details" : [ ] } ] } ] } ] } ] } }

Sorry, I can't read the explain output.

benwtrent commented 1 month ago

@wangchao732 you changed which doc you were matching between values. I cannot easily follow your concerns. Mixing images & poorly formatted text makes all this unnecessarily difficult. I am guessing you want to know why a single doc doesn't match a single query?

What is the body of that doc that doesn't match the single query?

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "fields.namespace": "dapr-application"
                    }
                },
                {
                    "match": {
                        "message": "ERROR"
                    }
                }
            ],
            "must_not": [
                {
                    "match": {
                        "message": "WARN"
                    }
                },
                {
                    "match": {
                        "message": "DEBUG"
                    }
                },
                {
                    "match": {
                        "message": "INFO"
                    }
                },
                {
                    "match": {
                        "message": "Asset object"
                    }
                },
                {
                    "match": {
                        "message": "adThread"
                    }
                }
            ]
        }
    }
}

You can do a validate with rewrite on your query to see the fully rewritten Lucene query that would be run (with text analysis and everything). This will show you what is executed and hopefully show you what's happening.

Please do that. "match": {"message": "object is invalid"} is likely being rewritten to three term queries, separated by an OR. But, it depends on the analyzer, etc. being used.
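To illustrate the OR behaviour described above, here is a rough simulation in Python. It is only a sketch: `tokenize` is a simplified stand-in for whatever analyzer the `message` field really uses, the helper names are hypothetical, and the two strings are shortened copies of the log lines quoted earlier in this issue.

```python
import re

def tokenize(text):
    # Rough stand-in for an analyzer: lowercase and split into word tokens.
    return re.findall(r"\w+", text.lower())

def or_match_terms(query, message):
    # Query terms that also occur in the message. A non-empty result means
    # an OR-combined match query would match the message.
    return sorted(set(tokenize(query)) & set(tokenize(message)))

def phrase_match(query, message):
    # True only if the query terms occur consecutively and in order.
    q, m = tokenize(query), tokenize(message)
    return any(m[i:i + len(q)] == q for i in range(len(m) - len(q) + 1))

json_error = ("JSON parse error: Cannot deserialize value of type java.lang.Double "
              "from String: not a valid Double value; nested exception is "
              "com.fasterxml.jackson.databind.exc.InvalidFormatException")
asset_error = "Asset object is invalid: [Asset is referencing to non-existent tenant!]"

for name, msg in [("json_error", json_error), ("asset_error", asset_error)]:
    print(name,
          or_match_terms("object is invalid", msg),
          phrase_match("object is invalid", msg))
print(or_match_terms("Asset object", json_error))
```

Under these assumptions the JSON parse error shares the single term `is` with "object is invalid", which is enough for an OR-combined `must_not` clause to exclude it, while it shares no term with "Asset object". If the intent is to exclude only the exact phrase, a `match_phrase` clause, or a `match` clause with `"operator": "and"`, may be closer to what is wanted; the validate output for the real index is the way to confirm.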

As for the explain, it shows that the first doc didn't match as it was excluded. The second doc did.
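For reading a long explain response, a small helper along these lines may help. It is a sketch only: `summarize` is a hypothetical name, and `sample` is a hand-trimmed copy of the first explain output in this issue with the nested scoring details dropped.

```python
def summarize(explanation, depth=0, max_depth=1):
    """Return the value and description of each node, down to max_depth."""
    indent = "  " * depth
    lines = [f"{indent}{explanation['value']}  {explanation['description']}"]
    if depth < max_depth:
        for child in explanation.get("details", []):
            lines.extend(summarize(child, depth + 1, max_depth))
    return lines

# Trimmed from the first explain response above.
sample = {
    "value": 0.0,
    "description": "Failure to meet condition(s) of required/prohibited clause(s)",
    "details": [
        {"value": 0.25241855, "description": "sum of:", "details": []},
        {"value": 7.267964,
         "description": "weight(message:error in 3921793) [PerFieldSimilarity], result of:",
         "details": []},
        {"value": 0.0,
         "description": "match on prohibited clause (message:object message:is message:invalid)",
         "details": [
             {"value": 1.0,
              "description": "message:object message:is message:invalid",
              "details": []},
         ]},
    ],
}

for line in summarize(sample):
    print(line)
```

The last printed line is the one that matters here: the document hit the prohibited clause `message:object message:is message:invalid`, so it was excluded even though the `must` clauses scored.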

wangchao732 commented 1 month ago

What is the body of that doc that doesn't match the single query?

Yes. I'm using a bool query to match documents whose message contains ERROR but does not contain "object is invalid". A message that contained ERROR and did not contain "object is invalid" was still excluded. When I changed the query to match messages containing ERROR but not "Asset object", I got the desired result.

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)
