elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Rollup search ignores minimum_should_match clause #86505

Closed attilasalyi-seon closed 2 years ago

attilasalyi-seon commented 2 years ago

Elasticsearch Version

8.1.2

Installed Plugins

No response

Java Version

bundled

OS Version

Ubuntu 20.04.1

Problem Description

When querying rollup indices, minimum_should_match has no effect (tested in v7.17 and v8.1.2), which makes should clauses useless.

The rollup job limitations page explicitly lists compound queries as supported, so in theory it should work.

Pastebin link for easier copy+paste reproduction: https://pastebin.com/akpG6cED

Steps to Reproduce

Description

This paste shows that Elasticsearch (at least versions 7.17 and 8.1) ignores the minimum_should_match clause in aggregations against rollup indices (using the _rollup_search endpoint), despite the official documentation stating otherwise.

The paste creates all necessary components and has a clean up section too.

Create sample index with mapping

PUT cardata
PUT cardata/_mapping
{"dynamic":"true","properties":{"amount":{"type":"long"},"vendor":{"type":"keyword"},"currency":{"type":"keyword"},"date":{"type":"date"},"car.original_state":{"type":"keyword"}}}

Put data into the sample index

PUT cardata/_doc/1
{"amount":1000,"currency":"USD","vendor":"Ford","date":"2022-04-01T12:10:00Z","car.original_state":"CLEAN"}
PUT cardata/_doc/2
{"amount":5000,"name":"John Smith","currency":"USD","vendor":"Chevrolet","date":"2022-04-02T12:10:00Z","car.original_state":"WRECKED"}
PUT cardata/_doc/3
{"amount":7500,"currency":"HUF","vendor":"Chrysler","date":"2022-04-03T12:10:00Z","car.original_state":"CLEAN"}
PUT cardata/_doc/4
{"amount":65,"currency":"GBP","vendor":"Vauxhall","date":"2022-04-03T12:10:00Z","car.original_state":"WRECKED"}
PUT cardata/_doc/5
{"amount":17,"currency":"GBP","vendor":"Rover","date":"2022-04-03T12:10:00Z","car.original_state":"CLEAN"}

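Create and start the rollup job (note: Elasticsearch cron expressions have six fields with seconds first, so the expression below fires every second rather than every minute)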

PUT _rollup/job/car_states
{"index_pattern":"cardata*","rollup_index":"rollup-cardata","cron":"*/1 * * * * ?","page_size":1000,"groups":{"date_histogram":{"field":"date","fixed_interval":"1d"},"terms":{"fields":["currency","amount","car.original_state"]}},"metrics":[{"field":"amount","metrics":["min","max","sum"]}]}
POST _rollup/job/car_states/_start
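
As a rough mental model of what this job configuration stores, here is a simplified Python sketch. It is not the actual rollup document format: it only assumes that each distinct combination of day, currency, amount and car.original_state becomes one rolled-up bucket carrying min/max/sum of amount. The names `DOCS` and `roll_up` are made up for illustration.

```python
from collections import defaultdict

# The five sample documents indexed above (only the fields the job uses;
# "state" stands in for car.original_state).
DOCS = [
    {"amount": 1000, "currency": "USD", "date": "2022-04-01T12:10:00Z", "state": "CLEAN"},
    {"amount": 5000, "currency": "USD", "date": "2022-04-02T12:10:00Z", "state": "WRECKED"},
    {"amount": 7500, "currency": "HUF", "date": "2022-04-03T12:10:00Z", "state": "CLEAN"},
    {"amount": 65, "currency": "GBP", "date": "2022-04-03T12:10:00Z", "state": "WRECKED"},
    {"amount": 17, "currency": "GBP", "date": "2022-04-03T12:10:00Z", "state": "CLEAN"},
]


def roll_up(docs):
    """Group by (day, currency, amount, state) and keep min/max/sum of amount."""
    buckets = defaultdict(list)
    for d in docs:
        day = d["date"][:10]  # 1d fixed_interval; all sample timestamps are UTC
        key = (day, d["currency"], d["amount"], d["state"])
        buckets[key].append(d["amount"])
    return {k: {"min": min(v), "max": max(v), "sum": sum(v)} for k, v in buckets.items()}


rolled = roll_up(DOCS)
gbp = {k: v for k, v in rolled.items() if k[1] == "GBP"}
print(len(rolled), len(gbp))  # 5 buckets in total, 2 of them GBP
```

Under this model the two GBP buckets stay distinguishable by state, because car.original_state is one of the grouping terms. That is why a should filter on that field could, in principle, be honored by a rollup search.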

Query data

Query1: live index data (normal _search)

Query description: sum the amount of CLEAN or UPGRADED cars bought with GBP

Result: 1 doc with key GBP and sum value 17.0

GET cardata/_search?size=0
{"query":{"bool":{"filter":[{"term":{"currency":"GBP"}}],"should":[{"term":{"car.original_state":"CLEAN"}},{"term":{"car.original_state":"UPGRADED"}}],"minimum_should_match":1}},"aggs":{"currency_agg":{"terms":{"field":"currency"},"aggs":{"amount_agg":{"sum":{"field":"amount"}}}}}}

Query2: rollup index data (rollup_search)

Query description: same as in Query1

Result: 2 docs with key GBP and sum value 82 (the result a query without should would give)

GET rollup-cardata/_rollup_search?size=0
{"query":{"bool":{"filter":[{"term":{"currency":"GBP"}}],"should":[{"term":{"car.original_state":"CLEAN"}},{"term":{"car.original_state":"UPGRADED"}}],"minimum_should_match":1}},"aggs":{"currency_agg":{"terms":{"field":"currency"},"aggs":{"amount_agg":{"sum":{"field":"amount"}}}}}}

^^^ THIS IS WHERE YOU CAN SEE THAT minimum_should_match is ignored
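
The difference between the two results can be checked by hand. The following Python sketch is a toy model of bool query matching, not Elasticsearch code; `term`, `bool_query` and `sum_amount` are hypothetical helpers. It evaluates the same filter and should clauses over the sample data, once with minimum_should_match set to 1 and once with it dropped to 0, which is the documented default when a bool query also has a filter clause.

```python
# GBP-relevant fields of the five sample documents
# ("state" stands in for car.original_state).
DOCS = [
    {"amount": 1000, "currency": "USD", "state": "CLEAN"},
    {"amount": 5000, "currency": "USD", "state": "WRECKED"},
    {"amount": 7500, "currency": "HUF", "state": "CLEAN"},
    {"amount": 65, "currency": "GBP", "state": "WRECKED"},
    {"amount": 17, "currency": "GBP", "state": "CLEAN"},
]


def term(field, value):
    """Toy term query: exact match on one field."""
    return lambda doc: doc.get(field) == value


def bool_query(filters=(), should=(), minimum_should_match=0):
    """Toy bool query: every filter must match, plus at least N should clauses."""
    def matches(doc):
        if not all(f(doc) for f in filters):
            return False
        return sum(1 for s in should if s(doc)) >= minimum_should_match
    return matches


def sum_amount(docs, query):
    return sum(d["amount"] for d in docs if query(d))


filters = [term("currency", "GBP")]
should = [term("state", "CLEAN"), term("state", "UPGRADED")]

honored = bool_query(filters, should, minimum_should_match=1)
ignored = bool_query(filters, should, minimum_should_match=0)

print(sum_amount(DOCS, honored))  # 17, the Query1 result
print(sum_amount(DOCS, ignored))  # 82, the Query2 result
```

The 82 returned by Query2 is exactly what falls out when the should clauses no longer constrain matching, which is consistent with minimum_should_match being lost somewhere on the rollup search path.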

Query3: rollup index data (rollup_search), should in separate bool

Query description: same as in Query1 and Query2

Result: same as in Query2 result

GET rollup-cardata/_rollup_search?size=0
{"query":{"bool":{"filter":[{"term":{"currency":"GBP"}},{"bool":{"should":[{"term":{"car.original_state":"CLEAN"}},{"term":{"car.original_state":"UPGRADED"}}],"minimum_should_match":1}}]}},"aggs":{"currency_agg":{"terms":{"field":"currency"},"aggs":{"amount_agg":{"sum":{"field":"amount"}}}}}}

===================================================

Clean up

Clean up rollup job

POST _rollup/job/car_states/_stop
DELETE _rollup/job/car_states
DELETE rollup-cardata

Logs (if relevant)

No response

elasticmachine commented 2 years ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

wchaparro commented 2 years ago

This issue is about the current experimental version of rollups. The team has decided to implement this functionality from scratch as the downsample operation of the TSDB project (https://github.com/elastic/elasticsearch/issues/74660).

As we focus all our efforts on implementing downsampling for TSDB, the team has decided not to proceed with addressing issues in the current experimental rollups.

Therefore, I am closing this issue.