Open aw381246 opened 8 months ago
This is similar to #12230, and this behavior is expected because of how pinot execute queries. We have built VALUE_IN
to work around this problem
@Jackie-Jiang, if I understand VALUE_IN
, I could specify a specific "host" in the query above, and I'd only get the first row, and the distinct count would = 1 right? But that doesn't solve the problem of returning every host with the correct distinctcountmv? In other words, if "1.27.12.151" is in an MV field with either 0 or 100 other hosts, the distinctcountmv should return 1 either way.
I don't fully follow. What do you want to achieve with the query?
If you group by the same column that is in the distinctcount, every row in the result should have a distinctcount of 1. The only reason pinot is returning a value greater than 1 is because the column is an MV column that has 2 values in this case. Pinot unnests the MV column when you group by it, but the distinctcount seems to be happening before the unnest instead of after the unnest.
Here's a simple example I created in SQL Server:
Pinot MV
semantic works as following:
VALUE_IN
is built as a workaround to do extra filter during project
If the field passed into the distinctcountmv function is included in the list of select / group by fields, it will return incorrect counts.
In the case below, the result for each row should be a distinctcount = 1 instead of 2.
If the array has n number of items, the distinctcount will return n instead of 1