apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Text search on the dimension in star-tree #12219

Status: Open. zhuangdaz opened this issue 8 months ago.

zhuangdaz commented 8 months ago

I wonder if we could add support for text search on a dimension in the star-tree index.

Use case: we have a table with the columns dateInt, url, and impressions, and we would like to create a star-tree index like this:

"tableIndexConfig": {
  "starTreeIndexConfigs": [{
    "dimensionsSplitOrder": [
      "url",
      "dateInt"
    ],
    "skipStarNodeCreationForDimensions": [
    ],
    "functionColumnPairs": [
      "SUM__impressions"
    ],
    "maxLeafRecords": 10000
  }],
  ...
}

We want to filter on url when computing the aggregation, for example returning sum(impressions) only for URLs that contain "pinot".

The query would look like this:

select url, sum(impressions)
from table
where text_match(url, 'pinot') and dateInt = 20240104
group by url

zhuangdaz commented 8 months ago

Related to: https://github.com/apache/pinot/issues/8863

Jackie-Jiang commented 8 months ago

TEXT_MATCH requires a text index, which doesn't work with the star-tree index. Can you try LIKE or REGEXP_LIKE instead and see if that works?
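For reference, here is a sketch of the original query rewritten with each of the suggested predicates. It assumes the TEXT_MATCH filter can simply be replaced by a substring match on url. `myTable` is a placeholder table name, and the two statements are alternatives to run one at a time. Whether Pinot actually serves these from the star-tree rather than falling back to a scan may depend on the version and table config, so treat this as something to try. Checking the query plan (for example with EXPLAIN PLAN FOR) is one way to confirm.

```sql
-- Option 1: TEXT_MATCH replaced by LIKE (substring match on url).
-- myTable is a placeholder for the real table name.
SELECT url, SUM(impressions)
FROM myTable
WHERE url LIKE '%pinot%'
  AND dateInt = 20240104
GROUP BY url

-- Option 2: the same filter written as a regular expression.
-- The leading and trailing .* make the pattern match any url containing
-- "pinot", whether the engine requires a full-string match or not.
SELECT url, SUM(impressions)
FROM myTable
WHERE REGEXP_LIKE(url, '.*pinot.*')
  AND dateInt = 20240104
GROUP BY url
```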