Open jackluo923 opened 1 year ago
This stackoverflow question suggests ElasticSearch (built on top of Lucene) supports similar substring search feature. https://stackoverflow.com/questions/44791075/in-elasticsearch-how-do-i-search-for-an-arbitrary-substring
cc @atris @siddharthteotia
@jackluo923 does this merged feature solves this issue for you? If so, we could close this issue...
@hpvd We'll likely back out the merged feature and enable a similar feature via run-time configurable analyzer & query parser, as proposed in this PR. There are a couple of reasons for this:
We have been using PR#13003 in production over petabytes of data without issues and will work towards closing the PR, and this issue soon.
awsome! Many thanks for the detailed description of the current state and experience!
We've seen many cases where a user wants to search for a substring in a field with a text index. If all of the tokens in the query are complete words, we can directly use a phrase search:
SELECT * FROM table WHERE text_match("col", '"substring match query"')
However, if the first or last token is a partial word (e.g.,"string match que"
), the query will not return any results. Treating the query as regex text-match query does not work either as Pinot only supports regex match on a single token. To work around the limitation, we can use this a query like this:SELECT * FROM table WHERE text_match("col", '/*string/ AND match AND /que*/') AND "col" LIKE "%string match que%"
But this is very slow and computationally expensive due to the LIKE which is necessary for validating the order of the tokens.