Open ananthdurai opened 6 years ago
Can you add more details on the types of operations (exact match, prefix match, and boolean operations) that can be performed on these columns? Do you need everything that Lucene provides?
Thank you @kishoreg. I think the scope of the search capability can be limited to the following, assuming the column error_logs is the full-text indexed column:
We do support regexp_like(). See https://pinot.readthedocs.io/en/latest/pql_examples.html
TIL. Let me run a performance comparison study on it.
@mcvsubbu The regexp_like support is very naive at the moment. IIRC, it basically does a regex match on all dictionary items to collect the matching dictionary ids, and then selects the records with a matching dictionary id.
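To make the description above concrete, here is a minimal sketch of that approach in Python. It is not Pinot's code: the names `build_dictionary`, `regexp_like`, and `forward_index` are made up for illustration, and the use of `re.search` (match anywhere in the value) is an assumption about the matching semantics.

```python
import re


def build_dictionary(values):
    """Dictionary-encode a column: sorted unique values plus one dict id per row."""
    dictionary = sorted(set(values))
    id_of = {value: dict_id for dict_id, value in enumerate(dictionary)}
    forward_index = [id_of[value] for value in values]
    return dictionary, forward_index


def regexp_like(dictionary, forward_index, pattern):
    """Return the row ids whose value matches the pattern (hypothetical helper)."""
    regex = re.compile(pattern)
    # Step 1: run the regex against every dictionary entry, collecting matching dict ids.
    matching_ids = {
        dict_id for dict_id, value in enumerate(dictionary) if regex.search(value)
    }
    # Step 2: select the rows whose dict id is in the matching set.
    return [row for row, dict_id in enumerate(forward_index) if dict_id in matching_ids]


error_logs = [
    "NullPointerException at Foo.java:12",
    "Timeout while calling service",
    "NullPointerException at Foo.java:12",
    "Disk full on /var/log",
]
dictionary, forward_index = build_dictionary(error_logs)
matching_docs = regexp_like(dictionary, forward_index, r"NullPointer")
print(matching_docs)  # [0, 2]
```

In this sketch the regex runs once per distinct value, so the cost grows with the dictionary's cardinality. For a log column where nearly every value is unique, that is close to scanning every row, which is probably why it feels naive for free-text search.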
From @ananthdurai's ask, it seems more like an ELK use case that requires much faster text indexing/search, which we currently don't have.
@mayankshriv Yes, that is correct. Ideally, if Pinot could support multiple indexing formats (log search and columnar indexing) and the query engine were intelligent enough to choose the indexes based on the query pattern, that would be awesome.
@ananthdurai Agreed, that would be a good feature to have. @sunithabeeram was experimenting with it with @kishoreg; I'll let them comment on how that is panning out.
Adding a couple of links from DM for future reference: https://www.quora.com/What-is-the-algorithm-used-by-Lucenes-PrefixQuery https://issues.apache.org/jira/browse/LUCENE-1606
+1
Pinot lets you pick and choose the index for each column. Adding support for log search indexing will help use cases such as error stack trace indexing. Most needle-in-the-haystack use cases start with a free-text search followed by drill-down analysis. Adding support for free-text search would let Pinot address these extended use cases.