apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Support log search functionality in Pinot #2798

Open ananthdurai opened 6 years ago

ananthdurai commented 6 years ago

Pinot lets users choose which index to build for each column. Adding support for log search indexing would help use cases such as error stack trace indexing. Most needle-in-a-haystack use cases start with a free-text search and then move on to drill-down analysis. Adding support for free-text search would let Pinot address these extended use cases.

kgopalakrishna commented 5 years ago

Can you add more details on the types of operations (exact match, prefix match, and boolean operations) that can be performed on these columns? Do you need everything that Lucene provides?

ananthdurai commented 5 years ago

Thank you @kishoreg. I think the scope of the search capability can be limited to the following, assuming the column error_logs is the full-text-indexed column:

  1. Prefix query (error_logs = "error*")
  2. Wildcard query (error_logs = "apache*error")
  3. Regex query (error_logs = "error.*access")
  4. Fuzzy query (fuzzy_search(term="apache", fuzziness=3))
  5. Match phrase query (error_logs = "error")

The proposal assumes that users may choose a column to be indexed as free text.
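
To make the five query kinds concrete, here is a minimal Python sketch of what each one could mean when evaluated against a single log line. Everything in it is an assumption for illustration: the function names are hypothetical, the tokenizer is a plain `\w+` split, wildcard and regex patterns are allowed to match anywhere in the lowercased line, and fuzziness is treated as plain Levenshtein distance on tokens. It is not Pinot's or Lucene's implementation, and a real text index would avoid scanning each line like this.

```python
import fnmatch
import re


def tokens(text: str) -> list[str]:
    """Lowercase the line and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def prefix_match(line: str, prefix: str) -> bool:
    """True if any token starts with the prefix, e.g. "error*" -> prefix "error"."""
    return any(t.startswith(prefix.lower()) for t in tokens(line))


def wildcard_match(line: str, pattern: str) -> bool:
    """True if the glob pattern matches anywhere in the lowercased line."""
    return fnmatch.fnmatchcase(line.lower(), f"*{pattern.lower()}*")


def regex_match(line: str, pattern: str) -> bool:
    """True if the regular expression matches anywhere in the lowercased line."""
    return re.search(pattern, line.lower()) is not None


def fuzzy_match(line: str, term: str, fuzziness: int) -> bool:
    """True if any token is within `fuzziness` edits of the term."""
    return any(edit_distance(t, term.lower()) <= fuzziness for t in tokens(line))


def phrase_match(line: str, phrase: str) -> bool:
    """True if the phrase's tokens appear contiguously, in order, in the line."""
    hay, needle = tokens(line), tokens(phrase)
    if not needle:
        return False
    return any(hay[i:i + len(needle)] == needle
               for i in range(len(hay) - len(needle) + 1))
```

For the line `Apache httpd error: access denied for client`, all of `prefix_match(line, "err")`, `wildcard_match(line, "apache*error")`, `regex_match(line, "error.*access")` and `phrase_match(line, "access denied")` hold under these assumptions, while `phrase_match(line, "denied access")` does not because the token order differs.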

mcvsubbu commented 5 years ago

We do support regexp_like(). See https://pinot.readthedocs.io/en/latest/pql_examples.html

ananthdurai commented 5 years ago

TIL. Let me run a performance comparison on it.

mayankshriv commented 5 years ago

@mcvsubbu The regexp_like support is very naive at the moment. IIRC, it basically runs a regex match against every dictionary entry to collect the matching dictionary IDs, and then selects the records with those dict IDs.
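
A rough Python sketch of the two steps described above, for readers unfamiliar with dictionary-encoded columns. The function names and data layout are hypothetical and only illustrate the idea; this is not Pinot's code, and whether the real implementation uses a partial or a full match is not stated here (the sketch assumes a partial match via `re.search`).

```python
import re


def build_dictionary_column(values: list[str]) -> tuple[list[str], list[int]]:
    """Dictionary-encode a column: sorted unique values plus one dict id per row."""
    dictionary = sorted(set(values))
    id_of = {value: i for i, value in enumerate(dictionary)}
    forward_index = [id_of[value] for value in values]
    return dictionary, forward_index


def regexp_like_scan(dictionary: list[str], forward_index: list[int],
                     pattern: str) -> list[int]:
    """Return the row numbers whose value matches the pattern."""
    regex = re.compile(pattern)
    # Step 1: evaluate the regex against every dictionary entry.
    matching_ids = {i for i, value in enumerate(dictionary) if regex.search(value)}
    # Step 2: keep the rows whose dict id is in the matching set.
    return [row for row, dict_id in enumerate(forward_index)
            if dict_id in matching_ids]
```

Step 1 costs one regex evaluation per distinct value. For a log column where most lines are distinct, the dictionary is nearly as large as the column itself, so this approaches a full scan.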

From @ananthdurai's ask, this seems more like an ELK use case that requires much faster text indexing/search, which we currently don't have.

ananthdurai commented 5 years ago

@mayankshriv Yes, that is correct. Ideally, Pinot would support multiple indexing formats (log search and columnar indexing), with a query engine intelligent enough to choose the index based on the query pattern. That would be awesome.

mayankshriv commented 5 years ago

@ananthdurai Agreed, that would be a good feature to have. @sunithabeeram was experimenting with it along with @kishoreg; I will let them comment on how that is panning out.

ananthdurai commented 5 years ago

Adding a couple of links from DM for future reference:
https://www.quora.com/What-is-the-algorithm-used-by-Lucenes-PrefixQuery
https://issues.apache.org/jira/browse/LUCENE-1606

israbbani commented 5 years ago

+1