OpenTSDB / opentsdb

A scalable, distributed Time Series Database.
http://opentsdb.net
GNU Lesser General Public License v2.1

Filter expansion issues with HBase 2.1 #1968

Open ezjrbfjm opened 4 years ago

ezjrbfjm commented 4 years ago

We found that queries using OpenTSDB literal_or filters were taking ages to complete, and CPU utilization on the HBase nodes was higher than before (when running only this type of query), much higher than with regexp or even wildcard filters alone. This was not a problem with iliteral_or: simply switching from literal_or to iliteral_or brought the query times down significantly (it was a jaw-dropping moment). So we ended up setting tsd.query.filter.expansion_limit=0, which seemingly eliminated the problem (to make it more transparent), but as expected, the OpenTSDB instances are now fetching far more data than before (on the other hand, it stopped killing the backend).
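For readers following along, the switch described above only changes the filter function name in the query. A minimal sketch, assuming a hypothetical metric `sys.cpu.user`, a hypothetical `host` tag, and made-up host names; the helper `metricQuery` is illustrative and not part of OpenTSDB:

```java
public class FilterQuerySketch {
  // Builds the value of the `m` parameter for /api/query using the given
  // filter function. Metric name and tag key are hypothetical placeholders.
  static String metricQuery(String filterFn, String... hosts) {
    return "sum:sys.cpu.user{host=" + filterFn + "(" + String.join("|", hosts) + ")}";
  }

  public static void main(String[] args) {
    // Case-sensitive literal match (the slow case reported here).
    System.out.println(metricQuery("literal_or", "web01", "web02"));
    // Case-insensitive variant (the fast case reported here).
    System.out.println(metricQuery("iliteral_or", "web01", "web02"));
  }
}
```

The two strings differ only in the function name, so the difference in query time comes from how the TSD handles each filter type, not from the query shape.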

We suspect this could be related to the ColumnPrefixFilter logic change in https://issues.apache.org/jira/browse/HBASE-21620. The suggestion given in response to the reported performance impact was to use more specific HBase filters, MultipleColumnPrefixFilter in this case: https://issues.apache.org/jira/browse/HBASE-22448?focusedCommentId=16846481&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16846481

If it is caused by the above change, would it be possible to revise the HBase filters OpenTSDB uses?

manolama commented 3 years ago

We could change it, yes, but we'd have to maintain backwards compatibility and that'll be a bit involved.

How many filters were you passing in a single literal_or filter?

For the change we'd need to:

  1. Add MultipleColumnPrefixFilter to AsyncHBase
  2. Add a converter to change a List<Filter> to the MultipleColumnPrefixFilter when appropriate.

That way the TSD code doesn't have to change and worry about HBase versions.
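
The converter in step 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: `Filter`, `ColumnPrefix`, `MultiColumnPrefix` and `collapse` are hypothetical stand-ins defined here, not AsyncHBase or HBase classes, and the `matchAny` flag assumes the list carries OR semantics (the equivalent of HBase's MUST_PASS_ONE), since only an OR of column prefixes maps onto a multiple-prefix filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FilterCollapseSketch {
  // Hypothetical stand-in types; real code would use the client's filter classes.
  interface Filter {}

  static final class ColumnPrefix implements Filter {
    final byte[] prefix;
    ColumnPrefix(byte[] prefix) { this.prefix = prefix; }
  }

  static final class MultiColumnPrefix implements Filter {
    final List<byte[]> prefixes;
    MultiColumnPrefix(List<byte[]> prefixes) { this.prefixes = prefixes; }
  }

  // Any filter that is not a plain column-prefix filter.
  static final class OtherFilter implements Filter {}

  // Collapses a list of column-prefix filters into one multi-prefix filter.
  // Returns empty when the conversion is not appropriate: the list is ANDed,
  // has fewer than two entries, or contains some other filter type. The
  // caller then keeps the original list unchanged.
  static Optional<MultiColumnPrefix> collapse(List<Filter> filters, boolean matchAny) {
    if (!matchAny || filters.size() < 2) {
      return Optional.empty();
    }
    List<byte[]> prefixes = new ArrayList<>();
    for (Filter f : filters) {
      if (!(f instanceof ColumnPrefix)) {
        return Optional.empty();
      }
      prefixes.add(((ColumnPrefix) f).prefix);
    }
    return Optional.of(new MultiColumnPrefix(prefixes));
  }

  public static void main(String[] args) {
    List<Filter> onlyPrefixes = Arrays.asList(
        new ColumnPrefix(new byte[] {0, 1}),
        new ColumnPrefix(new byte[] {0, 2}));
    System.out.println(collapse(onlyPrefixes, true).isPresent());
  }
}
```

Returning an empty result instead of throwing lets the caller fall back to the existing filter list, which is one way to keep the behaviour unchanged for cases the converter does not handle.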