OpenTSDB / opentsdb

A scalable, distributed Time Series Database.
http://opentsdb.net
GNU Lesser General Public License v2.1

History Removal Efficiency - HBase TTL vs tsdb scan --delete #1276

Status: open. Opened by HariSekhon 6 years ago.

HariSekhon commented 6 years ago

I was wondering why tsdb scan --delete is usually used instead of HBase TTLs.

I've got a sizeable TSDB HBase cluster, currently with 330,000 metrics and several years of history (see also #1275, where I'm trying to find out just how old some of the data is).

I suspect that tsdb scan --delete is there for several reasons:

  1. to stay backend-agnostic, since OpenTSDB can run on different storage backends (e.g. Cassandra, which also has TTLs)
  2. to follow the data's own timestamp rather than the storage record's modification time, e.g. backfilled data will have newer modification timestamps in the storage backend than the actual timestamps in the data itself
  3. HBase can only apply a TTL per column family (CF), i.e. to 't' and all metrics contained within it, so you couldn't keep some metrics for a longer retention period than that (cell-level TTLs, even if set by OpenTSDB, couldn't exceed the CF TTL)
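Points 2 and 3 can be illustrated with a simplified model of when a data point disappears under each approach. This is a sketch, not HBase or OpenTSDB code: it assumes, as the question does, that the HBase cell timestamp is the write (load) time, and that a cell TTL can only shorten the CF TTL. The function names and the dates are hypothetical.

```python
from datetime import datetime, timedelta


def effective_ttl(cf_ttl, cell_ttl=None):
    """A cell-level TTL can only shorten the column family TTL, never extend it."""
    return cf_ttl if cell_ttl is None else min(cf_ttl, cell_ttl)


def hbase_expired(write_time, now, cf_ttl, cell_ttl=None):
    """Assumed model: TTL age is measured from the cell's write (load) time."""
    return now - write_time > effective_ttl(cf_ttl, cell_ttl)


def scan_delete_selects(data_time, cutoff):
    """tsdb scan --delete selects points by the timestamp inside the data."""
    return data_time < cutoff


now = datetime(2024, 1, 1)
one_year = timedelta(days=365)
# Hypothetical point: data timestamp three years old, backfilled last month.
data_time, write_time = datetime(2021, 1, 1), datetime(2023, 12, 1)
print(hbase_expired(write_time, now, one_year))        # False: the TTL keeps it
print(scan_delete_selects(data_time, now - one_year))  # True: the scan deletes it
print(effective_ttl(one_year, timedelta(days=730)))    # 365 days, 0:00:00
```

Under these assumptions the backfilled point outlives its nominal retention by almost a full TTL period, which is the drawback raised in point 2.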

Aside from the points listed above, such as some backfilled data potentially sticking around longer than expected (because the TTL is measured from the load timestamp rather than the timestamp within the data itself), are there any other major drawbacks to using HBase TTLs instead of tsdb scan --delete to control data retention?

Running tsdb scan --delete over each of 330,000 metrics seems like a much more expensive operation: it would probably take days and might churn my HBase block cache, likely affecting the performance of a cluster that is already heavily loaded. A gist example of doing this metric by metric can be found here:

https://gist.github.com/dimamedvedev/f45a2a0b092ff9f9f777
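A rough sketch of why the metric-by-metric approach adds up: it builds one tsdb scan --delete command line per metric (without running anything) and estimates the total time if the scans run one after another. The command layout assumes the documented "scan [--delete] START-DATE [END-DATE] query" form with a sum aggregator; the metric names, dates, and the one-second-per-metric cost are hypothetical, not measurements.

```python
import shlex


def delete_commands(metrics, start, end, tsdb="tsdb"):
    """Build (but do not execute) one `tsdb scan --delete` command per metric."""
    return [
        shlex.join([tsdb, "scan", "--delete", start, end, "sum", metric])
        for metric in metrics
    ]


def estimated_days(num_metrics, seconds_per_metric):
    """Total wall-clock days if every per-metric scan runs sequentially."""
    return num_metrics * seconds_per_metric / 86_400


# Hypothetical metric names and date range.
cmds = delete_commands(["sys.cpu.user", "sys.mem.free"], "2000/01/01", "2017/01/01")
print(cmds[0])  # tsdb scan --delete 2000/01/01 2017/01/01 sum sys.cpu.user
print(round(estimated_days(330_000, 1.0), 1))  # 3.8
```

Even at an assumed one second per metric, 330,000 sequential scans take close to four days, and every scan reads the rows it deletes.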

Would using HBase TTLs have any noticeable performance impact on the cluster? I suspect each read may need an extra operation to filter the HBase results against the configured TTL before returning them to the client, which could add a bit more load to the many reads constantly happening on the cluster (the majority of our cluster traffic is reads).
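A minimal sketch of the kind of read-time check in question, assuming HBase behaves roughly like this: the cutoff (now minus the TTL) is computed once per scan, and each cell then costs a single timestamp comparison. The function name and the millisecond timestamps below are hypothetical; this is a simplified model, not HBase's scanner code.

```python
import time


def filter_unexpired(cells, ttl_seconds, now_ms=None):
    """Keep (timestamp_ms, value) cells that are still inside the TTL window."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Computed once per scan, not once per cell.
    oldest_unexpired_ms = now_ms - ttl_seconds * 1000
    # One integer comparison per cell.
    return [(ts, value) for ts, value in cells if ts >= oldest_unexpired_ms]


cells = [(1_000, "old"), (9_000, "new")]
print(filter_unexpired(cells, ttl_seconds=5, now_ms=10_000))  # [(9000, 'new')]
```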

Also, OpenTSDB doesn't use any raw scans, does it? The HBase docs say raw scans ignore TTLs (not the end of the world, just curious):

https://hbase.apache.org/book.html#upgrade1.4.rawscan

If this is more efficient, perhaps the docs should recommend enabling the native TTLs of the HBase and Cassandra storage backends when using them?

manolama commented 5 years ago

Actually, it is better to use the TTL instead of deleting. We do that here and it works great. scan --delete was really meant to let users decide which data they want to delete manually.
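Following that advice means expressing the retention period as a TTL in whole seconds on the data column family. The sketch below only builds the HBase shell statement as a string. It assumes the default OpenTSDB data table name tsdb and data family t; hbase_ttl_statement and ttl_seconds are hypothetical helpers, and the UID table is deliberately not targeted.

```python
from datetime import timedelta


def ttl_seconds(retention: timedelta) -> int:
    """HBase column family TTLs are expressed in whole seconds."""
    return int(retention.total_seconds())


def hbase_ttl_statement(retention: timedelta, table: str = "tsdb", family: str = "t") -> str:
    """Build (but do not run) an HBase shell `alter` setting the CF TTL.

    Assumes the default OpenTSDB data table and family; the UID table is left alone.
    """
    return f"alter '{table}', {{NAME => '{family}', TTL => {ttl_seconds(retention)}}}"


print(hbase_ttl_statement(timedelta(days=365)))
# alter 'tsdb', {NAME => 't', TTL => 31536000}
```

Once the TTL is set, expired cells stop being returned by normal (non-raw) reads, but they are only physically removed when compactions rewrite the store files.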