OpenTSDB / opentsdb

A scalable, distributed Time Series Database.
http://opentsdb.net
GNU Lesser General Public License v2.1

Question on HBase Tuning settings in OpenTSDB docs #1253

Open HariSekhon opened 6 years ago

HariSekhon commented 6 years ago

Hi OpenTSDB folks,

I've been reviewing the recommended HBase settings from the OpenTSDB docs tuning guide here:

http://opentsdb.net/docs/build/html/user_guide/tuning.html

Specifically it recommends:

hbase.rs.cacheblocksonwrite=true
hbase.rs.evictblocksonclose=false
hfile.block.bloom.cacheonwrite=true
hfile.block.index.cacheonwrite=true
hbase.block.data.cachecompressed=true
hbase.bucketcache.blockcache.single.percentage=.99
hbase.bucketcache.blockcache.multi.percentage=0
hbase.bucketcache.blockcache.memory.percentage=.01
hfile.block.cache.size=.054 #ignored but needs a value.

We collect 250,000+ different metrics, and I'm concerned that populating the block cache on writes will waste block cache space on unpopular metrics at the expense of the heavily queried ones, and may also cause more block cache churn than necessary.

I wrote a tool to calculate the granular request rate of individual regions across all regionservers (it's in my PyTools GitHub repo along with a bunch of other HBase tools; a rough sketch of the approach is at the end of this comment). It shows that this cluster sees 30,000 reads/sec on the busiest regions but only 900 writes/sec on the busiest regions, so my belief is that the things that are queried are queried a lot more than the things that are written. That matches our analysis of the OpenTSDB queries themselves, which found that a select few dashboards are used far more than the rest, and that they query longer history over a subset of the total metrics. I do expect new data to be read more as all Grafana dashboards on top of OpenTSDB will default to 6 hours, 1 day etc, but isn't it better to just let the memstore be the recent write-read-back cache as intended?

Isn't it better to leave the block cache to do its own single- and multi-access heuristics based on actual reads, rather than pre-populating it from a tonne of metrics (250,000!) which might never be read back?

See also #1244 for more performance tuning details on this cluster.
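
For illustration only (this is not the actual PyTools code), here's a minimal sketch of the per-region rate calculation. It assumes the RegionServer info server listens on port 16030 and exposes per-region readRequestCount / writeRequestCount counters under the 'Hadoop:service=HBase,name=RegionServer,sub=Regions' JMX bean; the bean name, key format, port and hostname below are assumptions that vary by HBase version, so verify against yours:

# Rough sketch of the idea, not the actual PyTools code: sample the per-region
# read/write request counters from the RegionServer JMX servlet twice and
# print the busiest regions by requests/sec.

import json
import re
import time
import urllib.request

INTERVAL_SECS = 60

# Assumed key format, e.g.
# Namespace_default_table_tsdb_region_<encoded>_metric_readRequestCount
METRIC_RE = re.compile(r'^(?P<region>.+_region_[0-9a-f]+)_metric_'
                       r'(?P<metric>readRequestCount|writeRequestCount)$')

def fetch_region_counters(host, port=16030):
    """Return {(region, metric): counter} from one RegionServer's JMX servlet."""
    url = ('http://%s:%d/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=Regions'
           % (host, port))
    with urllib.request.urlopen(url, timeout=10) as resp:
        beans = json.load(resp)['beans']
    counters = {}
    for bean in beans:
        for key, value in bean.items():
            match = METRIC_RE.match(str(key))
            if match:
                counters[(match.group('region'), match.group('metric'))] = value
    return counters

def region_rates(host):
    """Sample the counters twice and return [(region, reads_per_sec, writes_per_sec)]."""
    before = fetch_region_counters(host)
    time.sleep(INTERVAL_SECS)
    after = fetch_region_counters(host)
    rates = {}
    for (region, metric), count in after.items():
        # Counters are cumulative, so diff the two samples (clamp at 0 in case
        # the region server restarted between samples).
        delta = max(0, count - before.get((region, metric), count))
        rate = rates.setdefault(region, [0.0, 0.0])
        if metric == 'readRequestCount':
            rate[0] = delta / INTERVAL_SECS
        else:
            rate[1] = delta / INTERVAL_SECS
    return sorted(((r, rw[0], rw[1]) for r, rw in rates.items()),
                  key=lambda x: x[1] + x[2], reverse=True)

if __name__ == '__main__':
    # regionserver1.example.com is a placeholder hostname.
    for region, reads, writes in region_rates('regionserver1.example.com')[:20]:
        print('%-80s %10.1f reads/s %10.1f writes/s' % (region, reads, writes))

Diffing two samples gives a requests/sec rate rather than lifetime totals, which is what makes the read-versus-write skew between regions visible.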

HariSekhon commented 6 years ago

I did enable these settings, but found that all the get and scan latency percentiles rose by a few ms, which was clearer when looking back over longer-history graphs. The mutate percentiles didn't move, though, which is where I would have expected to see the cost of the extra writes to the block cache.

I suspect the following setting is slowing down gets/scans:

hbase.block.data.cachecompressed=true

So I'll set this to false on Monday when I increase the bucket cache from 10GB to 50GB.
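
Assuming an offheap BucketCache sized in megabytes (which is how the HBase versions I've used interpret hbase.bucketcache.size when it's greater than 1), the planned change would look roughly like:

hbase.block.data.cachecompressed=false
hbase.bucketcache.size=51200 #~50GB in MB, up from 10240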

manolama commented 5 years ago

I do expect new data to be read more as all Grafana dashboards on top of OpenTSDB will default to 6 hours, 1 day etc, but isn't it better to just let the memstore be the recent write-read-back cache as intended?

It depends, as you may want the memstore sized so that it flushes more frequently and there's less time spent replaying WALs after a region server restart. But if you're really seeing that many more queries than writes, you can keep the memstore really large. We can only keep a few minutes of data in the memstore, so it makes more sense to have the cache. And yes, most of that cache won't be read, but at least it's there if we need it, and we get around a 65% hit ratio.
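
For reference, that trade-off is controlled by knobs along these lines (property names from the HBase versions I'm familiar with; defaults vary by release):

hbase.regionserver.global.memstore.size=0.4 #fraction of region server heap shared by all memstores
hbase.hregion.memstore.flush.size=134217728 #per-region flush threshold in bytes (128MB default)

A smaller flush threshold means more frequent flushes and shorter WAL replay on restart; a larger one keeps more recent data in memory.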

If the cache compression is affecting the percentile times, it'd be reflected in a CPU increase on the region server.