OpenTSDB / opentsdb

A scalable, distributed Time Series Database.
http://opentsdb.net
GNU Lesser General Public License v2.1
4.99k stars 1.25k forks

Coprocessor for compaction? #901

Open tesseract2048 opened 7 years ago

tesseract2048 commented 7 years ago

Compaction is performed within TSD instances in the current implementation. An alternative is implementing an HBase coprocessor for compaction. This can substantially reduce both CPU and I/O cost without degrading performance (verified in production).

Any thoughts?

manolama commented 7 years ago

Hi @tesseract2048, thanks for your question. We've looked at implementing co-processors for compaction, and while it definitely helps with IO and CPU on the TSDs, depending on the implementation it can still swamp the region server at the top of each hour.

The best implementation may be a co-processor that simply compacts when the region itself is compacted (either major or minor), though that also has some impact, as it would have to flush cached blocks from memory.

We also implemented HBase Appends in TSDB 2.2 so that the CPU and IO load is evenly distributed over time during writes, though it does incur a fair amount of overhead on the region server side. At Yahoo, we have an HBase engineer working on an improved append version that has reduced the load quite a bit and is similar to the compacting co-processor.
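For anyone following along, append mode is switched on in `opentsdb.conf`; if I recall the 2.2 setting correctly, the relevant key is:

```
# Write each datapoint as an HBase append to the row's single cell instead
# of a put, so no separate TSD-side compaction pass is needed (TSDB 2.2+).
tsd.storage.enable_appends = true
```

Check the release documentation for your version before enabling it, since append-written rows are laid out differently from put-and-compact rows.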

That said, if you have some co-processor code for compactions that you would be willing to open source, I'd love to include it in the project so folks could use it!

tesseract2048 commented 7 years ago

Thanks for the reply, @manolama. I have implemented a version using the RegionObserver interface that overrides the scanners used during region compaction (major and minor). It has worked in production as expected for several weeks. Packaging it up would be the next step if people are willing to use it.
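For readers unfamiliar with what the compaction step actually does: OpenTSDB stores each datapoint of an hourly row as its own cell and later merges them into one wide cell whose qualifier and value are the concatenations of the parts. A minimal, HBase-free sketch of that merge logic (class and method names are hypothetical; the real code also appends a trailing flags byte and handles variable-width qualifiers):

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

public class RowCompactor {
    /** Merge the per-datapoint cells of one row (assumed sorted by
     *  qualifier) into a single compacted cell, returned as
     *  { concatenated qualifiers, concatenated values }. */
    static byte[][] compact(List<byte[][]> cells) {
        ByteArrayOutputStream quals = new ByteArrayOutputStream();
        ByteArrayOutputStream vals = new ByteArrayOutputStream();
        for (byte[][] cell : cells) {
            quals.write(cell[0], 0, cell[0].length); // 2-byte time offset + flags
            vals.write(cell[1], 0, cell[1].length);  // encoded datapoint value
        }
        return new byte[][] { quals.toByteArray(), vals.toByteArray() };
    }

    public static void main(String[] args) {
        // Three datapoints of one hourly row, each a (qualifier, value) pair.
        List<byte[][]> cells = List.of(
            new byte[][] { {0x00, 0x07}, {42} },
            new byte[][] { {0x02, 0x07}, {43} },
            new byte[][] { {0x04, 0x07}, {44} });
        byte[][] compacted = compact(cells);
        System.out.println(compacted[0].length + " qualifier bytes, "
            + compacted[1].length + " value bytes");
        // -> 6 qualifier bytes, 3 value bytes
    }
}
```

A RegionObserver-based approach would perform this merge inside the scanner it hands back from the compaction hooks, so the rewritten HFiles come out already compacted and the TSDs never issue the read-merge-write round trip.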

Another thought (or should I open another issue?) is leveraging region coprocessors for distributed aggregation of queries, since we have seen some 'big' queries that scan over lots of data, either with a large time range or too many spans (tags). In that case, the bottleneck is bandwidth (transferring scan results) and CPU (aggregation and downsampling). A coprocessor provides better data locality (no intermediate results are transferred, and the HDFS data blocks are local in most cases) and speeds up aggregation and downsampling through parallelization. We observed a 2x~18x speedup with a salt bucket count of 20, and the speedup ratio grows with query size.

The pitfall is that the aggregated result may differ because missing datapoints in a span are handled differently: in distributed aggregation, if a datapoint of a specific time series is missing within a whole split, interpolating it in the sequential stage is no longer trivial. Can you share some thoughts on this? Did folks at Yahoo solve it, and how?
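To make the two-stage shape concrete, here is a minimal sketch (names hypothetical, not OpenTSDB code) of per-bucket partial aggregation merged at the TSD. It also shows where the missing-datapoint problem appears: once a bucket has reduced its series to partial sums, the merge stage no longer has the neighbouring datapoints it would need to interpolate a series that is absent at some timestamp.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DistributedAgg {
    /** Merge per-bucket partial sums (timestamp -> sum) into a global sum. */
    static Map<Long, Double> merge(List<Map<Long, Double>> buckets) {
        Map<Long, Double> out = new HashMap<>();
        for (Map<Long, Double> bucket : buckets)
            bucket.forEach((ts, v) -> out.merge(ts, v, Double::sum));
        return out;
    }

    public static void main(String[] args) {
        // Bucket 0 holds series A (datapoints at t=0 and t=60); bucket 1
        // holds series B, which only has a datapoint at t=0.
        Map<Long, Double> b0 = Map.of(0L, 1.0, 60L, 3.0);
        Map<Long, Double> b1 = Map.of(0L, 10.0);
        Map<Long, Double> merged = merge(List.of(b0, b1));
        // merged.get(0L) == 11.0, but merged.get(60L) == 3.0: B contributes
        // nothing at t=60, whereas a sequential aggregator with all the raw
        // datapoints could have interpolated B's value from its neighbours.
        System.out.println(merged);
    }
}
```

Shipping (sum, count) pairs or even the raw boundary datapoints per bucket would let the merge stage recover some of the interpolation, at the cost of transferring more intermediate state.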