GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0
4.33k stars 313 forks

Adaptive tiered storage #1379

Open v0y4g3r opened 1 year ago

v0y4g3r commented 1 year ago

What type of enhancement is this?

Performance

What does the enhancement do?

Background

The memtable capacity must be reduced if we want to accommodate a large number of tables in GreptimeDB, which in turn results in too many small SST files in level 0. This not only degrades performance but may also hit the API rate limit when all SST files are written to OSS.
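
A rough back-of-the-envelope sketch of this effect. The model is an assumption made for illustration, not how GreptimeDB sizes memtables: a fixed memory budget is split evenly across tables, every table ingests at the same rate, and every flush produces exactly one level-0 SST file (one OSS PUT if it is uploaded). The function name and all numbers are hypothetical.

```python
def flush_stats(memtable_budget_bytes, num_tables, ingest_bytes_per_sec_per_table):
    """Estimate per-table memtable size and the datanode-wide flush rate.

    Assumption: the budget is split evenly and a memtable is flushed
    exactly when it becomes full.
    """
    memtable_bytes = memtable_budget_bytes / num_tables
    seconds_between_flushes = memtable_bytes / ingest_bytes_per_sec_per_table
    # Each table flushes once per interval, so the total rate is the sum over tables.
    flushes_per_sec = num_tables / seconds_between_flushes
    return memtable_bytes, flushes_per_sec


# Hypothetical workload: 1 GiB of memtable memory, 1 KiB/s written per table.
for tables in (100, 10_000):
    size, rate = flush_stats(1024**3, tables, 1024)
    print(f"{tables} tables: {size / 1024:.0f} KiB per memtable, {rate:.3f} flushes/s")
```

Under these assumptions the flush rate grows with the square of the table count: each level-0 file gets smaller while the number of files, and therefore the number of OSS requests, grows much faster.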

Tiered storage

We can set a threshold level: SST files in levels below the threshold are stored on local disk, and only SST files at level <threshold> or above are uploaded to OSS.

That is to say, we reduce OSS API invocations by postponing "mark WAL obsolete" from "after flush" to "compacting to <threshold>".
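
A minimal sketch of the placement rule and the postponed "mark WAL obsolete" point, assuming each SST file records the range of WAL entry ids it covers. `TieredRegion`, `Sst` and every field name here are hypothetical and are not taken from the GreptimeDB code base.

```python
from dataclasses import dataclass, field

LOCAL, OSS = "local", "oss"


@dataclass
class Sst:
    level: int
    tier: str
    min_entry_id: int  # first WAL entry covered by this file
    max_entry_id: int  # last WAL entry covered by this file


@dataclass
class TieredRegion:
    threshold: int  # levels below this stay on local disk
    files: dict = field(default_factory=dict)  # file name -> Sst
    oss_puts: int = 0  # number of uploads issued so far

    def _add(self, name, level, min_id, max_id):
        tier = LOCAL if level < self.threshold else OSS
        if tier == OSS:
            self.oss_puts += 1
        self.files[name] = Sst(level, tier, min_id, max_id)

    def flush(self, name, min_id, max_id):
        # A flush always produces a level-0 file.
        self._add(name, 0, min_id, max_id)

    def compact(self, inputs, output, level):
        merged = [self.files.pop(n) for n in inputs]
        self._add(output, level,
                  min(f.min_entry_id for f in merged),
                  max(f.max_entry_id for f in merged))

    def wal_obsolete_upto(self):
        # Only entries whose data is already in OSS may be dropped from the WAL,
        # and never an entry that is still covered only by a local file.
        durable = max((f.max_entry_id for f in self.files.values()
                       if f.tier == OSS), default=0)
        local_starts = [f.min_entry_id for f in self.files.values()
                        if f.tier == LOCAL]
        return min([durable] + [s - 1 for s in local_starts])
```

With `threshold=2`, flushes and level-1 compactions issue no uploads and leave the WAL untouched. The WAL can only be truncated once a compaction writes to level 2. With `threshold=0` the same sequence uploads every file, which is the current "after flush" behaviour described above.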

Benefits and deficiencies

The tiered storage proposal can help reduce OSS API cost, and it also makes a large number of tables on a single datanode possible. An extra benefit would be compression, since SST files residing in higher levels can be compressed more effectively.

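But it also brings some issues. For example, it makes all SST files beneath level <threshold> volatile: the local storage is immediately reclaimed when the pod is destroyed. We need to either mount an EBS volume to store SST files beneath <threshold>, or postpone the "mark WAL obsolete" operation from "after flush" to "compacting to <threshold>". Both approaches require a larger local disk capacity.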

Implementation challenges

The primary problem is: which tables should enable tiered storage, and at which level should the threshold be set?

More local SST files mean higher EBS cost but fewer OSS API invocations and a higher compression ratio. We can do some simple math, and there must be a break-even point. For simplicity, we can maintain a stat of the output size of each flush/compaction at each level. If the average output size of flushes/compactions at a level is lower than some threshold, then that level remains on local disk.
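
The per-level stat could look roughly like the sketch below. `LevelStats` and `break_even_bytes` are hypothetical names. The break-even formula is a deliberately simplified assumption: it compares only the price of one OSS PUT with the cost of holding the file on local disk for some residence time, and ignores OSS storage cost and compression gains. The prices must be supplied by the caller.

```python
class LevelStats:
    """Running average of flush/compaction output size, per level."""

    def __init__(self):
        self._count = {}  # level -> number of outputs seen
        self._bytes = {}  # level -> total output bytes seen

    def record(self, level, output_bytes):
        self._count[level] = self._count.get(level, 0) + 1
        self._bytes[level] = self._bytes.get(level, 0) + output_bytes

    def average(self, level):
        n = self._count.get(level, 0)
        return self._bytes[level] / n if n else 0.0

    def stays_local(self, level, size_threshold):
        # Assumption: a level with no samples yet is uploaded, not kept local.
        if self._count.get(level, 0) == 0:
            return False
        return self.average(level) < size_threshold


def break_even_bytes(put_price, local_price_per_byte_sec, residence_sec):
    """File size at which one PUT costs the same as keeping the file on
    local disk for residence_sec (simplified model, see the note above)."""
    return put_price / (local_price_per_byte_sec * residence_sec)
```

For example, after `record(0, 1 << 20)`, `record(0, 3 << 20)` and `record(1, 64 << 20)`, a size threshold of 8 MiB keeps level 0 on local disk and uploads level 1.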

killme2008 commented 1 year ago

mount an EBS volume to store SST files beneath <threshold>, or postpone the "mark WAL obsolete" operation from "after flush" to "compacting to <threshold>". And both approaches require a larger local disk capacity.

I prefer the second solution; it's much more friendly to dev operations.

evenyag commented 1 year ago

We could also consider using a hybrid strategy in the future: