criteo / biggraphite

Simple Scalable Time Series Database
Apache License 2.0

Experiment with compression and other ways to save space #129

Open iksaif opened 7 years ago

iksaif commented 7 years ago

Current compression is {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

Keyspace : biggraphite
        Read Count: 879862
        Read Latency: 0.1316408550431772 ms.
        Write Count: 667165100
        Write Latency: 0.014124243819108644 ms.
        Pending Flushes: 0
                Table: datapoints_11520p_60s
                Space used (live): 2.61 GiB
                Space used (total): 2.61 GiB
                Space used by snapshots (total): 0 bytes
                Off heap memory used (total): 28.6 MiB
                SSTable Compression Ratio: 0.49022149548676547
                Number of keys (estimate): 881629
                Memtable cell count: 653469
                Memtable data size: 36.34 MiB
                Memtable off heap memory used: 20.5 MiB
                Memtable switch count: 281
                Local read count: 463900
                Local read latency: 0.096 ms
                Local write count: 230203395
                Local write latency: 0.016 ms
                Pending flushes: 0
                Bloom filter false positives: 42
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 5.92 MiB
                Bloom filter off heap memory used: 5.92 MiB
                Index summary off heap memory used: 1.57 MiB
                Compression metadata off heap memory used: 633.61 KiB
                Compacted partition minimum bytes: 61
                Compacted partition maximum bytes: 11864
                Compacted partition mean bytes: 1153
                Average live cells per slice (last five minutes): 4.470668247467127
                Maximum live cells per slice (last five minutes): 1109
                Average tombstones per slice (last five minutes): 1.0
                Maximum tombstones per slice (last five minutes): 1
                Dropped Mutations: 0 bytes

$ nodetool tablehistograms biggraphite  datapoints_11520p_60s
biggraphite/datapoints_11520p_60s histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)                  
50%             3.00             14.24             29.52               535                35
75%             3.00             17.08             42.51              1109                72
95%            14.00             29.52            152.32              6866               446
98%            17.00             42.51            545.79              6866               535
99%            20.00             61.21           1358.10              6866               535
Min             0.00              1.92              1.92                61                 2
Max            24.00         223875.79         223875.79             11864               770

The compression ratio also seems to be 0.5 for most of the other tables.
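One way to experiment would be to switch the compressor or the chunk length on a single table and compare the resulting SSTable Compression Ratio. Below is a minimal sketch using the Python cassandra-driver; the contact point and the DeflateCompressor / chunk-length values are just placeholders to try, not recommendations.

```python
# Sketch: try alternative compression settings on one table, then compare
# "SSTable Compression Ratio" in nodetool tablestats before/after.
# Host and option values are placeholders, not tested recommendations.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # assumption: a local Cassandra node
session = cluster.connect("biggraphite")

# DeflateCompressor usually compresses better than LZ4 at a CPU cost;
# a smaller chunk_length_in_kb can also change the ratio for small partitions.
session.execute("""
    ALTER TABLE datapoints_11520p_60s
    WITH compression = {'class': 'DeflateCompressor', 'chunk_length_in_kb': '16'}
""")
```

Note that new settings only apply to SSTables written afterwards; existing ones have to be rewritten (e.g. nodetool upgradesstables -a biggraphite datapoints_11520p_60s) before tablestats reflects the change.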

csalperwyck commented 7 years ago

We started to design such a system, but on HBase. We used GZip compression and achieved a 0.8 compression ratio (we store 2 quality fields, which might explain the difference).

About the README, I have a question:

This saves space in two ways:

- No need to repeatedly store metric IDs for each entry
- The relative offset is only 4 bytes, while timestamps are 8 bytes

I know HBase much better than Cassandra, but they should be pretty similar: the row key is written again and again for each column, not stored once for many columns.

But I just checked a bit and it seems that you can use "COMPACT STORAGE" when creating the table to save space, as you said (I don't see it in your CREATE TABLE statement). https://www.oreilly.com/ideas/apache-cassandra-for-analytics-a-performance-and-storage-analysis

Let me know if I missed something :-)

iksaif commented 7 years ago

I believe most of this has been fixed since C* 3.0 (see http://www.datastax.com/2015/12/storage-engine-30 for details). I would have to check exactly how many bytes per point we currently use, but the last time I checked it was better than Whisper, which was our requirement.

Double-delta encoding might help here, because it could drastically reduce the number of points we store (we could, for example, skip up to 5 points if the double-delta stays the same).
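To make the idea concrete, here is a minimal sketch (not actual BigGraphite code) of a writer that skips a point while the value delta stays constant, capped at 5 consecutive skips as in the example above:

```python
def compress_stream(points, max_skipped=5):
    """Sketch of double-delta based point skipping.

    `points` is an iterable of (timestamp, value) sampled at a fixed step.
    Yields only the points worth storing; up to `max_skipped` consecutive
    points are dropped while the value delta stays constant, and can be
    reconstructed later by linear interpolation between stored neighbours.
    """
    prev_value = None
    prev_delta = None
    skipped = 0
    for ts, value in points:
        if prev_value is None:
            yield ts, value                      # always keep the first point
            prev_value = value
            continue
        delta = value - prev_value
        prev_value = value
        if prev_delta is not None and delta == prev_delta and skipped < max_skipped:
            skipped += 1                         # double-delta is zero: skip
            continue
        yield ts, value                          # delta changed (or cap hit): keep
        prev_delta = delta
        skipped = 0


# A perfectly linear series keeps only a few anchor points:
linear = [(60 * i, 10.0 + 2.0 * i) for i in range(12)]
print(list(compress_stream(linear)))  # -> [(0, 10.0), (60, 12.0), (420, 24.0)]
```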

Currently our main limitation is really the number of mutations per second and the load this puts on Cassandra (currently limited to 70k/s per node).

csalperwyck commented 7 years ago

Thanks for the feedback! I didn't know Whisper before; we want to keep the same accuracy even 20 years in the past.

We don't use delta encoding, but our pre-processing sends data only when there is a significant change in the values.
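Roughly speaking, this behaves like a deadband filter. A simplified sketch, assuming a fixed absolute threshold and (timestamp, value) tuples (the real pre-processing is more elaborate):

```python
def significant_changes(points, threshold):
    """Forward only the points whose value moved by more than `threshold`
    since the last point that was actually sent (a simple deadband filter)."""
    last_sent = None
    for ts, value in points:
        if last_sent is None or abs(value - last_sent) > threshold:
            yield ts, value
            last_sent = value
```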

It seems HBase is also around the same number of mutations/s.

iksaif commented 7 years ago

Does this mean that you interpolate results if there is nothing in the database? How do you deal with a client that would like to see the last 2-3 minutes of data?

csalperwyck commented 7 years ago

Depending on the sensor system, we either repeat the same value or do a linear interpolation (as Data Historians do).
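A minimal sketch of that gap-filling logic (the helper below is hypothetical, not our actual code; it assumes the two stored samples around the gap are known):

```python
def fill_gap(t, before, after, mode="repeat"):
    """Estimate the value at time `t` between two stored samples.

    `before` and `after` are (timestamp, value) tuples surrounding the gap;
    mode 'repeat' holds the last known value, 'linear' interpolates between
    the two neighbours.
    """
    (t0, v0), (t1, v1) = before, after
    if mode == "repeat":
        return v0
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```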

For now we don't have a real-time workflow. Our industrial site buffers data and sends it a few times per hour as a file.

iksaif commented 7 years ago

Ok, that's slightly easier in this case. Thanks for the details!