iksaif opened 8 years ago
Current compression is `{'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}`:

```
Keyspace : biggraphite
    Read Count: 879862
    Read Latency: 0.1316408550431772 ms.
    Write Count: 667165100
    Write Latency: 0.014124243819108644 ms.
    Pending Flushes: 0
        Table: datapoints_11520p_60s
        Space used (live): 2.61 GiB
        Space used (total): 2.61 GiB
        Space used by snapshots (total): 0 bytes
        Off heap memory used (total): 28.6 MiB
        SSTable Compression Ratio: 0.49022149548676547
        Number of keys (estimate): 881629
        Memtable cell count: 653469
        Memtable data size: 36.34 MiB
        Memtable off heap memory used: 20.5 MiB
        Memtable switch count: 281
        Local read count: 463900
        Local read latency: 0.096 ms
        Local write count: 230203395
        Local write latency: 0.016 ms
        Pending flushes: 0
        Bloom filter false positives: 42
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 5.92 MiB
        Bloom filter off heap memory used: 5.92 MiB
        Index summary off heap memory used: 1.57 MiB
        Compression metadata off heap memory used: 633.61 KiB
        Compacted partition minimum bytes: 61
        Compacted partition maximum bytes: 11864
        Compacted partition mean bytes: 1153
        Average live cells per slice (last five minutes): 4.470668247467127
        Maximum live cells per slice (last five minutes): 1109
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1
        Dropped Mutations: 0 bytes
```
```
$ nodetool tablehistograms biggraphite datapoints_11520p_60s
biggraphite/datapoints_11520p_60s histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             3.00             14.24             29.52               535                35
75%             3.00             17.08             42.51              1109                72
95%            14.00             29.52            152.32              6866               446
98%            17.00             42.51            545.79              6866               535
99%            20.00             61.21           1358.10              6866               535
Min             0.00              1.92              1.92                61                 2
Max            24.00         223875.79         223875.79             11864               770
```
The compression ratio also seems to be 0.5 for most of the other tables.
We started to design such a system, but on HBase. We used GZip compression and achieved a 0.8 compression ratio (we store 2 quality fields, which might explain the difference).
About the README, I have a question:

> This saves space in two ways:
>
> - No need to repeatedly store metric IDs for each entry
> - The relative offset is only 4 bytes, while timestamps are 8 bytes
I know HBase much better than Cassandra, but they should be pretty similar: for each column, the row key is written again and again, not one row key shared by many columns.
But I checked a bit, and it seems you can use `COMPACT STORAGE` when creating the table to save space as you describe (I don't see it in your `CREATE TABLE` statement). https://www.oreilly.com/ideas/apache-cassandra-for-analytics-a-performance-and-storage-analysis
Let me know if I missed something :-)
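For what it's worth, the README's byte-count claim is easy to sanity-check. A minimal sketch in Python — the field layout below is purely illustrative, not BigGraphite's actual on-disk format:

```python
import struct

# Illustrative point layouts (NOT BigGraphite's actual on-disk format):
# - "absolute": full 8-byte timestamp (ms) + 8-byte double value
# - "relative": 4-byte offset from a per-partition base timestamp + value
absolute_point = struct.pack("<qd", 1_500_000_000_000, 42.0)
relative_point = struct.pack("<id", 60_000, 42.0)

print(len(absolute_point))  # 16 bytes per point
print(len(relative_point))  # 12 bytes per point
```

With the same 8-byte value payload, each point shrinks from 16 to 12 bytes, i.e. the 4-byte relative offset saves 4 bytes per point over a full 8-byte timestamp, on top of not repeating the metric ID.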
I believe most of this has been fixed since C* 3.0 (see http://www.datastax.com/2015/12/storage-engine-30 for details). I would have to check exactly how many bytes per point we currently use, but last time I checked it was better than Whisper, which was our requirement.
Double-delta encoding might help here, because it could drastically reduce the number of points we store (we could, for example, skip up to 5 points if the double-delta stays the same).
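As a rough illustration of that idea — a sketch only: the function name and `max_skip` are hypothetical, and a real double-delta codec (e.g. Gorilla-style) would also bit-pack the kept deltas:

```python
def compress(values, max_skip=5):
    """Keep a sample only when its double-delta (second difference)
    changes, but never skip more than max_skip samples in a row.

    Illustrative sketch, not BigGraphite's actual code."""
    if len(values) <= 2:
        return list(values), list(range(len(values)))
    kept_idx = [0, 1]           # always keep the first two samples
    skipped = 0
    for i in range(2, len(values)):
        double_delta = (values[i] - values[i - 1]) - (values[i - 1] - values[i - 2])
        if double_delta == 0 and skipped < max_skip:
            skipped += 1        # same linear trend: drop this sample
        else:
            kept_idx.append(i)  # trend changed, or skip budget exhausted
            skipped = 0
    return [values[i] for i in kept_idx], kept_idx

# A perfectly linear series keeps only 1 sample out of every max_skip + 1.
kept_values, kept_idx = compress(list(range(11)))
print(kept_idx)  # [0, 1, 7]
```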
Currently our main limitation is really the number of mutations per second and the load this puts on Cassandra (currently limited to about 70k/s per node).
Thanks for the feedback! I didn't know about Whisper before; we want to keep the same accuracy even 20 years in the past.
We don't use delta encoding, but our pre-processing sends data only when there is a significant change in the values.
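That kind of "send only on significant change" pre-processing is commonly called a deadband filter. A minimal sketch — the function name and threshold semantics are assumptions, not the commenter's actual pipeline:

```python
def deadband(samples, threshold):
    """Forward a (timestamp, value) sample only when the value has moved
    by at least `threshold` since the last forwarded value.

    Illustrative sketch of a deadband filter."""
    sent, last = [], None
    for ts, value in samples:
        if last is None or abs(value - last) >= threshold:
            sent.append((ts, value))
            last = value
    return sent

print(deadband([(0, 1.0), (1, 1.05), (2, 2.0), (3, 2.02)], 0.5))
# [(0, 1.0), (2, 2.0)]
```

The trade-off versus delta encoding: deadbanding loses small fluctuations entirely, while delta/double-delta encoding keeps every sample but stores it more compactly.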
It seems HBase is also limited to around the same number of mutations/s.
Does this mean that you interpolate results if there is nothing in the database? How do you deal with a client that would like to see the last 2-3 minutes of data?
Depending on the sensor system, we either repeat the same value or do a linear interpolation (as data historians do).
For now we don't have a real-time workflow. Our industrial site buffers data and sends it a few times per hour as a file.
OK, that's slightly easier in this case. Thanks for the details!