ChronixDB / chronix.server

The Chronix Server implementation that is based on Apache Solr.
Apache License 2.0

Bad benchmarks on initial test #148

Closed: tamoyal closed this issue 6 years ago

tamoyal commented 6 years ago

I've set up an initial test to see how an OK server would do with a decent load, and the results don't look good, so I thought I would share them here and get some feedback on what I'm (probably) doing wrong.

First thing to address would be storage footprint since that is relatively easy to compare apples to apples. The docs say Chronix will take 5-171 times less space (I assume this is compared to CSV or some relatively raw/simple data format). My data rows look like this:

18,25,8547.736954352933,1523318400.973
2,43,1980.6639051377774,1523318401.176
17,69,9500.832828241511,1523318402.991
13,12,1442.8229313976187,1523318403.377
8,66,4088.2959033363563,1523318404.812
5,84,5772.630417804327,1523318405.002
1,54,7276.800267948981,1523318406.934

After importing 16mm of these, I saw a storage footprint of 2.3GB. This is way more than I expected. I have this data stored in CSV format at a rate of about 50mm data points in 2.2GB without compression. Does this seem fishy?
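For scale, that works out to roughly 144 bytes per data point in Chronix (2.3GB / 16mm) versus about 44 bytes per data point in the raw CSV (2.2GB / 50mm).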

The second thing to address is insert rate. I've tried writing in batches of 100, 1000, 2500, and 5000 using the golang lib (s_err := c.Store(series, true, time.Second)). The best I have seen was using a single thread to write 5000-data-point batches, at which point I was able to ingest ~800 data points / second. After this I tried writing with 10 threads and 20 threads in batches of 1000 and 2500 and maxed out around ~500 data points / second.
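For concreteness, this is roughly the shape of my write loop (a simplified sketch: the Point type and storer interface below are placeholders I wrote to match the c.Store(series, true, time.Second) call, not the actual chronix.go types):

```go
package ingest

import "time"

// Point is a placeholder for one sample; the real chronix.go series types differ.
type Point struct {
	Timestamp float64
	Value     float64
}

// storer abstracts the client; Store matches the call shape I use:
// c.Store(series, commit, commitWithin).
type storer interface {
	Store(series []Point, commit bool, commitWithin time.Duration) error
}

const batchSize = 5000

// ingestBatched writes points in fixed-size batches and commits after every
// batch (commit=true, commitWithin=1s), which is what my test does.
func ingestBatched(c storer, points []Point) error {
	for start := 0; start < len(points); start += batchSize {
		end := start + batchSize
		if end > len(points) {
			end = len(points)
		}
		if err := c.Store(points[start:end], true, time.Second); err != nil {
			return err
		}
	}
	return nil
}
```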

Now I'm on a Digital Ocean server with 2 vCPUs, 4GB RAM, and Ubuntu, but this still seems fishy based on some comparisons. Actually I'm not too worried about the write speed for my specific use case, but it would be worth touching on as I'll probably want to be inserting at ~100-1000 data points / second. Is Chronix/Solr slow at inserts but fast at queries (is that the tradeoff)?

Also I was definitely expecting a lighter storage footprint based on the claim. Am I doing something wrong?

FlorianLautenschlager commented 6 years ago

Hi @tamoyal,

First thing to address would be storage footprint since that is relatively easy to compare apples to apples. The docs say Chronix will take 5-171 times less space (I assume this is compared to CSV or some relatively raw/simple data format). My data rows look like this:

Yes, it is compared to simple CSV files. The paper on Chronix also describes a quantitative evaluation.

After importing 16mm of these, I saw a storage footprint of 2.3GB. This is way more than I expected. I have this data stored in CSV format at a rate of about 50mm data points in 2.2GB without compression. Does this seem fishy?

This is not what I would expect. How many points do you store per time series? It seems that the chunks (n points in a Solr document) are too small. The example dataset (chronix example data) has around 76,439,668 points and 5,836 time series. Compressed it needs around 56 MB, and stored in Chronix with the Chronix importer (examples project, Java) it takes around 120 MB.
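For a rough comparison, 120 MB for ~76.4 million points works out to about 1.6 bytes per point, versus the ~144 bytes per point you are seeing.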

The second thing to address is insert rate. I've tried writing in batches of 100, 1000, 2500, and 5000 using the golang lib (s_err := c.Store(series, true, time.Second)). The best I have seen was using a single thread to write 5000-data-point batches, at which point I was able to ingest ~800 data points / second. After this I tried writing with 10 threads and 20 threads in batches of 1000 and 2500 and maxed out around ~500 data points / second.

The chronix-importer writes 1,415,549 points / second (one commit at the end of the import), while a chunk has around 7821 points. I'll have to use the go lib to import the time series and see what's going on...

Also I was definitely expecting a lighter storage footprint based on the claim. Am I doing something wrong?

This could be due to too many commits, each with an expensive index creation.

tamoyal commented 6 years ago

@FlorianLautenschlager thanks for the response and sorry for the delay, I was traveling.

This is not what I would expect. How many points do you store per time series?

Not sure I understand this question, but I think they are all part of one time series, in that all data points are associated with the same Name.

It seems that the chunks (n points in a Solr document) are too small.

Ok - I used the default installation params. So I need to tune to take advantage of the compression? Can you suggest a setting for this?

while a chunk has around 7821 points

By chunk do you mean the write batch size as in you commit every ~7821 data points?

This could be due to too many commits, each with an expensive index creation.

What defines an expensive vs cheap index creation?

FlorianLautenschlager commented 6 years ago

@FlorianLautenschlager thanks for the response and sorry for the delay, I was traveling.

No problem. I also took the liberty of taking the weekend off. ;-)

Not sure I understand this question, but I think they are all part of one time series, in that all data points are associated with the same Name.

That was misleadingly written. In Chronix, a time series is composed of a bunch of so-called records. A record is stored in a Solr/Lucene document. A record contains all the attributes (tags), start, end, name, type, and the data field. The data field contains the chunk of data (7821 points in my example).
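Roughly, a record looks like this (a sketch only; the field names are illustrative, not the exact Solr schema names):

```go
// One Chronix record, i.e. one Solr/Lucene document (illustrative field names).
type Record struct {
	Name       string            // time series name
	Type       string            // e.g. metric
	Start      int64             // timestamp of the first point in the chunk
	End        int64             // timestamp of the last point in the chunk
	Attributes map[string]string // user-defined tags
	Data       []byte            // compressed chunk of data points (7821 in my example)
}
```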

By chunk do you mean the write batch size as in you commit every ~7821 data points?

See above. Each record contains a chunk with 7821 data points in my example. Note that Chronix is heavily read-optimised.

What defines an expensive vs cheap index creation?

Chronix stores everything in Apache Solr, and every Solr commit is blocking. Thus, do a batch import with a single commit at the end (if possible).
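An untested sketch of what I mean with the go lib, reusing the Store call shape from your snippet: pass commit=false for every batch and trigger one commit through Solr's update handler at the very end (the host and the core name "chronix" below are assumptions; adjust them to your setup):

```go
package ingest

import (
	"fmt"
	"net/http"
	"time"
)

// storeFunc matches the call shape from your report: c.Store(series, commit, commitWithin).
// The series type is left abstract here.
type storeFunc func(series interface{}, commit bool, commitWithin time.Duration) error

// ingestThenCommit stores all batches without committing and issues a single
// commit via Solr's update handler at the end, so the index is built only once.
func ingestThenCommit(store storeFunc, batches []interface{}, solrURL string) error {
	for _, batch := range batches {
		if err := store(batch, false, 0); err != nil {
			return err
		}
	}
	// e.g. solrURL = "http://localhost:8983/solr/chronix" (assumed core name)
	resp, err := http.Get(solrURL + "/update?commit=true")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("commit failed: %s", resp.Status)
	}
	return nil
}
```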