go-graphite / go-carbon

Golang implementation of Graphite/Carbon server with classic architecture: Agent -> Cache -> Persister
MIT License

cwhisper files larger in size post conversion if they haven't been updated recently #314

Closed iain-buclaw-sociomantic closed 4 years ago

iain-buclaw-sociomantic commented 4 years ago

Maybe related to #309; however, I was surprised to find that the convert process almost doubled the size of the wsp file for some metrics.

When looking closer at it, the files that increased in size were in fact metric files that had not been written to for nearly 3 months.

It is surprising that the apparent act of rolling up data during conversion would cause such a steep increase compared to metrics that are updated regularly, particularly as those are the files that have fewer datapoints to store now.

Perhaps there's a problem in the convert program, but otherwise there would appear to be a need for some compaction tool as well.

bom-d-van commented 4 years ago

@iain-buclaw-sociomantic would you be able to share some uncompressed whisper files that grew after conversion? Change the filenames to something random so we won't be able to guess their meaning.

I need a closer look to determine whether or not it's a bug. So if possible, send it to my private email address: xiaofan.xhu@gmail.com . If it's not sensitive, you could attach it to this issue.

The worst case I have seen is 13.25% bigger in file size compared to the uncompressed file, and that was with a random dataset. I'm curious what's causing yours to almost double in size.

iain-buclaw-sociomantic commented 4 years ago

31010.tar.gz

@bom-d-van - here's one of the extreme edge cases. A metric with only two datapoints in the past from around Nov 2018.

compressed:                true
aggregation_method:        average
max_retention:             94608000
x_files_factor:            0.000000
comp_version:              1
points_per_block:          7200
avg_compressed_point_size: 2.000000
crc32:                     68B1243F
archives:                  5
archives.0.retention:      30s:3d
archives.1.retention:      5m:30d
archives.2.retention:      1h:90d
archives.3.retention:      4h:1y
archives.4.retention:      1d:3y

Original size (31010.wsp): 268k
Size after running convert (31010.cwsp): 1116k

As a test, I created a new metric by sending the datapoints to go-carbon:

$ nc 127.0.0.1 2003
31010 5 1543449600
31010 5 1543478400

This resultant file (31010.cwsp.new) is 172k. This is closer to what I had expected to see after running convert.

bom-d-van commented 4 years ago

Aha, 2 data points are probably the cause. I think I might have missed covering cases like that. Our production payload is denser in general, so we never had to deal with this type of issue.

Thanks for your help and feedback! I should be able to fix it fairly soon. :)

iain-buclaw-sociomantic commented 4 years ago

@bom-d-van - If you want another test case, we have a metric that has been constantly running since 2016 whose converted file is around 2.5x bigger than the original uncompressed whisper file; I can send that too.

It'll take a little while to produce that file from scratch by resending every data point, just to see if that too is bigger/smaller.

bom-d-van commented 4 years ago

@iain-buclaw-sociomantic it would be great if you could produce it. I could work on the fix at the same time. :D

bom-d-van commented 4 years ago

@iain-buclaw-sociomantic

Shared my fix here, not merged yet. If you can send more cases for me to examine, I could try to fix them at the same time.

https://github.com/go-graphite/go-whisper/pull/8/files

This should affect only cmd/convert.go and buckytools rebalancing, not extension.

With the new fix, the cwhisper file size for your case is 35.8% smaller.

du -sh /Users/xhu/Downloads/31010/31010.wsp.cwsp /Users/xhu/Downloads/31010/31010.wsp
172K    /Users/xhu/Downloads/31010/31010.wsp.cwsp
268K    /Users/xhu/Downloads/31010/31010.wsp

iain-buclaw-sociomantic commented 4 years ago

value.tar.gz

Thanks, I'll give it a test. I'm currently in the middle of reverting the conversion to compressed whisper, as the total disk usage for the storage node has increased by 100GB.

Attaching the second example as per my promise, contents are:

value.wsp (268K): original whisper metric
value.cwsp (636k): compressed whisper metric after convert

I did try creating a new whisper metric via go-carbon using the same datapoints and values; however, the resultant file was always incomplete, and even the incomplete metric was still 404k.

Again, it is surprising that it should be allowed to grow bigger than the uncompressed whisper format. I would have expected the size to be bounded by whatever retention is set for the archive.

I have recently also imported our entire whisper cluster into VictoriaMetrics, which goes one further than gorilla by first converting floating point values into integers by multiplying by 10^N (N being the number of decimal digits), then using zstd on top to reduce it further. Perhaps I'd be comparing apples and oranges, but the compaction was such that 1.5TB of whisper data became 63GB of victoria data. So I at least have an expectation of how much free space I can reclaim. ;-)
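
For illustration, a minimal sketch of that decimal-scaling step (my own toy Go code, not VictoriaMetrics' implementation; the zstd stage is omitted):

package main

import (
	"fmt"
	"math"
)

// scaleToInt returns (mantissa, exponent) such that value ≈ mantissa * 10^exponent,
// so the integer mantissa (and its deltas) can be compressed instead of raw floats.
func scaleToInt(value float64, maxDecimals int) (int64, int) {
	for e := 0; e <= maxDecimals; e++ {
		scaled := value * math.Pow10(e)
		if math.Abs(scaled-math.Round(scaled)) < 1e-9*(math.Abs(scaled)+1) {
			return int64(math.Round(scaled)), -e
		}
	}
	// fall back to the maximum precision we are willing to keep
	return int64(math.Round(value * math.Pow10(maxDecimals))), -maxDecimals
}

func main() {
	for _, v := range []float64{5, 12.34, 0.007} {
		m, e := scaleToInt(v, 6)
		fmt.Printf("%g -> %d * 10^%d\n", v, m, e)
	}
}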

bom-d-van commented 4 years ago

Thanks, I'll give it a test. I'm currently in the middle of reverting the conversion to compressed whisper, as the total disk usage for the storage node has increased by 100GB.

Sorry to hear that. I think if both fixes are applied, things should come out smaller.

Attaching the second example as per my promise, contents are:

value.wsp (268K): original whisper metric
value.cwsp (636k): compressed whisper metric after convert

This is a case related to its retention policy configuration: by default, cwhisper files allocate 7200 data points per block, and each archive has at least two blocks, while the last three archives in your whisper file all have fewer than 3000 data points. I should be able to fix that by using a smaller block size.
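
By that math, the 1h:90d archive, for example, needs only 90 × 24 = 2160 data points but would still be allocated two 7200-point blocks, i.e. room for 14400 points.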

Again, it is surprising that it should be allowed to grow bigger than the uncompressed whisper format. I would have expected the size to be bounded by whatever retention is set for the archive.

Yep, your whisper file format is quite different from our production setup, so the implementation doesn't handle it very well. But most of the issues should be fixable.

but the compaction was such that 1.5TB of whisper data became 63GB of victoria data

Are most of your whisper files sparse? I don't know how Victoria works, but I remember they use the same compression algorithm with some tweaks. I'll take a closer look at it.

bom-d-van commented 4 years ago

@iain-buclaw-sociomantic

Just in case: there is a progress.db CSV file saved in the -home dir that has the old and new file sizes. You can use it to figure out how many files, and which ones, actually grew bigger.

The meaning of each column:

path, oldSize, newSize, took (ns), started_at (unix timestamp)
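
For example, a quick throwaway helper (my own sketch, assuming the comma-separated layout above; adjust the path to your -home dir) to list the files that grew:

package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	f, err := os.Open("progress.db") // path under the -home dir used by cmd/convert
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	r.FieldsPerRecord = -1 // be lenient about the column count
	records, err := r.ReadAll()
	if err != nil {
		panic(err)
	}

	grown := 0
	for _, rec := range records {
		if len(rec) < 3 {
			continue
		}
		oldSize, err1 := strconv.ParseInt(strings.TrimSpace(rec[1]), 10, 64)
		newSize, err2 := strconv.ParseInt(strings.TrimSpace(rec[2]), 10, 64)
		if err1 != nil || err2 != nil {
			continue // skip header or malformed rows
		}
		if newSize > oldSize {
			grown++
			fmt.Printf("%s grew: %d -> %d bytes\n", rec[0], oldSize, newSize)
		}
	}
	fmt.Printf("%d files grew in total\n", grown)
}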

bom-d-van commented 4 years ago

@iain-buclaw-sociomantic I have pushed a few more fixes and tweaks to make cwhisper cope better with your retention policy. Now the results look a bit better. You will need the latest changes from the PR to reproduce this:

$ du -sh /Users/bom_d_van/Downloads/value/value.wsp*
268K    /Users/bom_d_van/Downloads/value/value.wsp
212K    /Users/bom_d_van/Downloads/value/value.wsp.cwsp

$ du -sh /Users/bom_d_van/Downloads/31010/31010.wsp*
268K    /Users/bom_d_van/Downloads/31010/31010.wsp
 60K    /Users/bom_d_van/Downloads/31010/31010.wsp.cwsp

To summarize: cwhisper, like whisper, is still a round-robin database; the difference is that cwhisper round-robins on blocks, with each block holding a fixed 7200 data points (as recommended by the Facebook Gorilla paper). In every archive, there are only N+1 blocks (N blocks to contain all the data points specified by the retention policy, plus one extra as a rotation buffer).

In our own production environment, our archives tend to have much more than 7200 data points even in the lowest archive, so the one extra block isn't much of an overhead.

However, for your retention policy, 30s:3d,5m:30d,1h:90d,4h:1y,1d:3y, even the highest-resolution archive has only 8640 data points. Thus 7200 data points as a minimum block size is too much overhead, and that's why you are seeing overgrown file sizes.
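
As a rough back-of-the-envelope illustration (my own arithmetic, not go-whisper code) of how much slack the 7200-point minimum block creates for this policy:

package main

import "fmt"

func main() {
	const blockPoints = 7200
	archives := []struct {
		name   string
		points int
	}{
		{"30s:3d", 3 * 86400 / 30},   // 8640 points
		{"5m:30d", 30 * 86400 / 300}, // 8640
		{"1h:90d", 90 * 24},          // 2160
		{"4h:1y", 365 * 24 / 4},      // 2190
		{"1d:3y", 3 * 365},           // 1095
	}
	needed, allocated := 0, 0
	for _, a := range archives {
		blocks := (a.points+blockPoints-1)/blockPoints + 1 // N data blocks + 1 rotation buffer
		fmt.Printf("%-7s needs %4d points, gets %d blocks = %5d slots\n",
			a.name, a.points, blocks, blocks*blockPoints)
		needed += a.points
		allocated += blocks * blockPoints
	}
	fmt.Printf("total: %d points needed, %d slots allocated (~%.1fx)\n",
		needed, allocated, float64(allocated)/float64(needed))
}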

Also, I have added a new command, cmd/estimate.go, so you can pre-check whether there are other retention policies that cwhisper needs to take care of:

$ go run cmd/estimate.go 30s:3d,5m:30d,1h:90d,4h:1y,1d:3y
compressed whisper file sizes using different size per data points and comparing to standard whisper file
standard whisper file size = 266.38KB (272776)
point size, file size, ratio
0.2, 7.53KB (7706), 2.83%
2.0, 57.61KB (58996), 21.63%
7.0, 196.77KB (201496), 73.87%
14.0, 391.60KB (400996), 147.01%

Hope this could help you!

bom-d-van commented 4 years ago

but the compaction was as such that 1.5TB of whisper data became 63GB of victoria data

VictoriaMetrics is not a round-robin database, so there is no disk pre-allocation. Therefore, if metrics are sparse, like your first example that contains only a few data points, this is a big improvement. However, if the metrics are dense, like cpu or memory which always have data being generated, the difference might not be that extreme.

What's more, VictoriaMetrics doesn't have downsampling, so I don't know whether you also managed to migrate the downsampled data in whisper into Victoria. And in whisper, for your retention policy 30s:3d,5m:30d,1h:90d,4h:1y,1d:3y, the downsampled archives take 61.98% of the space even in standard whisper files, so it's probably not a fair comparison.
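
As a quick sanity check of that figure from the retention policy itself: the five archives hold 8640, 8640, 2160, 2190 and 1095 points respectively (22725 in total), the four downsampled archives account for 8640 + 2160 + 2190 + 1095 = 14085 of them, and 14085 / 22725 ≈ 61.98%.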

Hope this explanation makes sense to you! :)

iain-buclaw-sociomantic commented 4 years ago

Are most of your whisper files sparse?

All performance metrics are being constantly recorded - hosts, services, micro-services. I think there's only one micro-service that is run as a cron job, ergo sparse.

Business metrics take up the most space, and they have a tendency to be short-lived, i.e. collecting data for 3 weeks and then that's it. However, that is not a given, and they could be turned off/on at any time. Removing the 1d:3y archive from these metrics may in fact be the best thing to do, but I'll leave it as is for now.

What's more, VictoriaMetrics doesn't have downsampling, so I don't know whether you also managed to migrate the downsampled data in whisper into Victoria?

All you're sending is a value with a timestamp, and that is what gets stored. You are right, there is no downsampling, but the result is that all historical data appears downsampled when you query it, even though all new data will have max precision (this is fine, as the longer time goes on, the less relevant the old points become anyway, and they will just get pruned when they overrun the global fixed retention period). :-)

Hope this could help you!

Thanks! I'll rebuild and give it another try. Perhaps one extra feature I can think of is a flag that tells the convert program to check file sizes before overwriting the original, and if the resultant file grew, skip it.
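
Something along these lines, say (a hypothetical sketch of that check, not an existing convert option; keepConverted is a made-up name):

package main

import (
	"fmt"
	"os"
)

// keepConverted reports whether the converted file is no larger than the original.
func keepConverted(origPath, convertedPath string) (bool, error) {
	orig, err := os.Stat(origPath)
	if err != nil {
		return false, err
	}
	converted, err := os.Stat(convertedPath)
	if err != nil {
		return false, err
	}
	return converted.Size() <= orig.Size(), nil
}

func main() {
	ok, err := keepConverted("value.wsp", "value.wsp.cwsp")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("replace original:", ok)
}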

It isn't really a necessity to have all files compressed, as go-carbon seamlessly handles both formats without complaint.

azhiltsov commented 4 years ago

the result is that all historical data appears downsampled when you query it

This requires an enormous amount of memory during rendering, or else sampling or preprocessing on the storage side. What do you use for render requests to VictoriaMetrics?

iain-buclaw-sociomantic commented 4 years ago

@azhiltsov - I'll clarify: the imported data is the downsampled whisper data, even though all newly written stats will keep the highest precision (never downsampled). I have no idea what would happen if there's 5 years' worth of data at 30-second sampling and someone decides to query it all at once. I would hope that not every datapoint gets sent over the wire.

Civil commented 4 years ago

To my knowledge, for VictoriaMetrics it depends on how exactly you query it. At least while I was implementing the backend plugin for it in carbonapi (the one that I maintain), it had some way to control the precision of the output data. How it will actually read the values from disk - that's a good question, and I think it's better to ask its authors (they reply quite fast on both Telegram and GitHub issues, in my experience).

bom-d-van commented 4 years ago

Thanks! I'll rebuild and give it another try. Perhaps one extra feature I can think of is a flag that tells the convert program to check file sizes before overwriting the original, and if the resultant file grew, skip it.

@iain-buclaw-sociomantic Thanks for the suggestion, would consider adding a flag for that.

Small tip:

For testing purposes, you could start by randomly grabbing some metrics from your server, copying them somewhere, and running the convert program to see if it's working as expected. That way, if the result is still bad, you don't have to do any cleanup or reversion. Random selection is important so you get a better sample.

The efficiency of the gorilla algorithm is built upon the assumption that data points are not random, so the compressed result is good overall, but some metric files might end up bigger (as cmd/estimate.go shows).
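
A tiny demo of that point (my own illustration, not go-whisper code): the XOR of two consecutive similar float64 values has only a handful of meaningful bits to encode, while the XOR of two unrelated values does not.

package main

import (
	"fmt"
	"math"
	"math/bits"
)

// meaningfulBits counts the span between the first and last set bit of the XOR,
// i.e. roughly how many bits a gorilla-style encoder would have to store.
func meaningfulBits(a, b float64) int {
	x := math.Float64bits(a) ^ math.Float64bits(b)
	if x == 0 {
		return 0
	}
	return 64 - bits.LeadingZeros64(x) - bits.TrailingZeros64(x)
}

func main() {
	fmt.Println("smooth step (21.5 -> 21.6):   ", meaningfulBits(21.5, 21.6), "bits to encode")
	fmt.Println("random step (21.5 -> 98765.43):", meaningfulBits(21.5, 98765.43), "bits to encode")
}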

shuf is my good friend on linux.

find /path/to/whisper/files -type f > files.txt
# copy a random sample of 1000 whisper files somewhere else, keeping the directory layout
shuf -n 1000 files.txt | xargs -I{} cp --parents {} /path/to/sample/

iain-buclaw-sociomantic commented 4 years ago

@bom-d-van - the first few random samples seem positive.

# find orig/ -type f -size -267k | wc -l
183
# find orig/ -type f -size +267k | wc -l
3
# find convert-old/ -type f -size -267k | wc -l
447
# find convert-old/ -type f -size +267k | wc -l
553
# find convert-new/ -type f -size -267k | wc -l
997
# find convert-new/ -type f -size +267k | wc -l
3

iain-buclaw-sociomantic commented 4 years ago

And just ran on a larger random sample of 1% of a server storage.

# find orig/ -type f -size -267k | wc -l
5537
# find orig/ -type f -size +267k | wc -l
3
# find convert-old/ -type f -size -267k | wc -l
14267
# find convert-old/ -type f -size +267k | wc -l
18126
# find convert-new/ -type f -size -267k | wc -l
32395
# find convert-new/ -type f -size +267k | wc -l
3

Still, so far so good.

1.9G    convert-new
12G     convert-old
7.7G    orig

bom-d-van commented 4 years ago

That's Lovely! :D

Also, you could run go run cmd/verify.go file.wsp file.cwsp to verify the migration result. This is helpful for checking whether the data was correctly migrated to the cwhisper files.

iain-buclaw-sociomantic commented 4 years ago

@bom-d-van - Random question, but is the convert program thread-safe, and does it properly clear any thread-local data before processing the next metric?

I've hit a strange behaviour that can't be reproduced in isolation.

267K ./orig/9988.wsp
1.3M ./prod/9988.wsp  # convert -store ./whisper/
156K ./test/9988.wsp  # convert -store ./9988.wsp

I'm not sure how else it would do one thing when converting all metrics and another when it's just a small selection of metrics.

bom-d-van commented 4 years ago

That's strange. It should be thread-safe.

The process goes like this: the original whisper file is flocked, then cmd/convert creates a temporary file for the compressed data, and at the end of the conversion the whisper file is resized and overwritten.

https://github.com/go-graphite/go-whisper/blob/master/cmd/convert.go#L220
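
In rough pseudo-Go, the pattern is something like this (a simplified sketch with made-up names, not the actual cmd/convert.go code; the compression step itself is elided):

package sketch

import (
	"os"
	"syscall"
)

// convertInPlace shows the lock-then-overwrite pattern only; the compression
// step is passed in and not implemented here.
func convertInPlace(path string, compress func(src, dst *os.File) error) error {
	src, err := os.OpenFile(path, os.O_RDWR, 0644)
	if err != nil {
		return err
	}
	defer src.Close()

	// Exclusive lock: a go-carbon running with flock enabled won't write concurrently.
	if err := syscall.Flock(int(src.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(src.Fd()), syscall.LOCK_UN)

	// Compressed data goes to a temporary file first.
	tmp, err := os.CreateTemp("", "cwsp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()
	if err := compress(src, tmp); err != nil {
		return err
	}

	// Then the original is resized and overwritten while the lock is still held.
	data, err := os.ReadFile(tmp.Name())
	if err != nil {
		return err
	}
	if err := src.Truncate(int64(len(data))); err != nil {
		return err
	}
	_, err = src.WriteAt(data, 0)
	return err
}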

Was ./prod/9988.wsp converted to cwhisper before? cmd/convert.go won't do anything if the file has already been compressed once.

azhiltsov commented 4 years ago

I assume go-carbon is running on the same host while you are converting. Please check that you are running the latest version of it with flock enabled.

iain-buclaw-sociomantic commented 4 years ago

@azhiltsov built from 49aae21f754e0f281eda9a807fc497f15112c179 and flock is turned on.

bom-d-van commented 4 years ago

@iain-buclaw-sociomantic do you mind sharing the three files? I could take a closer look.

iain-buclaw-sociomantic commented 4 years ago

@bom-d-van this is one example: missing.tar.gz

And yes, all files are the originals; there are no previously compressed files.

I've just run it again and it looks fine...

iain-buclaw-sociomantic commented 4 years ago

Oh wait! Maybe it's go-carbon that's resizing the file to be bigger. :-o

iain-buclaw-sociomantic commented 4 years ago

missing.tar.gz

@bom-d-van - Confirmed, it isn't you, it's go-carbon. Managed to make a copy of the file before the next write to it (converted file at 15:54, go-carbon updated file at 15:57).

bom-d-van commented 4 years ago

Confirmed, it isn't you, it's go-carbon. Managed to make a copy of the file before the next write to it (converted file at 15:54, go-carbon updated file at 15:57).

That's bad. Sorry, most likely this is also an issue in the go-whisper library. I will figure out what's going on and get back to you. Thanks for all your feedback! :)

iain-buclaw-sociomantic commented 4 years ago

@bom-d-van - To start with, let's update whisper in go-carbon. ;-)

bom-d-van commented 4 years ago

Some extra info: @azhiltsov tested cwhisper in our environment by backfilling a new cluster with buckytools, so we didn't actually have to deal with these types of problems.

However, the benefit of using cmd/convert.go is a much better compression result, because it pre-calculates a more accurate point size. When going through rebalancing, the minimum data point size is hard-coded to 2; this is intended to avoid too many cwhisper file extensions when the default data point size is too small for the payload.

iain-buclaw-sociomantic commented 4 years ago

@bom-d-van #316 - I'll rebuild go-carbon and give it another try tomorrow.

bom-d-van commented 4 years ago

@iain-buclaw-sociomantic I think I might have identified the issue. The cause is twofold:

  1. you are right, go-carbon needs to be updated to use the new version of the go-whisper library
  2. the point size used/generated by cmd/convert.go is too optimistic (not enough to hold all the data points required by the retention policy), so when go-carbon writes to the cwhisper file again, it detects the issue and extends the file using the wrong version of the go-whisper library. cmd/convert.go should do a better job of avoiding an extension right after the conversion by using a correct point size, for efficiency.

so I will need to patch go-whisper again to make a more realistic calculation. Will push my patch soon!

bom-d-van commented 4 years ago

@iain-buclaw-sociomantic Just pushed my changes here, hopefully this is the last problem in your migration test! https://github.com/go-graphite/go-whisper/pull/9/files

Let's update buckytools and go-carbon once we've confirmed there are no new problems!

And really, thanks for all your feedback and patience; all of this makes cwhisper work better!

iain-buclaw-sociomantic commented 4 years ago

@bom-d-van - OK, I'll check this both with and without the 0.618 patch. Should be simple actually, as you added a write command. :-)

iain-buclaw-sociomantic commented 4 years ago
$ ../write-24b937f6 missing-24b937f6.wsp 1572882000:407
file is extended.
$ ../write-dd1d9845 missing-dd1d9845.wsp 1572882000:407
file is extended.
$ ../write-ee616e4b missing-ee616e4b.wsp 1572882000:407
file is extended.
$ du -h *.wsp
420 missing-24b937f6.wsp
28  missing-dd1d9845.wsp
28  missing-ee616e4b.wsp
$ ../dump missing-dd1d9845.wsp > dd1d9845.dump
$ ../dump missing-ee616e4b.wsp > ee616e4b.dump
$ diff -pur dd1d9845.dump ee616e4b.dump 
$ 

I'm not seeing any immediate difference from go-graphite/go-whisper#9; however, this is just a naive test.

bom-d-van commented 4 years ago

There is a buffer in cwhisper archives, and extension is only triggered if there are full data blocks in the archives. So to test, it's better to write many more data points to the file, so that aggregation/downsampling and extension can be triggered.

An example:

for i in {0..7200}; do t=$((1572941377+$i*600)); echo $t:3,; done | go run cmd/write.go missing.wsp
# or, with random values
for i in {0..7200}; do t=$((1572941377+$i*600)); echo $t:$RANDOM,; done | go run cmd/write.go missing.wsp

iain-buclaw-sociomantic commented 4 years ago

For this particular file, I think we're fine without. I'll proceed with just pushing current go-carbon binaries out and converting this one server again.

I don't see anything wrong with the change however.

iain-buclaw-sociomantic commented 4 years ago

[image attachment]

Nowhere near halfway done yet, but this is the current result. :+1:

iain-buclaw-sociomantic commented 4 years ago

Still got 100GB of uncompressed data to go; however, to give an update to the above, I'm now seeing disk usage down to 250GB. That is a good 3x reduction.

So what's left here? I think just resync the whisper library with both go-carbon and buckytools, given that the other whisper PR has been merged.

bom-d-van commented 4 years ago

Still got 100GB of uncompressed data to go; however, to give an update to the above, I'm now seeing disk usage down to 250GB. That is a good 3x reduction.

Nice! :D

So what's left here? I think just resync the whisper library with both go-carbon and buckytools, given that the other whisper PR has been merged.

Yes, I agree.

bom-d-van commented 4 years ago

Not sure what your plan is after this test, but just in case, I would recommend keeping one replica of your go-carbon instances on the standard whisper format, at least for some weeks or even months, until you feel things are stable and it meets your expectations.

Most likely there are still some unknown bugs or unconsidered scenarios; by keeping a copy around for some time, you can still recover the data from the other replica when bad things happen.

Storage optimization is hard; unlike pure networking applications, bugs can have a persistent effect.

iain-buclaw-sociomantic commented 4 years ago

Not sure what your plan is after this test, but just in case, I would recommend keeping one replica of your go-carbon instances on the standard whisper format, at least for some weeks or even months, until you feel things are stable and it meets your expectations.

Seems reasonable, and we do have the capacity in the replica cluster to do that for the moment (unfortunately, the first cluster seems to have something more like a 40/60 distribution of metrics, so one node has been constantly hitting 90% over the last weeks).

Yes, I agree.

Both pulls have been raised.

iain-buclaw-sociomantic commented 4 years ago

Both pulls have been raised.

Both have been merged.