DataDog / datadog-agent

Re-enable Zstandard #4208

Open CAFxX opened 5 years ago

CAFxX commented 5 years ago

A few years ago (https://github.com/DataDog/datadog-agent/pull/450), zstd support was disabled by default, citing incompatibilities with the Datadog ingest side.

We are currently facing pretty high egress costs, as Datadog has no ingest PoP in our GCP region (asia-northeast1), and we would be happy to trade some CPU cycles for lower egress costs.

Would it be possible to re-enable zstd support and, ideally, make the compression level configurable?

ogaca-dd commented 5 years ago

Hi @CAFxX, thanks for reaching out,

We’ll take the proposal into consideration and update this issue!

ogaca-dd commented 4 years ago

Hi @CAFxX,

Our backend is not yet ready to support zstd. In addition, based on https://github.com/facebook/zstd#benchmarks the difference between zlib and zstd is about 15%.

Can you contact our support team to help you reduce your egress costs?

CAFxX commented 4 years ago

> In addition, based on https://github.com/facebook/zstd#benchmarks the difference between zlib and zstd is about 15%.

At the same compression speed, yes; but that is why I explicitly asked for the compression level to be configurable: we are willing to trade CPU time for higher compression ratios and lower egress costs.

The page you are quoting shows significantly more than a 15% improvement under those conditions.
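
For what it's worth, this is easy to verify with zstd's built-in benchmark mode on a representative sample (the file name here is just a placeholder):

    # benchmark compression levels 1 through 19 on real data
    zstd -b1 -e19 sample.tar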

> Can you contact our support team to help you reduce your egress costs?

Already did; so far, no other option has been offered.

CAFxX commented 4 years ago

Just as a quick benchmark on real data, I downloaded a few MB of our logs from the Datadog UI (from non-overlapping time ranges) and compressed them at different levels with zstd and gzip. Results:

   8849920 extract.tar
   1471278 extract.tar.1.gz
   1126425 extract.tar.def.gz
   1089061 extract.tar.9.gz
    895874 extract.tar.1.zst
    810474 extract.tar.def.zst
    679392 extract.tar.11.zst
    584194 extract.tar.19.zst

Going from gzip -9 to zstd -11, the compressed-size savings are ~38%. zstd -19 is too slow, but -11 is not significantly slower than gzip -9, so we could probably raise the level even further.
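
For reference, the comparison can be reproduced with something like the following (the .def entries in the listing above correspond to the defaults, gzip -6 and zstd -3):

    # gzip at fastest, default, and maximum levels
    for lvl in 1 6 9; do gzip -c -$lvl extract.tar > extract.tar.$lvl.gz; done
    # zstd at fastest, default, and higher levels
    for lvl in 1 3 11 19; do zstd -q -f -$lvl extract.tar -o extract.tar.$lvl.zst; done
    ls -l extract.tar*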

l1x commented 4 years ago

@CAFxX Did you use dictionary-trained compression?

    # train a dictionary on a set of representative samples
    zstd --train FullPathToTrainingSet/* -o dictionaryName
    # compress / decompress using the trained dictionary
    zstd -D dictionaryName FILE
    zstd -D dictionaryName --decompress FILE.zst

CAFxX commented 4 years ago

I did not use custom dictionaries; it was a very quick two-minute test just to gauge whether it would be useful.

(Also, I am not sure how custom dictionaries would work for this use case: if clients used custom dictionaries, those dictionaries would have to be shared with, and kept in sync with, the Datadog ingest servers; otherwise decompression on the ingest side would fail.)

l1x commented 4 years ago

> I did not use custom dictionaries; it was a very quick two-minute test just to gauge whether it would be useful.
>
> (Also, I am not sure how custom dictionaries would work for this use case: if clients used custom dictionaries, those dictionaries would have to be shared with, and kept in sync with, the Datadog ingest servers; otherwise decompression on the ingest side would fail.)

I was just wondering whether it was this much better even without a dictionary. I am not sure how this would work in your case; I would probably take a day's worth of data, train on it once, and share the resulting dictionary with all the nodes involved, then redo the training once a month. I am not sure how much savings you could achieve.
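
Something like this, as a rough sketch (the paths and dictionary name are placeholders, and the ingest side would need the same dictionary file):

    # train once on a day's worth of representative log samples
    zstd --train /data/log-samples/day1/* -o logs.dict
    # each node compresses with the shared dictionary
    zstd -D logs.dict -11 app.log -o app.log.zst
    # the ingest side must decompress with the same dictionary
    zstd -D logs.dict -d app.log.zst -o app.log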

ogaca-dd commented 4 years ago

Hi @CAFxX,

I am trying to better understand your setup.

Are you sending logs, metrics, or both? If you are sending logs, did you try using HTTP, which supports gzip compression, unlike TCP? If you switched from TCP to HTTP, did you notice a decrease in egress costs?
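
For reference, switching log forwarding to HTTPS with compression looks roughly like this in datadog.yaml (the logs_config keys below are from the Agent docs, but please double-check them against your Agent version):

    # excerpt for /etc/datadog-agent/datadog.yaml -- verify key names
    # against the docs for your Agent version
    logs_config:
      use_http: true
      use_compression: true
      compression_level: 6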