MerlinDMC / fluent-plugin-input-gelf

A GELF input for Fluentd
MIT License
8 stars 15 forks source link

Fix corruption of decompressed chunks by allowing larger UDP #16

Closed wdoekes closed 2 years ago

wdoekes commented 2 years ago

The graylog GELF docs have this to say about chunking/UDP sizes:

UDP datagrams are limited to a size of 65536 bytes. Some Graylog components are limited to processingup to 8192 bytes. A lot of compressed information fits in there but you sometimes might just havemore information to send. This is why Graylog supports chunked GELF.

This would tempt you to think that we need to accept at most 8192 bytes. However, in the wild we have seen reassembled UDP packets with size 8204 (8192 + 12 byte chunk headers).

The truncation at 8192 bytes caused severe corruption of decompressed chunks: instead of a gzip decompress error, we were confronted with invalid data (*).

In the best case fluentd would refuse the message (because of a missing tag match). In the worst case, fluentd passed the messages along, causing issues along the road (like invalid utf8 sequences, missing header fields, corrupt timestamps).

(*) The logs would say:

2022-04-12 09:56:33 +0000 [warn]: #0 pattern not match: "{\"host\":...}"

This means that Gelfd2::Parser.parse(data) did not raise any errors, even though we're fairly confident there should've been a CRC error somethere. Surely Zlib::GzipReader.new(StringIO.new(data)) should have choked on messages that lack 12 bytes at various places...?

I see that it unfortunately doesn't:

$ echo 'Your argument is sound, nothing but sound.' |
  python3 -c 'import gzip,sys;b=gzip.compress(sys.stdin.buffer.read());sys.stdout.buffer.write(b[0:20]+b[32:])' \
  > broken.compressed

$ ruby -e 'require "zlib";print(Zlib::GzipReader.new(STDIN).read)' \
  < broken.compressed
Your argul, nothing but argul.

$ python3 -c 'import gzip,sys;print(gzip.decompress(sys.stdin.buffer.read()))' \
  < broken.compressed
...
  File "/usr/lib/python3.8/gzip.py", line 516, in _read_eof
    raise BadGzipFile("CRC check failed %s != %s" % (hex(crc32),
gzip.BadGzipFile: CRC check failed 0x87597587 != 0x3f2df618

So the underlying difficulty of finding the cause of this bug should be attributed to php-style (=sloppy) Gzip decompression in Ruby.

wdoekes commented 2 years ago

Apparently, the GzipReader needs to be closed to trip the error:

$ echo 'Your argument is sound, nothing but sound.' |
  python3 -c 'import gzip,sys;b=gzip.compress(sys.stdin.buffer.read());sys.stdout.buffer.write(b[0:20]+b[32:])' |
  ruby -e 'require "zlib";r=Zlib::GzipReader.new(STDIN);print(r.read)'
Your argul, nothing but argul.

vs.

$ echo 'Your argument is sound, nothing but sound.' |
  python3 -c 'import gzip,sys;b=gzip.compress(sys.stdin.buffer.read());sys.stdout.buffer.write(b[0:20]+b[32:])' |
  ruby -e 'require "zlib";r=Zlib::GzipReader.new(STDIN);print(r.read);r.close'
Your argul, nothing but argul.
Traceback (most recent call last):
    1: from -e:1:in `<main>'
-e:1:in `close': invalid compressed data -- crc error (Zlib::GzipFile::CRCError)

I'm off filing another ticket at gelfd2 then..

MerlinDMC commented 2 years ago

Will it be enough to raise the socket buffer or do you want to bump the gelfd2 dependency as well?

wdoekes commented 2 years ago

Raising the buffer is sufficient to fix the problem. So dependency bump is not needed IMO.