Closed wdoekes closed 2 years ago
Apparently, the GzipReader needs to be closed to trip the error:
$ echo 'Your argument is sound, nothing but sound.' |
python3 -c 'import gzip,sys;b=gzip.compress(sys.stdin.buffer.read());sys.stdout.buffer.write(b[0:20]+b[32:])' |
ruby -e 'require "zlib";r=Zlib::GzipReader.new(STDIN);print(r.read)'
Your argul, nothing but argul.
vs.
$ echo 'Your argument is sound, nothing but sound.' |
python3 -c 'import gzip,sys;b=gzip.compress(sys.stdin.buffer.read());sys.stdout.buffer.write(b[0:20]+b[32:])' |
ruby -e 'require "zlib";r=Zlib::GzipReader.new(STDIN);print(r.read);r.close'
Your argul, nothing but argul.
Traceback (most recent call last):
1: from -e:1:in `<main>'
-e:1:in `close': invalid compressed data -- crc error (Zlib::GzipFile::CRCError)
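The same behaviour can be reproduced in-process. A minimal Ruby sketch: here only the 8-byte gzip footer (CRC32 + ISIZE) is corrupted, so the deflate body still decompresses cleanly and read sees nothing wrong — only close trips the checksum:

```ruby
require "zlib"
require "stringio"

# Corrupt only the gzip footer (the last 8 bytes hold CRC32 + ISIZE),
# so the deflate stream itself stays valid.
original = "Your argument is sound, nothing but sound.\n"
gzipped = Zlib.gzip(original)
corrupt = gzipped.dup
corrupt[-8] = (corrupt[-8].ord ^ 0xFF).chr  # flip bits in the stored CRC32

reader = Zlib::GzipReader.new(StringIO.new(corrupt))
data = reader.read       # succeeds: the body decompresses fine
crc_error = begin
  reader.close           # the CRC footer is only checked here
  false
rescue Zlib::GzipFile::CRCError
  true
end

puts data == original    # true: read never noticed the bad checksum
puts crc_error           # true: close did
```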
I'm off to file another ticket at gelfd2 then.
Will it be enough to raise the socket buffer or do you want to bump the gelfd2 dependency as well?
Raising the buffer is sufficient to fix the problem, so a dependency bump is not needed, IMO.
The graylog GELF docs have this to say about chunking/UDP sizes:
This would tempt you to think that we need to accept at most 8192 bytes. However, in the wild we have seen reassembled UDP packets with size 8204 (8192 + 12 byte chunk headers).
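The 12 bytes are the GELF chunk header: 2 magic bytes, an 8-byte message id, a 1-byte sequence number and a 1-byte sequence count, per the GELF spec. A minimal sketch (build_chunk is a hypothetical helper, not gelfd2 API) of why an 8192-byte receive buffer truncates a maximal chunk:

```ruby
# 12-byte GELF chunk header: magic (2) + message id (8) + seq (1) + count (1)
GELF_CHUNK_MAGIC = "\x1e\x0f".b

def build_chunk(message_id, seq, total, payload)
  GELF_CHUNK_MAGIC + message_id + [seq, total].pack("CC") + payload
end

payload = "x".b * 8192                    # a maximally sized chunk body
chunk = build_chunk("\x00".b * 8, 0, 1, payload)
puts chunk.bytesize                       # 8204 = 8192 + 12 byte header

truncated = chunk[0, 8192]                # what an 8192-byte recv yields
puts chunk.bytesize - truncated.bytesize  # 12 bytes silently cut off
```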
The truncation at 8192 bytes caused severe corruption of decompressed chunks: instead of a gzip decompress error, we were confronted with invalid data (*).
In the best case fluentd would refuse the message (because of a missing tag match). In the worst case, fluentd passed the messages along, causing issues along the road (like invalid utf8 sequences, missing header fields, corrupt timestamps).
(*) The logs would say:
2022-04-12 09:56:33 +0000 [warn]: #0 pattern not match: "{\"host\":...}"
This means that Gelfd2::Parser.parse(data) did not raise any errors, even though we're fairly confident there should've been a CRC error somewhere. Surely Zlib::GzipReader.new(StringIO.new(data)) should have choked on messages that lack 12 bytes at various places...?
I see that it unfortunately doesn't.
So the underlying difficulty of finding the cause of this bug should be attributed to PHP-style (i.e. sloppy) gzip decompression in Ruby.
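For what it's worth, a defensive sketch of what a stricter parse could do: read, then close, so the CRC footer is actually validated (strict_gunzip is a hypothetical helper, not part of gelfd2):

```ruby
require "zlib"
require "stringio"

# Force the footer check by closing the reader after reading, so a bad
# checksum raises instead of returning garbled data.
def strict_gunzip(data)
  reader = Zlib::GzipReader.new(StringIO.new(data))
  body = reader.read
  reader.close          # raises Zlib::GzipFile::CRCError on a bad footer
  body
end

good = Zlib.gzip("hello")
bad = good.dup
bad[-8] = (bad[-8].ord ^ 0xFF).chr  # corrupt the stored CRC32

puts strict_gunzip(good)            # hello
detected = begin
  strict_gunzip(bad)
  false
rescue Zlib::GzipFile::CRCError
  true
end
puts detected                       # true: the corruption is caught
```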