golang / snappy

The Snappy compression format in the Go programming language.
BSD 3-Clause "New" or "Revised" License

How to debug snappy: corrupt input? #73

Open xiaoxiaoHe-E opened 1 year ago

xiaoxiaoHe-E commented 1 year ago

Hi team, we ran into the error snappy: corrupt input while using snappy to compress data sent over a TCP connection.

How we build the connection:

On the source side:

conn.reader = snappy.NewReader(io.Reader(net.Conn))
conn.writeCloser = snappy.NewBufferedWriter(io.WriteCloser(net.Conn))
conn.Write(buf)

On the destination side:

conn.reader = snappy.NewReader(io.Reader(net.Conn))
conn.writeCloser = snappy.NewBufferedWriter(io.WriteCloser(net.Conn))
conn.Read(buf)

We get the error snappy: corrupt input when calling conn.Read(buf). The error happens intermittently.

Is this caused by network problem?

We read some of the snappy code and found that this error is reported when the checksum or the decoded length is wrong. But we use snappy on top of a TCP connection, and TCP should guarantee data integrity. So does snappy need several network packets to decode the complete data? Or is this caused by a problem with the network card, where the hardware cannot verify the data correctly?

Is this caused by memory problem?

I also found some discussions online that suspect this is caused by a memory overflow or an insufficient runtime memory limit. But I cannot be sure, since we don't get any other error messages.

I'd appreciate any help or suggestions on how to debug. Thanks!

xiaoxiaoHe-E commented 1 year ago

I think we have figured out the reason now:

How to reproduce the issue:

  1. tcpkill the corresponding connection
  2. use tc to corrupt network packets
    sudo tc qdisc add dev ens192 root netem corrupt 30%

    With this, snappy always reports the error snappy: corrupt input in our environment.

Misleading error message corrupt input

We get this error from here. But the real reason is that snappy reads from a closed connection and gets no data on the next read, so it cannot decode the incomplete data. We think it would be better to report an unexpected EOF error here.

Here is the code we modified during testing:

diff --git a/decode.go b/decode.go
index 23c6e26..208f054 100644
--- a/decode.go
+++ b/decode.go
@@ -13,6 +13,7 @@ import (
 var (
        // ErrCorrupt reports that the input is invalid.
        ErrCorrupt = errors.New("snappy: corrupt input")
+       ErrEof = errors.New("snappy: unexpected EOF")
        // ErrTooLarge reports that the uncompressed length is too large.
        ErrTooLarge = errors.New("snappy: decoded block is too large")
        // ErrUnsupported reports that the input isn't supported.
@@ -111,7 +112,7 @@ func (r *Reader) Reset(reader io.Reader) {
 func (r *Reader) readFull(p []byte, allowEOF bool) (ok bool) {
        if _, r.err = io.ReadFull(r.r, p); r.err != nil {
                if r.err == io.ErrUnexpectedEOF || (r.err == io.EOF && !allowEOF) {
-                       r.err = ErrCorrupt
+                       r.err = ErrEof
                }
                return false
        }
codenoid commented 6 months ago

I got this error because the net.Conn was being read multiple times concurrently.