dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/
Other
25.47k stars 926 forks source link

Response Race On Subscription #924

Closed pascaldekloe closed 1 year ago

pascaldekloe commented 1 year ago

Responses from DragonflyDB corrupt sometimes when subscribing to a channel that encounters a publish at the same time.

The following line dump shows how a subscription count is lost or overwritten by the message (array) that follows.

lost-subs-count.txt

Specifically, the response was:

*3
$9
subscribe
$28
channel-18126908833894131297
*3
$7
message
…

… while it should have been:

*3
$9
subscribe
$28
channel-18126908833894131297
:1
*3
$7
message
…

…, with ":1" included.

This is not the only scenario. I just got lucky while recording. Sometimes a redundant "3" appears before another message array, as in "3\r\n*3\r\n$7message…". Followup on #316.

pascaldekloe commented 1 year ago

Can reproduce with the following patch to github.com/pascaldekloe/redis.

diff --git a/client.go b/client.go
index 6207e01..ee5f694 100644
--- a/client.go
+++ b/client.go
@@ -2,6 +2,8 @@ package redis

 import (
        "bufio"
+"os"
+"io"
        "errors"
        "fmt"
        "net"
@@ -470,7 +472,12 @@ func (c *ClientConfig) connect(readBufferSize int) (net.Conn, *bufio.Reader, err
                tcp.SetNoDelay(false)
                tcp.SetLinger(0)
        }
-       reader := bufio.NewReaderSize(conn, readBufferSize)
+
+f, err := os.OpenFile(fmt.Sprintf("linedump-%x", &network), os.O_WRONLY|os.O_CREATE|os.O_EXCL|os.O_APPEND|os.O_SYNC, 0o600)
+if err != nil {
+       panic(err)
+}
+       reader := bufio.NewReaderSize(io.TeeReader(conn, f), readBufferSize)

        // apply sticky settings
        if c.Password != nil {

… and then repeat go test -run 'Subscribe$' && rm linedump-* until failure. Needs about a dozen tries.

pascaldekloe commented 1 year ago

The redundant array length just happend again on v0.17.0: redis: protocol violation; received "*3\r\n*3\r\n$7\r\nmess"

romange commented 1 year ago

Hi @pascaldekloe - I found the culprit, working on the fix. Thanks!