centrifugal / centrifuge

Real-time messaging library for Go. The simplest way to add feature-rich and scalable WebSocket support to your application. The core of Centrifugo server.
https://pkg.go.dev/github.com/centrifugal/centrifuge
MIT License

Race between using a redis connection and releasing it #399

Closed errcheckenjoyer closed 1 month ago

errcheckenjoyer commented 1 month ago

Describe the bug
It looks like a race between using a Redis connection and releasing it.

To Reproduce
It is difficult to reproduce because you need a Redis connection error under heavy load.

We have experienced crashes of our Centrifugo instances (though very few of them) caused by

panic: DedicatedClient should not be used after recycled

goroutine 13345831 [running]:
github.com/redis/rueidis.(*dedicatedClusterClient).acquire(0xc00471af00, {0x2529018?, 0x350bda0?}, 0x4000?)
    /root/go/pkg/mod/github.com/redis/rueidis@v1.0.37/cluster.go:1110 +0x3af
github.com/redis/rueidis.(*dedicatedClusterClient).Do(0xc00471af00, {0x2529018, 0x350bda0}, {0xc0058511c0, 0x3000, 0x4000})
    /root/go/pkg/mod/github.com/redis/rueidis@v1.0.37/cluster.go:1163 +0xaa
github.com/centrifugal/centrifuge.(*RedisBroker).subscribe(0xc00046a700, 0xc0003fd680, {0xc006341900, 0x49})
    /root/go/pkg/mod/github.com/centrifugal/centrifuge@v0.32.3-0.20240527045602-629d719708d9/broker_redis.go:862 +0x5b6
github.com/centrifugal/centrifuge.(*RedisBroker).Subscribe(0xc00046a700?, {0xc006341900?, 0x49?})
    /root/go/pkg/mod/github.com/centrifugal/centrifuge@v0.32.3-0.20240527045602-629d719708d9/broker_redis.go:837 +0x85
github.com/centrifugal/centrifuge.(*Node).addSubscription(0xc000443800, {0xc006341900, 0x49}, {0xc00ab08a00?, {0x0?, 0x7f2e0f8bc300?}})
    /root/go/pkg/mod/github.com/centrifugal/centrifuge@v0.32.3-0.20240527045602-629d719708d9/node.go:999 +0x330
github.com/centrifugal/centrifuge.(*Client).subscribeCmd(_, _, {{0x0, {0x0, 0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, ...}, ...}, ...)
    /root/go/pkg/mod/github.com/centrifugal/centrifuge@v0.32.3-0.20240527045602-629d719708d9/client.go:2775 +0x4ed
github.com/centrifugal/centrifuge.(*Client).connectCmd.func1({0xc006341900, 0x49}, {0x0, {0x0, 0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, ...})
    /root/go/pkg/mod/github.com/centrifugal/centrifuge@v0.32.3-0.20240527045602-629d719708d9/client.go:2382 +0x20f
created by github.com/centrifugal/centrifuge.(*Client).connectCmd in goroutine 13345828
    /root/go/pkg/mod/github.com/centrifugal/centrifuge@v0.32.3-0.20240527045602-629d719708d9/client.go:2372 +0x1474
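
The failure mode in the trace can be sketched abstractly: one goroutine keeps a reference to a dedicated connection while another code path (e.g. an error handler) recycles it back to the pool, and the next use panics. Below is a minimal self-contained Go sketch of that pattern — the `recycledClient` type is hypothetical and only mimics the invariant rueidis enforces, it is not the actual rueidis code:

```go
package main

import (
	"fmt"
	"sync"
)

// recycledClient is a hypothetical stand-in for rueidis' DedicatedClient:
// once recycled (returned to the pool), any further use must panic.
type recycledClient struct {
	mu       sync.Mutex
	recycled bool
}

func (c *recycledClient) Do(cmd string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.recycled {
		panic("DedicatedClient should not be used after recycled")
	}
	fmt.Println("executed:", cmd)
}

func (c *recycledClient) recycle() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.recycled = true
}

func main() {
	c := &recycledClient{}
	c.Do("SUBSCRIBE channel") // fine while the client is still owned

	// Simulate the race outcome: an error path recycles the client...
	c.recycle()

	// ...while another code path still holds a reference and calls Do.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panic:", r)
		}
	}()
	c.Do("SUBSCRIBE another.channel")
}
```

In the real library the two steps race on different goroutines; the sketch orders them deterministically to show why the use-after-recycle check fires.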

which followed problems with the Redis connection (we found these entries in the logs right before the panic):

"error":"context deadline exceeded" "msg":"pub/sub error"
"error":"context deadline exceeded" "msg":"error adding subscription"

Expected behavior
There should be no panic or service crash.

Versions (please complete the following information):

FZambia commented 1 month ago

Hello @errcheckenjoyer

I recently noticed this myself and opened an issue in rueidis: https://github.com/redis/rueidis/issues/586

I need to find some time to contribute a fix.

errcheckenjoyer commented 1 month ago

Looking forward to it. Thanks for the quick reply!

FZambia commented 1 month ago

@errcheckenjoyer hello, the Centrifuge master branch now pins a rueidis version without this panic.
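
Until a tagged release is available, trying the fix from the master branch can presumably be done with a standard module update (this is the generic Go modules workflow, not a command from the thread):

```shell
# Update the dependency to the tip of the master branch;
# Go resolves this to a pseudo-version in go.mod.
go get github.com/centrifugal/centrifuge@master
go mod tidy
```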

errcheckenjoyer commented 1 month ago

Thanks @FZambia, will try this update during the week.

Can you also help me understand what happens to messages published to Redis during such a disconnect/reconnect cycle? It seems to me that they won't be delivered without additional action from the client (a reconnect with recovery or a history API call), even if they are stored in the history stream. And it is impossible to notice this on the client side.

FZambia commented 1 month ago

Yes, messages will be lost, but Centrifuge has a message recovery feature which allows dealing with this if necessary – read more here:

When recovery is enabled, you have the wasRecovering and recovered flags to notice this on the client side.
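
For intuition, recovery is possible because every publication in a history stream carries a monotonically increasing offset within a stream epoch, so a reconnecting client's last seen position can be compared against the retained window. The sketch below is illustrative only (the `streamPosition` type and `canRecover` helper are simplified stand-ins, not the actual Centrifuge implementation):

```go
package main

import "fmt"

// streamPosition mirrors the idea of a stream position:
// a monotonically increasing offset plus an epoch identifying the stream.
type streamPosition struct {
	Offset uint64
	Epoch  string
}

// canRecover reports whether the publications missed between the client's
// last seen position and the current stream top can still be replayed
// from the retained history window (oldestRetained is the smallest offset
// still present in the stream).
func canRecover(clientPos, streamTop streamPosition, oldestRetained uint64) bool {
	if clientPos.Epoch != streamTop.Epoch {
		return false // stream was recreated; a full state resync is needed
	}
	// Recoverable only if the first missed offset has not been trimmed yet.
	return clientPos.Offset+1 >= oldestRetained
}

func main() {
	top := streamPosition{Offset: 120, Epoch: "e1"}
	// Client saw offset 110; missed 111..120 are still retained.
	fmt.Println(canRecover(streamPosition{Offset: 110, Epoch: "e1"}, top, 100))
	// Client saw offset 50; offsets 51..99 were already trimmed.
	fmt.Println(canRecover(streamPosition{Offset: 50, Epoch: "e1"}, top, 100))
}
```

When the gap cannot be recovered, the client learns this (via the recovered flag being false) and must resynchronize state by other means, e.g. re-fetching from the application backend.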

v0.33.0 uses a rueidis version without this panic, so I think it's good to close.