ergochat / ergo

A modern IRC server (daemon/ircd) written in Go.
https://ergo.chat/
MIT License

testnet is deadlocked #2149

Closed by slingamn 2 months ago

slingamn commented 2 months ago

It's running v2.13.0 and has been deadlocked since roughly yesterday:

https://gist.github.com/slingamn/f1b08b2d2150db41263da0c0952cd4d5

slingamn commented 2 months ago

Wedged channel mutex (but who is holding it?):

goroutine 5701510 [sync.RWMutex.Lock, 544 minutes]:
sync.runtime_SemacquireRWMutex(0x6af923?, 0x10?, 0x0?)
    /usr/local/go/src/runtime/sema.go:87 +0x25
sync.(*RWMutex).Lock(0x1?)
    /usr/local/go/src/sync/rwmutex.go:152 +0x6a
github.com/ergochat/ergo/irc.(*Channel).Join.func1.1(0xc000292200, 0xc00039c700, 0xc0005dd240, 0x0)
    /home/ergo/src/ergo/irc/channel.go:800 +0x47
github.com/ergochat/ergo/irc.(*Channel).Join.func1(0xc000292200, 0x14?, 0x932d70?)
    /home/ergo/src/ergo/irc/channel.go:814 +0xad
github.com/ergochat/ergo/irc.(*Channel).Join(0xc000292200, 0xc00039c700, {0x0, 0x0}, 0x0, 0xc0000aec80)
    /home/ergo/src/ergo/irc/channel.go:819 +0x746
github.com/ergochat/ergo/irc.(*ChannelManager).Join(0xc0005ec825?, 0xc00039c700, {0xc0005ec825, 0x5}, {0x0, 0x0}, 0x0, 0xc0005dd8f0?)
    /home/ergo/src/ergo/irc/channelmanager.go:148 +0x196
github.com/ergochat/ergo/irc.joinHandler(0xc0003c5680, 0x0?, {{0x0, 0x0}, {0xc0005ec820, 0x4}, {0xc0007a5240, 0x1, 0x1}, 0x0, ...}, ...)
    /home/ergo/src/ergo/irc/handlers.go:1294 +0x2d1
github.com/ergochat/ergo/irc.(*Command).Run.func1(0xc0000aec80, 0xc00039c700, 0xd0eac0?, 0xc0003c5680, {{0x0, 0x0}, {0xc0005ec820, 0x4}, {0xc0007a5240, 0x1, ...}, ...}, ...)
    /home/ergo/src/ergo/irc/commands.go:47 +0x267
github.com/ergochat/ergo/irc.(*Command).Run(0xc0005ec820?, 0xc000434be0?, 0xc00039c700, 0xc000416dc0, {{0x0, 0x0}, {0xc0005ec820, 0x4}, {0xc0007a5240, 0x1, ...}, ...})
    /home/ergo/src/ergo/irc/commands.go:48 +0x158
github.com/ergochat/ergo/irc.(*Client).run(0xc00039c700, 0xc000416dc0)
    /home/ergo/src/ergo/irc/client.go:715 +0x6e8
github.com/ergochat/ergo/irc.(*Server).RunClient(0xc0003c5680, {0xa06da8, 0xc000566ff0})
    /home/ergo/src/ergo/irc/client.go:389 +0xc1a
created by github.com/ergochat/ergo/irc.(*WSListener).handle in goroutine 5701508
    /home/ergo/src/ergo/irc/listeners.go:190 +0x3e5

Destroy semaphore is wedged here:

goroutine 5705290 [chan send, 418 minutes]:
github.com/ergochat/ergo/irc/utils.Semaphore.Acquire(...)
    /home/ergo/src/ergo/irc/utils/semaphores.go:22
github.com/ergochat/ergo/irc.(*Client).destroy(0xc00048b180, 0xc00049c840)
    /home/ergo/src/ergo/irc/client.go:1299 +0xc38
github.com/ergochat/ergo/irc.(*Client).run.func1()
    /home/ergo/src/ergo/irc/client.go:623 +0x185
github.com/ergochat/ergo/irc.(*Client).run(0xc00048b180, 0xc00049c840)
    /home/ergo/src/ergo/irc/client.go:724 +0x9e2
github.com/ergochat/ergo/irc.(*Server).RunClient(0xc0003c5680, {0xa06de8, 0xc00050a2a0})
    /home/ergo/src/ergo/irc/client.go:389 +0xc1a
created by github.com/ergochat/ergo/irc.(*NetListener).serve in goroutine 13
    /home/ergo/src/ergo/irc/listeners.go:99 +0x2e5
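
The `[chan send]` state inside `utils.Semaphore.Acquire` suggests the semaphore is backed by a buffered channel, where acquiring is a send and releasing is a receive. The following is a minimal sketch of that pattern, not the code in `irc/utils/semaphores.go`; `NewSemaphore` and `TryAcquire` are hypothetical names added for illustration. It shows why a holder that never releases leaves every later `Acquire` parked in `chan send`:

```go
package main

import "fmt"

// Semaphore is a counting semaphore backed by a buffered channel.
// Hypothetical sketch; not copied from ergo's irc/utils package.
type Semaphore chan struct{}

// NewSemaphore returns a semaphore with the given number of slots.
func NewSemaphore(capacity int) Semaphore {
	return make(Semaphore, capacity)
}

// Acquire blocks (as a channel send) until a slot is free.
func (s Semaphore) Acquire() { s <- struct{}{} }

// TryAcquire takes a slot without blocking and reports whether it succeeded.
func (s Semaphore) TryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot (a channel receive).
func (s Semaphore) Release() { <-s }

func main() {
	sem := NewSemaphore(1)
	sem.Acquire()

	// While the slot is held, a plain Acquire() here would block forever
	// in "chan send"; TryAcquire lets us observe that without hanging.
	fmt.Println(sem.TryAcquire()) // false

	sem.Release()
	fmt.Println(sem.TryAcquire()) // true
}
```

If the goroutine holding the slot is itself stuck behind the wedged channel mutex, no `Release` ever happens, which would be consistent with the 418-minute wait above.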

Same channel mutex as in the first trace? (The `*Channel` receiver is 0xc000292200 in both.)

goroutine 5701600 [sync.RWMutex.RLock, 542 minutes]:
sync.runtime_SemacquireRWMutexR(0x7f81a5?, 0x60?, 0xc0004596d0?)
    /usr/local/go/src/runtime/sema.go:82 +0x25
sync.(*RWMutex).RLock(...)
    /usr/local/go/src/sync/rwmutex.go:71
github.com/ergochat/ergo/irc.(*Channel).auditoriumFriends(0xc000292200, 0xc00039c700)
    /home/ergo/src/ergo/irc/channel.go:1603 +0x6e
github.com/ergochat/ergo/irc.(*Client).destroy(0xc00039c700, 0xc000416dc0)
    /home/ergo/src/ergo/irc/client.go:1316 +0xe6a
github.com/ergochat/ergo/irc.(*Session).handleIdleTimeout(0xc000416dc0)
    /home/ergo/src/ergo/irc/client.go:828 +0x25d
created by time.goFunc
    /usr/local/go/src/time/sleep.go:176 +0x2d

slingamn commented 2 months ago

I'm going to embargo the actual panic trace to make it slightly harder to reverse-engineer the DoS from the patch.

slingamn commented 2 months ago

This was introduced by #2058 (eeec481b8d8c4be36994750483e3c56cd075d04a) so it affects v2.12.0 and v2.13.0 (v2.13.1 is the fix). It's similar to #2063.
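
For readers wondering how a panic can end in a wedged mutex: the thread does not disclose the actual mechanism, so the following is only a generic sketch of one common way it happens in Go, not a description of the ergo bug. If a panic fires between `Lock()` and a non-deferred `Unlock()` and is then recovered further up the stack, the process keeps running but the mutex stays locked. All names here (`channel`, `joinUnsafe`, `joinSafe`, `recovered`, `wedgedAfterPanic`) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// channel is a hypothetical stand-in for a lock-protected structure.
type channel struct {
	mu      sync.RWMutex
	members map[string]bool
}

// joinUnsafe unlocks manually, so a panic in between leaves mu locked.
func (c *channel) joinUnsafe(nick string, fail bool) {
	c.mu.Lock()
	if fail {
		panic("simulated bug while holding the lock")
	}
	c.members[nick] = true
	c.mu.Unlock()
}

// joinSafe defers the unlock, so it still runs while a panic unwinds.
func (c *channel) joinSafe(nick string, fail bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if fail {
		panic("simulated bug while holding the lock")
	}
	c.members[nick] = true
}

// recovered runs f and swallows any panic, as a per-connection
// recover handler might.
func recovered(f func()) {
	defer func() { _ = recover() }()
	f()
}

// wedgedAfterPanic reports whether the mutex is still held after a
// recovered panic inside the chosen join variant.
func wedgedAfterPanic(safe bool) bool {
	c := &channel{members: map[string]bool{}}
	recovered(func() {
		if safe {
			c.joinSafe("alice", true)
		} else {
			c.joinUnsafe("alice", true)
		}
	})
	if c.mu.TryLock() { // TryLock requires Go 1.18+
		c.mu.Unlock()
		return false
	}
	return true
}

func main() {
	fmt.Println(wedgedAfterPanic(false)) // true: lock leaked, later Lock/RLock calls hang
	fmt.Println(wedgedAfterPanic(true))  // false: deferred Unlock ran during unwinding
}
```

In the leaked case nothing crashes, so the only visible symptom is goroutines piling up in `sync.RWMutex.Lock` and `sync.RWMutex.RLock` with no running goroutine that appears to own the lock.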