ergochat / ergo

A modern IRC server (daemon/ircd) written in Go.
https://ergo.chat/
MIT License
2.21k stars 176 forks source link

Always on status causing panic #2113

Closed Sheikah45 closed 5 months ago

Sheikah45 commented 6 months ago

We are using ergochat as the chat server for our game client. We have recently been experimenting with using the always-on functionality to enable offline messages for our player base.

After enabling always on we have noticed that the chat server occasionally dies from an invalid memory address when trying to update the status of one of the users. This was occurring when there were approximately 1k users both offline and online.

I tried to search the documentation to see if there were any caveats related to the always-on status but was not able to find anything. If you have any ideas or suggestions on how this possibly could be resolved that would be greatly appreciated.

We are using ergochat 2.12.0 running in docker.

ergochat_1  | panic: runtime error: invalid memory address or nil pointer dereference
ergochat_1  | [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7b3cb0]
ergochat_1  | goroutine 151577592 [running]:
ergochat_1  | github.com/ergochat/ergo/irc.(*Channel).alwaysOnStatus(0xc0218a9500?, 0x7fc113dd8640?)
ergochat_1  |   /go/src/github.com/ergochat/ergo/irc/channel.go:553 +0xd0
ergochat_1  | github.com/ergochat/ergo/irc.(*Client).performWrite(0xc0218a9500, 0x0)
ergochat_1  |   /go/src/github.com/ergochat/ergo/irc/client.go:1806 +0x1af
ergochat_1  | github.com/ergochat/ergo/irc.(*Client).writeLoop(0xc0218a9500)
ergochat_1  |   /go/src/github.com/ergochat/ergo/irc/client.go:1777 +0x1e
ergochat_1  | created by github.com/ergochat/ergo/irc.(*Client).wakeWriter in goroutine 151614265
ergochat_1  |   /go/src/github.com/ergochat/ergo/irc/client.go:1771 +0x99
slingamn commented 6 months ago

Sorry about the delay. This seems like a critical issue. I'll work on it ASAP.

slingamn commented 6 months ago

The proximate cause of this appears to be (*Client).channels being out of sync with (*Channel).members; specifically, the client thinks it's a member of a channel, but the channel doesn't agree. I'm not sure about the root cause of the inconsistency, but it probably makes sense to add a simple nil guard here.

Sheikah45 commented 6 months ago

Thanks for getting back on this so quickly.

For a little more context we primarily saw this issue after a server restart where many clients were trying to connect all at once. That would probably help explain the states being out of sync.

We haven't seen this again yet since we made the irc clients we use less aggressive on the reconnect so it does seem fairly rare and only under load.

slingamn commented 6 months ago

I'm testing the fix for this now on my private server. Assuming everything goes well, I'll probably release the fix in 2.13.0-rc1 (i.e. a beta version of 2.13) in the next few days.

slingamn commented 6 months ago

@Sheikah45 the fix was released in v2.13.0-rc1:

https://github.com/ergochat/ergo/releases/tag/v2.13.0-rc1
https://github.com/ergochat/ergo/pkgs/container/ergo/164364176?tag=v2.13.0-rc1

Would you be able to try it out?

Sheikah45 commented 6 months ago

Yeah it is on my list to deploy and I can get back to you on what I observe over the week

Edit: It is now deployed and I will let you know of any issues.

Sheikah45 commented 6 months ago

We have been running it the last few days and haven't experienced any issues with upwards of 12000 always-on clients