bwmarrin / discordgo

(Golang) Go bindings for Discord
BSD 3-Clause "New" or "Revised" License
5.08k stars 808 forks source link

Deadlock Scenario On Reconnect #513

Open ErikMcClure opened 6 years ago

ErikMcClure commented 6 years ago

The Open() function calls OnEvent() twice after calling session.Lock(). Normally, this second call returns a Ready packet, which causes no problems. However, after a read error that triggers a reconnection, the discord API will NOT necessarily return a READY/RESUMED packet, because the API doesn't always know that a reconnection even happened, and will instead simply send whatever packet happened to be next. An example of the error messages gotten when this happens:

2018/02/28 06:03:52 [DG1] wsapi.go:226:listen() error reading from gateway wss://gateway.discord.gg/?v=6&encoding=json websocket, websocket: close 1006 (abnormal closure): unexpected EOF
2018/02/28 06:03:54 [DG1] wsapi.go:178:Open() Expected READY/RESUMED, instead got: &discordgo.Event{Operation:0, Sequence:362355, Type:"PRESENCE_UPDATE", RawData:json.RawMessage{ [cut]

The problem is that OnEvent assumes that the session write lock is NOT being held, because this is how it is normally called at wsapi:247. Consequently, two different packets will always deadlock if they happen to show up during the reconnect attempt:

  1. If a op 7 disconnect is, for whatever reason, sent immediately after a reconnect attempt, this will try to call Close() from inside of Open() and instantly deadlocks.
  2. If an op 11 heartbeat ack is sent, this calls s.Lock() and immediately deadlocks.

Even worse, should this simply be a normal event, this event is actually sent to event processing from inside of Open(), which could do anything or call anything in response to it, including Close(), which would also deadlock. Normally this isn't a problem, because the only event handler one expects is the OnReady event handler, but during a reconnect the Ready packet is often not sent and consequently any event handler could potentially be called.

There are two possible resolutions to this problem: Augment OnEvent() so that it can completely reject an unexpected event if a particular packet is being expected and ensure any code in the expected events doesn't do anything weird, or release the session lock before calling OnEvent() the second time, and make a custom handler for op 10 by breaking out the event preprocessing code from OnEvent() (or combine this with the above option and allow it to reject unexpected packets). Rejecting packets, however, could result in dropping potentially important events, such as user adds.

It's important to note that if the session lock is still being held while processing the Ready() packet, this still allows a user of the library to instantly deadlock it by doing something unusual inside of the OnReady handler, because the handler itself doesn't give any indication of the special circumstances in which it might be called.

colecrouter commented 3 years ago

Any update on this bad boy? Looking at the code/stack trace, it looks like code involved has remained relatively untouched since this issue was posted. On bad days, I get this panic about every third time I try to disconnect; pretty rough.

ErikMcClure commented 3 years ago

I am no longer maintaining my discord bot that uses discordgo, so I won't be able to provide any additional details, or reproduce this issue.

jame-developer commented 3 years ago

I haven't tested, actually I didn't used discordgo, but saw on the golang discord gophers server and checked it out ran tests, linters etc. and stumbled over https://github.com/bwmarrin/discordgo/blob/0fad116c6c2a01a8632c35db1484955a6dcc42ef/wsapi.go#L825 Here is a defer call in a for loop!

CarsonHoffman commented 3 years ago

@jame-developer is there an actually an issue here? If we hit line 825, we will never perform another iteration of the loop, by the return on line 836.

jame-developer commented 3 years ago

@CarsonHoffman Yes you are right, in the current state of the code there is no issue, I was a bit to focused on the defer. My linters bringing up those lines and I had a bad experience with defer and for loops, so I was a bit to alerted.