coder / websocket

Minimal and idiomatic WebSocket library for Go
ISC License
3.8k stars 284 forks source link

failed to close writer: context canceled #474

Open lukaspj opened 2 weeks ago

lukaspj commented 2 weeks ago

I'm using this library to create a zero-trust tunnel into a Kubernetes cluster, when I encountered this error message:

failed to write msg: failed to close writer: failed to acquire lock: context canceled

So what happened was that when client closed the connection, I would send a "connection closed" message over the WebSocket connection, but I (mistakenly) used the request context for the write command. This context would get cancelled somewhere around this Write call, sometimes after message was written but before it released the lock.

As a result, the lock would never get released and the websocket was no longer usable.

The solution was to create a new context for the write:

    writeCtx, _ := context.WithTimeout(context.Background(), 2*time.Second)

    err = c.Websocket.Write(writeCtx, websocket.MessageText, marshal)

And I don't think I should have used the request context to begin with, but I do think it's not optimal behaviour that the websocket can be left in a broken state by mistimed context cancel.

In any case, just thought I'd share it. Thanks for maintaining this awesome library!

mafredri commented 2 weeks ago

Hey @lukaspj, thanks for the user experience report.

You are definitely not the first! This is a very common mistake to make. There is one valid use-case for r.Context() after the connection being hijacked1, but in general it's almost never what you want.

I see this issue is further exacerbated by the readme and examples using it incorrectly (https://github.com/coder/websocket?tab=readme-ov-file#server).

We should at least update the readmes and examples, but I'll try to think about some alternative solutions as well.

Perhaps the most obvious way would be to put a context on websocket.Conn, but defining it's behavior would be a bit tricky.

Anyway, if you have anything else to add, how you wish this had worked instead, what lead you down the wrong path, etc, please feel free to share.

1 A valid use-case is when you're running the websocket logic in a separate goroutine, and want it to stop when the HTTP handler exits. Usually the logic runs in the HTTP handler, though, where using r.Context() is useless. (https://pkg.go.dev/net/http#Hijacker)

lukaspj commented 2 weeks ago

I think you pin-pointed it, if the readme and examples explained it, it would probably help.

Additionally, it would be nice if the function documentation would state this danger of closing the context. Something like "if the context is cancelled, it can cause a deadlock" or just "the context should be the context of the websocket not the context of the message being written". Idk if there's a nice way to write it tbh, but would be nice if it showed up on the documentation somewhere on "how to use this library".

spikecurtis commented 1 week ago

@mafredri I don't think a docs update alone will resolve this issue.

@lukaspj seems to be saying that if the context you pass to Write() gets canceled, it can leave the Websocket in a broken state (a lock that never gets released). While it's true that you probably don't want to use the r.Context() for calls to Write() in most cases, I would still consider it a bug if an ill-timed context timeout leaves the websocket in a broken state.

mafredri commented 1 week ago

@spikecurtis I was thinking about the original report, but that's a good callout. We should split the second issue into a new ticket.

There may be multiple issues in how connection closure is propagated and context cancellation is handled. There seems to be multiple code-paths depending on scenario.

I also noticed while looking at https://github.com/coder/websocket/pull/476 that a read/write context that is cancelled actually ends up closing the connection abruptly, whereas I would expect we just give up on the write.