deepgram / deepgram-go-sdk

Go SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License
31 stars 27 forks source link

Concurrent reads and writes leading to panics #230

Closed matthew-ceravolo closed 3 months ago

matthew-ceravolo commented 3 months ago

What is the current behavior?

from version 1.3.1:

panic: concurrent write to websocket connection

goroutine 2160 [running]:
github.com/dvonthenen/websocket.(*messageWriter).flushFrame(0xc00065c1e0, 0x1, {0x0?, 0x0?, 0x0?})
    /root/go/pkg/mod/github.com/dvonthenen/websocket@v1.5.1-dyv.2/conn.go:617 +0x4b8
github.com/dvonthenen/websocket.(*messageWriter).Close(0x0?)
    /root/go/pkg/mod/github.com/dvonthenen/websocket@v1.5.1-dyv.2/conn.go:731 +0x35
github.com/dvonthenen/websocket.(*Conn).beginMessage(0xc000262420, 0xc0007b87b0, 0x1)
    /root/go/pkg/mod/github.com/dvonthenen/websocket@v1.5.1-dyv.2/conn.go:480 +0x3a
github.com/dvonthenen/websocket.(*Conn).NextWriter(0xc000262420, 0x1)
    /root/go/pkg/mod/github.com/dvonthenen/websocket@v1.5.1-dyv.2/conn.go:520 +0x3f
github.com/dvonthenen/websocket.(*Conn).WriteMessage(0xf8e300?, 0xc0007b8540?, {0xc0001491a0, 0x14, 0x18})
    /root/go/pkg/mod/github.com/dvonthenen/websocket@v1.5.1-dyv.2/conn.go:773 +0x138
github.com/deepgram/deepgram-go-sdk/pkg/client/live.(*Client).WriteJSON(0xf8e300?, {0xf8e300, 0xc0007b8540})
    /root/go/pkg/mod/github.com/deepgram/deepgram-go-sdk@v1.3.1/pkg/client/live/client.go:437 +0x22a
github.com/deepgram/deepgram-go-sdk/pkg/client/live.(*Client).ping(0xc0005f0680)
    /root/go/pkg/mod/github.com/deepgram/deepgram-go-sdk@v1.3.1/pkg/client/live/client.go:573 +0x26f
created by github.com/deepgram/deepgram-go-sdk/pkg/client/live.(*Client).ConnectWithRetry in goroutine 2198
    /root/go/pkg/mod/github.com/deepgram/deepgram-go-sdk@v1.3.1/pkg/client/live/client.go:239 +0xd3b

as well as, from version 1.1.3:

panic: repeated read on failed websocket connection

goroutine 333114 [running]:
github.com/dvonthenen/websocket.(*Conn).NextReader(0xc01a252f20)
    /root/go/pkg/mod/github.com/dvonthenen/websocket@v1.5.1-dyv.2/conn.go:1030 +0x26e
github.com/dvonthenen/websocket.(*Conn).ReadMessage(0xc019297680?)
    /root/go/pkg/mod/github.com/dvonthenen/websocket@v1.5.1-dyv.2/conn.go:1093 +0x13
github.com/deepgram/deepgram-go-sdk/pkg/client/live.(*Client).listen(0xc019297680)
    /root/go/pkg/mod/github.com/deepgram/deepgram-go-sdk@v1.1.3/pkg/client/live/client.go:227 +0x36e
created by github.com/deepgram/deepgram-go-sdk/pkg/client/live.(*Client).ConnectWithRetry in goroutine 330336
    /root/go/pkg/mod/github.com/deepgram/deepgram-go-sdk@v1.1.3/pkg/client/live/client.go:196 +0xc05

Steps to reproduce

There are race conditions that arise from normal usage. We just have a stream of audio from Twilio that we pump through via:

_, err := s.speechRecognizer.Write(audio)

Expected behavior

For it to not panic - reads and writes on websockets need to be protected with mutexes. neither reads nor writes can occur simultaneously

Please tell us about your environment

Other information

dvonthenen commented 3 months ago

Hi @matthew-ceravolo

For the write, this was fixed yesterday and is available in this release: https://github.com/deepgram/deepgram-go-sdk/releases/tag/v1.3.2

dvonthenen commented 3 months ago

For the read, you can't call Read() on your own. This is handled by the SDK itself. This SDK provides a higher level of abstraction and just need to implement the callback to receive all struct response from the Deepgram endpoint.

Take a look at this example here: https://github.com/deepgram/deepgram-go-sdk/blob/main/examples/streaming/microphone/main.go

matthew-ceravolo commented 3 months ago

yup, we only use the callbacks. I can DM you our code for this flow if it's helpful. I'll upgrade to 1.3.2 for the write fixes though in the meantime

matthew-ceravolo commented 3 months ago

and thank you for the super quick reply and help!

dvonthenen commented 3 months ago

I would definitely update to this latest release and see how it works for you based on the expected use of the SDK.

matthew-ceravolo commented 3 months ago

will update now and see how it goes. and I double-checked the code to verify that 1) there arent any exposed interfaces to call Read from directly via the Deepgram client, and 2) that we dont use the Deepgram client's Connect() to get any websockets directly that we Read from.

however, that Read panic log was from 1.1.3, so I'm sure a lot has changed since then. I'll update now and follow-up if we see it again in the future.

thanks again!

dvonthenen commented 3 months ago

yup, we only use the callbacks. I can DM you our code for this flow if it's helpful. I'll upgrade to 1.3.2 for the write fixes though in the meantime

If you want me to take a look, I definitely can and offer some guidance. If there is an issue with the SDK, it would also help to troubleshoot. You can reach out in Discord to share code and/or discuss.

matthew-ceravolo commented 3 months ago

@dvonthenen - we're still seeing the same read issue on 1.3.2 (caused one panic in total in the last 2 hours, across 5 pods):

panic: repeated read on failed websocket connection

goroutine 9417 [running]:
github.com/dvonthenen/websocket.(*Conn).NextReader(0xc0007762c0)
    /root/go/pkg/mod/github.com/dvonthenen/websocket@v1.5.1-dyv.2/conn.go:1030 +0x26e
github.com/dvonthenen/websocket.(*Conn).ReadMessage(0xc00079f580?)
    /root/go/pkg/mod/github.com/dvonthenen/websocket@v1.5.1-dyv.2/conn.go:1093 +0x13
github.com/deepgram/deepgram-go-sdk/pkg/client/live.(*Client).listen(0xc00079f580)
    /root/go/pkg/mod/github.com/deepgram/deepgram-go-sdk@v1.3.2/pkg/client/live/client.go:284 +0x245
created by github.com/deepgram/deepgram-go-sdk/pkg/client/live.(*Client).ConnectWithRetry in goroutine 9239
    /root/go/pkg/mod/github.com/deepgram/deepgram-go-sdk@v1.3.2/pkg/client/live/client.go:237 +0xce7

-- for posterity: we dug into this one a bit more and agree this is separate from the concurrency issues

dvonthenen commented 3 months ago

Connecting with you in Discord

dvonthenen commented 3 months ago

Just putting some notes here for this issue.

Waiting for feedback on this, but internal testing looks like this is resolved. We can always reopen if that isn't the case.

There were a couple of issues resolved in PRs worth mentioning:

After resolving these issues, I think the behavior is what was intended in the design of the functionality. There were a number of examples introduced to help test these use cases. Will transform those into unit tests a little later down the road.