deepgram / deepgram-go-sdk

Go SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

Handling streaming endpoint response in callback function #193

Closed: bryanwux closed this issue 7 months ago

bryanwux commented 8 months ago

When it comes to speed, I am curious about the performance difference between using a callback function to receive streaming data and reading directly from the WebSocket with a low-level API call such as wsConn.ReadMessage().

dvonthenen commented 8 months ago

Hi @bryanwux

Yeah, that's a good question. I haven't benchmarked it, but the underlying implementation just calls wsConn.ReadMessage() and then calls the function you implement. The SDK does provide some things, such as logging and other conveniences, that add overhead, but I think it's probably very small (again, only benchmarking would prove that). The compiler should also optimize away a lot of these concerns; that's one of the advantages of applications that compile down to machine code.
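The read-then-dispatch pattern described above can be sketched in a few lines of Go. This is a minimal sketch under stated assumptions, not the SDK's actual code: `messageReader` only mirrors the method shape of gorilla/websocket's `(*Conn).ReadMessage`, and `messageCallback`, `readLoop`, `fakeReader`, and `collector` are hypothetical names. The SDK's real `LiveMessageCallback` signature may differ. The fake reader replays canned frames so the example runs without a network.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// messageReader has the same method shape as gorilla/websocket's
// (*Conn).ReadMessage. It is a stand-in so this sketch runs offline.
type messageReader interface {
	ReadMessage() (messageType int, data []byte, err error)
}

// messageCallback is a hypothetical, simplified callback interface.
// The SDK's real LiveMessageCallback may use a different signature.
type messageCallback interface {
	Message(data []byte) error
}

// readLoop reads one frame at a time and passes each payload to cb,
// stopping at the first read or callback error.
func readLoop(r messageReader, cb messageCallback) error {
	for {
		_, data, err := r.ReadMessage()
		if errors.Is(err, io.EOF) {
			return nil // end of stream in this sketch
		}
		if err != nil {
			return fmt.Errorf("read: %w", err)
		}
		if err := cb.Message(data); err != nil {
			return fmt.Errorf("callback: %w", err)
		}
	}
}

// fakeReader replays canned frames, then reports io.EOF.
type fakeReader struct{ frames [][]byte }

func (f *fakeReader) ReadMessage() (int, []byte, error) {
	if len(f.frames) == 0 {
		return 0, nil, io.EOF
	}
	next := f.frames[0]
	f.frames = f.frames[1:]
	return 1, next, nil // 1 = text frame, matching websocket.TextMessage
}

// collector is a trivial callback that records each payload it receives.
type collector struct{ got []string }

func (c *collector) Message(data []byte) error {
	c.got = append(c.got, string(data))
	return nil
}

func main() {
	r := &fakeReader{frames: [][]byte{[]byte("result 1"), []byte("result 2")}}
	c := &collector{}
	if err := readLoop(r, c); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.got)
}
```

In this sketch the callback path adds only one interface method call per message on top of `ReadMessage`; any logging or parsing the real SDK performs would come on top of that.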

Ultimately, any SDK is going to add some overhead because it is designed to be general purpose, usable by everyone, and hopefully follows the 80/20 rule of covering most use cases. (Again, without having benchmarked it.) If speed is your highest concern and saving milliseconds (??) of execution time is important (for example, in some router firmware), then you would probably be best served by not using the SDK and working with the WebSocket directly. You can put the WebSocket closer to your business logic and even do extreme things (typically undesirable ones, like writing spaghetti code) that would, in fact, dramatically increase performance at the cost of maintainability. That's a call each user would have to make for themselves. The downside is that you become responsible for maintaining the lower-level protocol handling, Deepgram platform details, etc., much like implementing your own custom SDK.
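Since only a benchmark can settle the question, here is a minimal sketch of one using the standard library's `testing.Benchmark`, which is documented as usable outside `go test`. It compares decoding a sample payload directly against decoding the same payload behind a callback interface. The `handler` interface, `transcriptHandler`, and the JSON payload are hypothetical. The sketch measures only dispatch overhead, not the SDK's logging or other conveniences, so a meaningful comparison would still need to run against the actual SDK and a live connection.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// handler is a hypothetical callback interface standing in for the SDK's.
type handler interface {
	Message(data []byte) error
}

// transcriptHandler decodes each payload; a stand-in for business logic.
type transcriptHandler struct{ last map[string]any }

func (h *transcriptHandler) Message(data []byte) error {
	return json.Unmarshal(data, &h.last)
}

// payload is an illustrative message, not a real Deepgram response.
var payload = []byte(`{"channel":{"alternatives":[{"transcript":"hello world"}]}}`)

// direct decodes the payload inline, as a hand-rolled read loop might.
func direct() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		var out map[string]any
		for i := 0; i < b.N; i++ {
			if err := json.Unmarshal(payload, &out); err != nil {
				b.Fatal(err)
			}
		}
	})
}

// viaCallback does the same work through an interface method call.
func viaCallback() testing.BenchmarkResult {
	var h handler = &transcriptHandler{}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if err := h.Message(payload); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	d, c := direct(), viaCallback()
	fmt.Printf("direct:   %d ns/op\n", d.NsPerOp())
	fmt.Printf("callback: %d ns/op\n", c.NsPerOp())
}
```

The numbers depend on the machine, so compare the two lines relative to each other rather than against any fixed figure.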

I hope that helps.

dvonthenen commented 8 months ago

If you want to discuss this on Discord, stop by and let's chat.

bryanwux commented 8 months ago

Thank you for your quick response, @dvonthenen. I would definitely be happy to discuss this with you on Discord.

I also want to add some background on what I'm doing. Initially, I was using the WebSocket directly in the backend, as suggested by this link. The logic was basically: a clientStream goroutine calls DgClient.WriteBinary to write audio data to the WebSocket, and a serverStream goroutine calls wsConn.ReadMessage() and does some processing on each response. The final result is sent to the frontend. After several tries, I found some connection issues that would occasionally close the WebSocket connection. I couldn't figure out why, so I switched to implementing the LiveMessageCallback interface and handling all the streaming responses in the Message function instead of receiving from the WebSocket directly.
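The two-goroutine design described above might look roughly like the following sketch. It assumes a hypothetical `streamConn` interface: `WriteBinary` stands in for the `DgClient.WriteBinary` call mentioned in the thread, `ReadMessage` mirrors gorilla/websocket, and `CloseWrite` is an invented "no more audio" signal; the real signatures may differ. The point it tries to illustrate is that each goroutine records the error that stopped it, so an unexpected close surfaces as an error instead of silently ending the stream. The in-memory `pipe` simply echoes each chunk back as a "result".

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync"
)

// streamConn is a hypothetical stand-in for the live connection.
// WriteBinary mimics DgClient.WriteBinary from the thread, ReadMessage mirrors
// gorilla/websocket, and CloseWrite is an invented "no more audio" signal.
type streamConn interface {
	WriteBinary(p []byte) error
	CloseWrite() error
	ReadMessage() (int, []byte, error)
}

// run starts clientStream (audio -> connection) and serverStream
// (connection -> results), waits for both, and returns whatever errors
// stopped them, so an unexpected close is reported instead of lost.
func run(conn streamConn, audio <-chan []byte, results chan<- string) error {
	var wg sync.WaitGroup
	var writeErr, readErr error // each written by one goroutine, read after Wait
	wg.Add(2)

	go func() { // clientStream
		defer wg.Done()
		for chunk := range audio {
			if err := conn.WriteBinary(chunk); err != nil {
				writeErr = fmt.Errorf("clientStream: %w", err)
				break
			}
		}
		if err := conn.CloseWrite(); err != nil && writeErr == nil {
			writeErr = fmt.Errorf("clientStream close: %w", err)
		}
	}()

	go func() { // serverStream
		defer wg.Done()
		defer close(results) // lets the consumer's range loop finish
		for {
			_, msg, err := conn.ReadMessage()
			if errors.Is(err, io.EOF) {
				return // server finished normally
			}
			if err != nil {
				readErr = fmt.Errorf("serverStream: %w", err)
				return
			}
			results <- string(msg) // hand off to whatever feeds the frontend
		}
	}()

	wg.Wait()
	return errors.Join(writeErr, readErr) // nil if both are nil
}

// pipe is an in-memory fake that echoes each audio chunk back as a "result".
type pipe struct{ ch chan []byte }

func (p *pipe) WriteBinary(b []byte) error { p.ch <- b; return nil }
func (p *pipe) CloseWrite() error          { close(p.ch); return nil }
func (p *pipe) ReadMessage() (int, []byte, error) {
	b, ok := <-p.ch
	if !ok {
		return 0, nil, io.EOF
	}
	return 2, b, nil // 2 = binary frame, matching websocket.BinaryMessage
}

func main() {
	conn := &pipe{ch: make(chan []byte, 8)}
	audio := make(chan []byte)
	results := make(chan string)

	go func() { // stands in for the microphone or upload source
		for _, s := range []string{"chunk-1", "chunk-2"} {
			audio <- []byte(s)
		}
		close(audio)
	}()

	errc := make(chan error, 1)
	go func() { errc <- run(conn, audio, results) }()
	for r := range results {
		fmt.Println("to frontend:", r)
	}
	fmt.Println("stream ended, err =", <-errc)
}
```

The caller must keep draining `results` (or give it a buffer) while `run` is active, otherwise serverStream blocks and `run` never returns. The sketch also omits cancellation; a production version would stop the other goroutine (for example with a `context.Context`) as soon as either one fails.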

This callback solution works well so far, but there are some delays before the streaming results show up in the frontend. I'm not sure whether this is an issue with the streaming API itself or something introduced by the callback, which is why I asked.

dvonthenen commented 7 months ago

One thing that came up in the Discord discussion: even though you are getting callbacks through this interface, it's probably a good idea to set up a channel, queue, or some other means of pulling the event off so that you don't block the callback. If the work that needs to be done can take some time, you don't want to implement it inside these callback functions.
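A minimal sketch of that advice, assuming a simplified, hypothetical callback signature `Message(data []byte) error` (the SDK's real `LiveMessageCallback` may differ): the callback only copies the payload onto a buffered channel and returns, and a separate worker goroutine does the slow work. `queueingHandler` and `startWorker` are illustrative names, not SDK APIs.

```go
package main

import "fmt"

// queueingHandler implements a hypothetical, simplified callback interface,
// Message(data []byte) error. Message only copies the payload onto a
// buffered channel and returns, so the SDK's read loop is not held up.
type queueingHandler struct {
	events chan []byte
}

func newQueueingHandler(buffer int) *queueingHandler {
	return &queueingHandler{events: make(chan []byte, buffer)}
}

// Message is what the SDK's read loop would call for each response.
func (h *queueingHandler) Message(data []byte) error {
	// Copy in case the caller reuses the buffer after Message returns.
	msg := append([]byte(nil), data...)
	h.events <- msg // blocks only when the buffer is full
	return nil
}

// Close signals that no more events will arrive.
func (h *queueingHandler) Close() { close(h.events) }

// startWorker drains events on its own goroutine, running process on each,
// and returns a channel that is closed once the queue is closed and empty.
func startWorker(h *queueingHandler, process func([]byte)) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for ev := range h.events {
			process(ev) // slow work (formatting, pushing to the frontend) goes here
		}
	}()
	return done
}

func main() {
	h := newQueueingHandler(64)
	var got []string
	done := startWorker(h, func(ev []byte) {
		got = append(got, string(ev))
	})
	for _, s := range []string{"partial transcript", "final transcript"} {
		if err := h.Message([]byte(s)); err != nil {
			fmt.Println("error:", err)
		}
	}
	h.Close()
	<-done
	fmt.Println(got)
}
```

The buffered channel blocks the callback only when the worker falls more than `buffer` events behind, which applies backpressure rather than dropping transcripts. A `select` with a `default` case would drop events instead, trading completeness for a callback that never blocks.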