deepgram / deepgram-js-sdk

Official JavaScript SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License
138 stars 51 forks source link

STT is broken in 3.4.0 and up (returns 401 error) #321

Closed orgads closed 1 week ago

orgads commented 1 month ago

What is the current behavior?

On client.listen.live(settings), with a valid key, I get 401 error. This works fine with version 3.3.4.

Other information

This results from a tiny behavioral change between websocket and ws packages, on protocols handling.

The related change is #293.

Before this change, w3cwebsocket was used. This package joins the protocols in Sec-WebSocket-Protocol with ', ' (comma followed by a space). See here:

reqHeaders['Sec-WebSocket-Protocol'] = this.protocols.join(', ');

Now, this has been replaced with ws, which has no space between the values. See here:

opts.headers['Sec-WebSocket-Protocol'] = protocols.join(',');

Apparently the server side is sensitive for this change, and rejects the request.

I confirmed this by modifying ws to use ', ', and the issue was resolved.

lukeocodes commented 1 month ago

This is strange, as our real-time examples work in testing. I wonder if this somehow runtime specific, too?

This was our change in this SDK

https://github.com/deepgram/deepgram-js-sdk/blob/main/src/packages/AbstractLiveClient.ts#L139-L146

orgads commented 1 month ago

If you add this before all other requires in node-live.js example, and use Node 20, it fails:

const { WebSocket } = require('ws');
global.WebSocket = WebSocket;

We have something similar in our application.

orgads commented 1 month ago

Is it possible to fix this on the server? See https://github.com/websockets/ws/pull/2244#issuecomment-2245214238

lukeocodes commented 1 month ago

That's odd, as we use ws for node runtime websockets in the SDK. Can you tell me anything else about your environment?

lukeocodes commented 1 month ago

Is it possible to fix this on the server? See websockets/ws#2244 (comment)

I believe this refers to your server, when you're running the SDK code. I am not sure how. As I said, we run node example code in multiple versions and this has never come up

orgads commented 1 month ago

It's exactly where you pointed. NATIVE_WEBSOCKET_AVAILABLE is true if WebSocket is defined. Since I assign global.WebSocket = WebSocket then in my case it is defined.

I use the default deepgram api server, didn't configure anything special. This reproduces with your example, just add these 2 lines at the file header.

lukeocodes commented 1 month ago

It's exactly where you pointed. NATIVE_WEBSOCKET_AVAILABLE is true if WebSocket is defined. Since I assign global.WebSocket = WebSocket then in my case it is defined.

I use the default deepgram api server, didn't configure anything special. This reproduces with your example, just add these 2 lines at the file header.

The library maintainer (if that is who is replying) is suggesting to you that all API vendors, including Deepgram, which serve millions of requests per hour, make this flexible. Flexible code has a performance overhead. I challenge his suggestion that this is effectively a bug in our API.

orgads commented 1 month ago

While I see your point, he is right that servers should conform to the RFC and not make assumptions about the input. The performance overhead is negligible for this kind of parsing.

lukeocodes commented 1 month ago

While I see your point, he is right that servers should conform to the RFC and not make assumptions about the input. The performance overhead is negligible for this kind of parsing.

I think the suggestion that any overhead is negligible is plain wrong.

v7 of ws had a space. I will be reverting ws to v7 at some point.

orgads commented 1 month ago

I have to say that actually stripping out the space should have performance impact more than handling a header without a space. Did you check the server logic?

lukeocodes commented 1 month ago

I have to say that actually stripping out the space should have performance impact more than handling a header without a space. Did you check the server logic?

Ok

lpinca commented 1 month ago

I have to say that actually stripping out the space should have performance impact more than handling a header without a space. Did you check the server logic?

This is the reason why we removed it in ws@8, and because fewer bytes are sent over the wire without multiple spaces.

lukeocodes commented 1 month ago

We'll pin v7 until we have a moment to discuss a change to our auth service. Thanks for your help

orgads commented 1 month ago

@lukeocodes In my case, it won't help, because I use v8, and I assign the global WebSocket to v8. If nobody else complained, I don't think you have a reason to pin it. I'll stick with 3.3.5 until there's a proper fix.

lukeocodes commented 1 month ago

@orgads we now have a PR in review to resolve this on our service. I will let you know once it has been released

orgads commented 1 month ago

Thank you!

lukeocodes commented 1 week ago

This has now been released on our API

mozziebr commented 1 day ago

@lukeocodes - I'm using production server websocket. Is this fix also resolved for this? https://developers.deepgram.com/reference/listen-live#endpoint I'm having this error message:

{"timestamp": "2024-09-06T16:11:27.426259Z", "level": "error", "msg": "Invalid WebSocket handshake: no subprotocols supported"}

And on postman hitting the websocket works fine: image