deepgram / deepgram-dotnet-sdk

.NET SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

Deepgram API Key is invalid using DeepgramWsClientOptions #342

Open vizakgh opened 4 days ago

vizakgh commented 4 days ago

What is the current behavior?

Exception "Deepgram API Key is invalid"

Steps to reproduce

var liveClient = new ListenWebSocketClient(_dgApiKey, new DeepgramWsClientOptions { ApiKey = _dgApiKey, KeepAlive = true });

Expected behavior

no exceptions

Please tell us about your environment

Windows 11 (latest), Visual Studio 2022 (latest)

Other information

this works correctly: var liveClient = new ListenWebSocketClient(_dgApiKey, new DeepgramWsClientOptions(_dgApiKey, null, true));

dvonthenen commented 4 days ago

Hi @vizakgh

Thanks for the report. Will take a look.

I suspect that it's because you are setting the parameters like this (which is totally valid):

new DeepgramWsClientOptions
{
ApiKey = _dgApiKey,
KeepAlive = true
}

If you were to do this, I suspect it would probably work (https://github.com/deepgram/deepgram-dotnet-sdk/blob/main/examples/speech-to-text/websocket/microphone/Program.cs#L27-L30):

DeepgramWsClientOptions options = new DeepgramWsClientOptions(null, null, true);
var liveClient = new ListenWebSocketClient(_dgApiKey, options);

I will need to add some code to handle the setting via the properties.
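A possible explanation for why the two forms behave differently, sketched without the SDK: in C#, an object initializer assigns its properties only after the constructor has returned, so a constructor that validates `ApiKey` still sees an empty value. `FakeWsOptions` below is a hypothetical stand-in, not the SDK class. It assumes the constructor validates the key in the order shown in the constructor snippet quoted in this thread, and the `envApiKey` parameter is an invented substitute for the environment-variable lookup so the example stays deterministic.

```csharp
using System;

// Hypothetical stand-in for DeepgramWsClientOptions; NOT the SDK class.
// It mirrors only the validation order of the constructor quoted in this thread.
public class FakeWsOptions
{
    public string ApiKey { get; set; }
    public bool KeepAlive { get; set; }

    // envApiKey simulates the environment-variable fallback (an assumption for this sketch).
    public FakeWsOptions(string? apiKey = null, bool? keepAlive = null, string? envApiKey = null)
    {
        ApiKey = apiKey ?? "";
        KeepAlive = keepAlive ?? false;

        // A user-provided key takes precedence; otherwise fall back to the simulated environment value.
        if (string.IsNullOrWhiteSpace(ApiKey))
        {
            ApiKey = envApiKey ?? "";
        }

        // Validation runs here, before any object-initializer assignment happens.
        if (string.IsNullOrEmpty(ApiKey))
        {
            throw new ArgumentException("Deepgram API Key is invalid");
        }
    }
}

public static class InitializerOrderDemo
{
    // Returns true when building the options throws ArgumentException.
    public static bool Throws(Func<FakeWsOptions> factory)
    {
        try
        {
            factory();
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

With this stand-in, `InitializerOrderDemo.Throws(() => new FakeWsOptions { ApiKey = "key", KeepAlive = true })` is true, while `InitializerOrderDemo.Throws(() => new FakeWsOptions("key", true))` is false. The initializer form also succeeds when the fallback value is present, which might be why it can work on a machine that has the API key environment variable set.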

vizakgh commented 4 days ago

Hi.

Thanks. I already use the options constructor, and it works.

P.S. I see a consistent error: DG closes the socket without calling the ErrorResponse handler. It mostly happens in the Azure dev environment (a 2-CPU web app), especially if I send audio chunks every 200 ms (instead of 500 ms). On my local machine (16 CPUs) it works correctly. I'm still investigating the issue. The real problem is that the log shows no reason for the error :(

I send native browser "webm" audio.

Regards, Victor


dvonthenen commented 4 days ago

You can try to enable more debugging (https://github.com/deepgram/deepgram-dotnet-sdk/blob/main/examples/speech-to-text/websocket/microphone/Program.cs#L20):

Deepgram.Library.Initialize(LogLevel.Debug); // LogLevel.Default, LogLevel.Debug, LogLevel.Verbose

I would recommend setting to Verbose for max logging. If you copy and paste the output here, I can help debug. If the logging is sensitive, I would be glad to chat in Discord.

dvonthenen commented 4 days ago

When the server closes the connection while keep alive is enabled, it's usually because the encoding or sample rate values do not match what the audio stream actually is.

vizakgh commented 3 days ago

Hi.

I also tried WAV/PCM using an old JS encoder (I used it with Gladia). The result is the same. Today it doesn't work at all (yesterday it worked sometimes).

2024-10-23 17:57:36.507 [Debug] State: WebSocket State: Open
2024-10-23 17:57:36.507 [Verbose] ProcessSendQueue: Sending message...
2024-10-23 17:57:36.613 [Verbose] ProcessReceiveQueue: Received message: System.Net.WebSockets.WebSocketReceiveResult / System.IO.MemoryStream
2024-10-23 17:57:36.615 [Verbose] ListenWSClient.ProcessDataReceived: ENTER
2024-10-23 17:57:36.615 [Verbose] ProcessDataReceived: raw response: {"type":"Metadata","transaction_key":"deprecated","request_id":"deda472a-6bf0-4ea4-88c5-0d235032c6f4","sha256":"fc4ef741527af4eb3e4d0d7c81b8685111ac40c6fa806491a86c2b5ac137463f","created":"2024-10-23T14:57:23.420Z","duration":0.0,"channels":0}
2024-10-23 17:57:36.616 [Verbose] ProcessDataReceived: Type: Metadata
2024-10-23 17:57:36.635 [Debug] ProcessDataReceived: Invoking MetadataResponse. event: { "channels": 0, "created": "2024-10-23T14:57:23.42Z", "duration": 0, "request_id": "deda472a-6bf0-4ea4-88c5-0d235032c6f4", "sha256": "fc4ef741527af4eb3e4d0d7c81b8685111ac40c6fa806491a86c2b5ac137463f", "transaction_key": "deprecated", "type": "Metadata" }
2024-10-23 17:57:36.649 [Debug] ProcessDataReceived: Succeeded
2024-10-23 17:57:36.650 [Verbose] ListenWSClient.ProcessDataReceived: LEAVE
2024-10-23 17:57:36.651 [Information] ProcessReceiveQueue: Received WebSocket Close. Trigger cancel...
2024-10-23 17:57:36.652 [Verbose] ListenWSClient.Stop: ENTER
2024-10-23 17:57:36.652 [Information] Stop: Using default disconnect cancellation token

initialization: var liveClient = new ListenWebSocketClient(_dgApiKey, new DeepgramWsClientOptions(_dgApiKey, null, true));

var liveSchema = new LiveSchema { Model = "nova-2", Language ="en", Diarize = false, Dictation = false, Punctuate = false, SmartFormat = false };

P.S. I dumped both files (WebM and WAV) on the backend. The files are correct (I can open the WebM file with Windows Media Player).

Regards, Victor


dvonthenen commented 3 days ago

You need to set the encoding and sample_rate values in LiveSchema. Depending on the form of chunking that either the audio source or you might be doing, those parameters are required. Please see my previous message.

The connection is exiting within a second which is a clear indication of this failure.

vizakgh commented 3 days ago

It doesn't help. According to the docs, DG supports only Ogg Opus (not Opus in a WebM container), right?

FFmpeg report: [attached image]

schema: var liveSchema = new LiveSchema { Model = "nova-2", Language = "en", Channels = 1, Encoding = "opus", SampleRate = 48000, Diarize = false, Dictation = false, Punctuate = false, SmartFormat = false };


dvonthenen commented 3 days ago

That is correct, ogg opus. The source audio streaming needs to be one of the supported formats. Depending on the platform, you can change the audio output from the source to one of the supported formats.

vizakgh commented 3 days ago

But DG supports webm/opus. I have unit tests: Microsoft Speech SDK -> webm/opus -> Deepgram. This flow works. Speech SDK format: SpeechSynthesisOutputFormat.Webm24Khz16BitMonoOpus.

So, this works (encoded by the MS Speech SDK): [attached image]

This doesn't work, or works only intermittently (encoded by Edge/Chrome): [attached image]


dvonthenen commented 3 days ago

I assume the LiveOptions are the same, correct? This would point to an issue with the audio encoding on Edge/Chrome or the options not matching.

  1. Can you provide sample code for each of your tests (MS speech SDK and Edge/Chrome)?
  2. The LiveSchema options for each option (MS speech SDK and Edge/Chrome)?

We do have users in both those camps, so this should be a configuration problem matching the stream.

dvonthenen commented 3 days ago

If you also want to troubleshoot this via Discord since I am fairly certain this is a configuration issue, it might go a lot faster than going back and forth on this issue here.

vizakgh commented 3 days ago

Schema is the same: var liveSchema = new LiveSchema { Model = "nova-2", Language = "en", Channels = 1, Diarize = false, Dictation = false, Punctuate = false, SmartFormat = false };

For sample code - I will contact you tomorrow in Discord.

It could be an audio chunking issue. In the unit test, the MS Speech SDK sends the whole audio stream very quickly. In the real app, the browser sends audio in chunks (one every 500 ms) at human speaking speed. OK, let's continue in Discord tomorrow. Thanks.
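One way to test the chunking theory would be to replay a recorded stream in a unit test at the browser's pace instead of all at once. The sketch below is only an illustration of that idea: `PacedSender` and its `send` callback are hypothetical names, with `send` standing in for whatever method the test uses to push bytes to the websocket client. Note that cutting a dumped file into equal byte ranges is not the same as the chunk boundaries a browser recorder produces, so this only approximates the timing.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical test helper; not part of the Deepgram SDK.
public static class PacedSender
{
    // Splits the audio into consecutive chunks of at most chunkSize bytes, preserving order.
    public static List<byte[]> Split(byte[] audio, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunks = new List<byte[]>();
        for (int offset = 0; offset < audio.Length; offset += chunkSize)
        {
            int length = Math.Min(chunkSize, audio.Length - offset);
            var chunk = new byte[length];
            Array.Copy(audio, offset, chunk, 0, length);
            chunks.Add(chunk);
        }
        return chunks;
    }

    // Passes each chunk to the send callback, then waits for the interval before the next one.
    public static async Task SendPacedAsync(
        byte[] audio,
        int chunkSize,
        TimeSpan interval,
        Action<byte[]> send,
        CancellationToken ct = default)
    {
        foreach (var chunk in Split(audio, chunkSize))
        {
            send(chunk);
            await Task.Delay(interval, ct);
        }
    }
}
```

Running the same recording with `TimeSpan.FromMilliseconds(500)` and then `TimeSpan.FromMilliseconds(200)` would show whether the pacing alone is enough to trigger the early close.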

Regards, Victor


dvonthenen commented 3 days ago

For the original issue with:

var liveClient = new ListenWebSocketClient(_dgApiKey, new DeepgramWsClientOptions
{
ApiKey = _dgApiKey,
KeepAlive = true
});

If _dgApiKey is set to "" or a real API key, initializing the ListenWebSocketClient works for me. What value are you using for _dgApiKey?

vizakgh commented 2 days ago

Hi. This code throws an exception. DG version: Ping me in Discord, please: victor_88880


vizakgh commented 2 days ago

This code throws the exception:

// Constructor
public DeepgramWsClientOptions(string? apiKey = null, string? baseAddress = null, bool? keepAlive = null, bool? onPrem = null, Dictionary<string, string>? addons = null, Dictionary<string, string>? headers = null)
{
...

    ApiKey = apiKey ?? "";

...

    // user provided takes precedence
    if (string.IsNullOrWhiteSpace(ApiKey))
    {
        // then try the environment variable
        Log.Debug("DeepgramWsClientOptions", "API KEY is not set");
        ApiKey = Environment.GetEnvironmentVariable(variable: Defaults.DEEPGRAM_API_KEY) ?? "";
        if (!string.IsNullOrEmpty(ApiKey))
        {
            Log.Information("DeepgramWsClientOptions", "API KEY set from environment variable");
        } else {
            Log.Warning("DeepgramWsClientOptions", "API KEY environment variable not set");
        }
    }
    if (!OnPrem && string.IsNullOrEmpty(ApiKey))
    {
        var exStr = "Deepgram API Key is invalid";
        throw new ArgumentException(exStr); // <-- the exception is thrown here