deepgram / deepgram-dotnet-sdk

.NET SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

Socket client deadlocks on azure machines with 2 CPU #344

Open vizakgh opened 1 week ago

vizakgh commented 1 week ago

What is the current behavior?

The socket client deadlocks on machines with fewer than 8 CPUs.

Steps to reproduce

Run the client on a 2-CPU machine.

Expected behavior

no deadlocks

Please tell us about your environment

Azure Web App with 2 vCPUs deadlocks; the same workload with 8 vCPUs does not deadlock.
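Editor's note, a hedged aside not from the thread: one plausible reason the vCPU count matters is that the .NET thread pool's default minimum worker count tracks `Environment.ProcessorCount`, so a 2-vCPU instance starts with only two ready workers before the pool's slow-growth throttling kicks in. A minimal check:

```csharp
using System;
using System.Threading;

class PoolInfo
{
    static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        // By default the worker-thread minimum equals Environment.ProcessorCount,
        // so a 2-vCPU Azure instance starts with only 2 ready workers.
        Console.WriteLine(worker == Environment.ProcessorCount);
    }
}
```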

Other information

Investigation in progress. The issue is probably in:

private void StartSenderBackgroundThread() => _ = Task.Factory.StartNew(() => ProcessSendQueue(), TaskCreationOptions.LongRunning);

private void StartReceiverBackgroundThread() => _ = Task.Factory.StartNew(() => ProcessReceiveQueue(), TaskCreationOptions.LongRunning);

private void StartKeepAliveBackgroundThread() => _ = Task.Factory.StartNew(() => ProcessKeepAlive(), TaskCreationOptions.LongRunning);

private void StartAutoFlushBackgroundThread() => _ = Task.Factory.StartNew(() => ProcessAutoFlush(), TaskCreationOptions.LongRunning);
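Editor's note, a minimal sketch (not code from the SDK) of why this pattern can misbehave: `Task.Factory.StartNew` does not understand async delegates. When the delegate returns a `Task`, `StartNew` hands back a `Task<Task>` that completes at the method's first `await`, so the `LongRunning` dedicated thread only covers the synchronous prefix; everything after the first `await` competes for the regular thread pool, which is small on a 2-vCPU machine.

```csharp
using System;
using System.Threading.Tasks;

class Demo
{
    // Stand-in for a queue-processing loop such as ProcessSendQueue().
    static async Task Work() => await Task.Delay(10);

    static async Task Main()
    {
        // StartNew returns Task<Task>: the outer task completes as soon as
        // Work() hits its first await, releasing the LongRunning thread.
        Task<Task> wrapped = Task.Factory.StartNew(() => Work(),
            TaskCreationOptions.LongRunning);

        // Continuations after the first await run on the thread pool instead.
        // Unwrap() is needed to observe the inner task's completion.
        await wrapped.Unwrap();
        Console.WriteLine("done");
    }
}
```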


dvonthenen commented 1 week ago

Taking a look at this right now.

dvonthenen commented 1 week ago

Yup 💯 Looking at a fix for this.

vizakgh commented 1 week ago

The problem is in the message enqueue (channel write/read). I replaced EnqueueSendMessage and ProcessSendQueue with a direct _clientWebSocket.SendAsync call, and everything works correctly.

dvonthenen commented 6 days ago

Yup, found the same problem and working on the fix.

vizakgh commented 6 days ago

P.S. You can't do this:

lock (_mutexSend)
{
    Log.Verbose("SendBinaryImmediately", "Sending binary message immediately..");
    if (length == -1)
    {
        length = data.Length;
    }

    _clientWebSocket.SendAsync(new ArraySegment<byte>(data, 0, length), WebSocketMessageType.Binary, endOfMessage: true, _cancellationTokenSource.Token).ConfigureAwait(continueOnCapturedContext: false);
}

We can't start another send on the same web socket while one is in progress, but this code doesn't await SendAsync: _mutexSend is released before SendAsync finishes, so we can re-enter this code and have a second send on the same socket while the first is still running.
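Editor's note: a minimal, self-contained sketch (not SDK code) of the race described above. A `lock` block cannot contain an `await` in C#, so the pending `Task` escapes the lock, and the lock is released while the "send" is still in flight:

```csharp
using System;
using System.Threading.Tasks;

class FireAndForget
{
    static readonly object _gate = new object();
    static readonly TaskCompletionSource<bool> _tcs = new TaskCompletionSource<bool>();

    // Stand-in for _clientWebSocket.SendAsync: completes only when we say so.
    static Task FakeSendAsync() => _tcs.Task;

    static void Main()
    {
        Task send;
        lock (_gate)
        {
            send = FakeSendAsync(); // not awaited, mirroring the bug
        }
        // The lock is already released here, but the "send" has not finished,
        // so a second caller could enter the lock and start an overlapping send.
        Console.WriteLine(send.IsCompleted);
        _tcs.SetResult(true);
    }
}
```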

vizakgh commented 6 days ago

I refactored this method in my project:

public async Task SendBinaryImmediately(byte[] data, int length = Constants.UseArrayLengthForSend)
{
    if (!await _mutexSend.WaitAsync(SEND_MUTEXT_TIMEOUT, _cancellationTokenSource.Token))
    {
        Log.Error("SendBinaryImmediately", "Mutex timeout");
        return;
    }

    try
    {
        Log.Verbose("SendBinaryImmediately", "Sending binary message immediately..");  // TODO: dump this message
        if (length == Constants.UseArrayLengthForSend)
        {
            length = data.Length;
        }
        await _clientWebSocket.SendAsync(new ArraySegment<byte>(data, 0, length), WebSocketMessageType.Binary, true, _cancellationTokenSource.Token);
    }
    finally
    {
        _mutexSend.Release();
    }
}
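Editor's note: the key design choice above is replacing `lock` with an awaited `SemaphoreSlim`, which serializes async sends correctly. A small standalone demonstration (hypothetical names, not SDK code) that a `SemaphoreSlim(1, 1)` gate never lets two "sends" overlap:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class SemaphoreDemo
{
    static readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    static int _inFlight;
    static int _maxObserved;

    // Stand-in for SendBinaryImmediately: only one send may be active at a time.
    static async Task SendAsync(int id)
    {
        await _gate.WaitAsync();
        try
        {
            int now = Interlocked.Increment(ref _inFlight);
            _maxObserved = Math.Max(_maxObserved, now); // safe: serialized by _gate
            await Task.Delay(10);                       // simulated network write
            Interlocked.Decrement(ref _inFlight);
        }
        finally
        {
            _gate.Release();
        }
    }

    static async Task Main()
    {
        await Task.WhenAll(Enumerable.Range(0, 8).Select(SendAsync));
        Console.WriteLine(_maxObserved); // highest observed concurrency
    }
}
```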

dvonthenen commented 2 days ago

Yep, that was my conclusion as well. There is a PR open, @vizakgh, if you want to take a look.

vizakgh commented 2 days ago

Hi, thanks. I can't find the fix for this in the PR: https://github.com/deepgram/deepgram-dotnet-sdk/issues/344#issuecomment-2436390253

You can reproduce this issue using big data chunks in SendAsync on a slow network.

dvonthenen commented 2 days ago

It should be in the linked PR here: https://github.com/deepgram/deepgram-dotnet-sdk/pull/345

The change is significant because of the backward compatibility guarantees that we need to keep.

dvonthenen commented 1 day ago

You need to either use:

dvonthenen commented 21 hours ago

Hi @jcdyer Have you been able to verify this PR addresses your issue?