deepgram / deepgram-dotnet-sdk

.NET SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

Deepgram API does not return data after disconnecting and reconnecting the WebSocket #302

Closed: tieje3 closed this issue 2 months ago

tieje3 commented 2 months ago

What is the current behavior?

The Deepgram API does not return any data after the WebSocket is disconnected and then reconnected.

Disconnect definition

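const disconnectFromDeepgram = () => {
    if (connection) {
        // Ask the live connection to finish the stream and close the socket.
        connection.finish();
        // Drop the reference so the next connect creates a fresh connection.
        setConnection(null);
    }
};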
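Because `disconnectFromDeepgram` finishes the connection and then clears it with `setConnection(null)`, a reconnect has to register its listeners on a brand-new connection object rather than on the finished one. The sketch below models only that lifecycle, under stated assumptions: `FakeLiveConnection`, `reconnect`, and the `"transcript"` event name are hypothetical stand-ins, not the Deepgram SDK's classes or event names; the method names merely echo the calls used in this issue.

```typescript
// Hypothetical listener signature; the listeners in this issue actually
// receive event objects such as LiveTranscriptionEvent, not strings.
type Listener = (payload: string) => void;

// Hypothetical stand-in for a live connection. It is NOT the Deepgram SDK
// client; it only models listener bookkeeping and the "finished" state.
class FakeLiveConnection {
    private listeners: Array<{ event: string; fn: Listener }> = [];
    private finished = false;

    addListener(event: string, fn: Listener): void {
        this.listeners.push({ event, fn });
    }

    removeListener(event: string, fn: Listener): void {
        this.listeners = this.listeners.filter((l) => !(l.event === event && l.fn === fn));
    }

    // Simulates the server pushing an event; a finished connection delivers nothing.
    emit(event: string, payload: string): void {
        if (this.finished) return;
        this.listeners
            .filter((l) => l.event === event)
            .forEach((l) => l.fn(payload));
    }

    // Marks the connection as done and drops every listener.
    finish(): void {
        this.finished = true;
        this.listeners = [];
    }

    listenerCount(event: string): number {
        return this.listeners.filter((l) => l.event === event).length;
    }
}

// Tears down the previous connection (if any) and wires the listener onto a
// freshly created one, mirroring finish() + setConnection(null) + reconnect.
function reconnect(
    previous: FakeLiveConnection | null,
    create: () => FakeLiveConnection,
    onTranscript: Listener,
): FakeLiveConnection {
    if (previous) {
        previous.removeListener("transcript", onTranscript);
        previous.finish(); // never reuse a finished connection
    }
    const next = create();
    next.addListener("transcript", onTranscript); // register on the NEW object
    return next;
}
```

If listeners were attached to the old, finished object instead of the new one, the new connection would have nothing to deliver transcripts to, which is the kind of symptom the reproduction steps in this issue are probing.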

First Connection

The first connection succeeds: Deepgram sends transcription data, and the connection closes properly.

first_connection.txt

Response Headers

HTTP/1.1 101 Switching Protocols
connection: upgrade
upgrade: websocket
sec-websocket-accept: 4+QamH27qm2OIALhefVT5Uo5yss=
access-control-allow-origin: http://localhost:60330
access-control-allow-credentials: true
vary: origin
vary: access-control-request-method
vary: access-control-request-headers
access-control-expose-headers: dg-model-name,dg-model-uuid,dg-char-count,dg-request-id,dg-error
sec-websocket-protocol: token
dg-request-id: 0caee309-30e3-4bf1-9bcc-478437df5c9d
date: Tue, 11 Jun 2024 23:52:34 GMT

Request Headers

GET /v1/listen?language=en-US&model=nova-2-medical&vad_events=true&smart_format=true&interim_results=true&utterance_end_ms=2000 HTTP/1.1
Host: api.deepgram.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br, zstd
Sec-WebSocket-Version: 13
Origin: http://localhost:60330
Sec-WebSocket-Protocol: token, efc6[same token as second connection]
Sec-WebSocket-Extensions: permessage-deflate
Sec-WebSocket-Key: 9ipjJXwMJD2BpBtqcEG45A==
Connection: keep-alive, Upgrade
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: websocket
Sec-Fetch-Site: cross-site
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
[Screenshot: first_connection]

Second Connection

The second connection also succeeds, but Deepgram does not send any transcription data. Deepgram then times out the WebSocket connection, stating that no data was sent, even though data clearly was sent.

second_connection.txt

Response Headers

HTTP/1.1 101 Switching Protocols
connection: upgrade
upgrade: websocket
sec-websocket-accept: h+nFfjAKiy/zx698pSwXCu765+Q=
access-control-allow-origin: http://localhost:60330
access-control-allow-credentials: true
vary: origin
vary: access-control-request-method
vary: access-control-request-headers
access-control-expose-headers: dg-model-name,dg-model-uuid,dg-char-count,dg-request-id,dg-error
sec-websocket-protocol: token
dg-request-id: 9f153fcd-bdfd-44c7-badc-9edd06c6ef66
date: Tue, 11 Jun 2024 23:52:37 GMT

Request Headers

GET /v1/listen?language=es&model=nova-2&vad_events=true&smart_format=true&interim_results=true&utterance_end_ms=2000 HTTP/1.1
Host: api.deepgram.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br, zstd
Sec-WebSocket-Version: 13
Origin: http://localhost:60330
Sec-WebSocket-Protocol: token, efc6[same token as first connection]
Sec-WebSocket-Extensions: permessage-deflate
Sec-WebSocket-Key: KxAN8XlgnOQ0QEzjXWJZjg==
Connection: keep-alive, Upgrade
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: websocket
Sec-Fetch-Site: cross-site
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
[Screenshot: second_connection]
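The two captured handshakes differ only in the `language` and `model` query parameters; the remaining options and the `Sec-WebSocket-Protocol: token, …` authentication scheme are identical. A minimal sketch of how such a request could be assembled, assuming the `wss://api.deepgram.com/v1/listen` base URL (inferred from the `Host` header and request path) and that the key travels as the second subprotocol. `ListenOptions`, `buildListenUrl`, and `listenProtocols` are hypothetical helpers, not part of any SDK.

```typescript
// Hypothetical options shape; the field names are copied from the
// query strings in the two captured request lines, nothing more.
interface ListenOptions {
    language: string;
    model: string;
    vad_events: boolean;
    smart_format: boolean;
    interim_results: boolean;
    utterance_end_ms: number;
}

// Builds the listen URL with parameters in the same order as the captures.
function buildListenUrl(base: string, opts: ListenOptions): string {
    const pairs: Array<[string, string]> = [
        ["language", opts.language],
        ["model", opts.model],
        ["vad_events", String(opts.vad_events)],
        ["smart_format", String(opts.smart_format)],
        ["interim_results", String(opts.interim_results)],
        ["utterance_end_ms", String(opts.utterance_end_ms)],
    ];
    const query = pairs
        .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
        .join("&");
    return `${base}?${query}`;
}

// Mirrors "Sec-WebSocket-Protocol: token, <key>" from the request headers.
function listenProtocols(apiKey: string): string[] {
    return ["token", apiKey];
}
```

In a browser, the two results would be handed to the standard constructor, e.g. `new WebSocket(buildListenUrl(base, opts), listenProtocols(key))`, which is what produces the `Sec-WebSocket-Protocol` request header shown above.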

Steps to reproduce

Use React's useEffect() to disconnect from and then reconnect to the Deepgram API:

    useEffect(() => {
        if ((microphoneState === MicrophoneState.Ready // initial setup
            || microphoneState === MicrophoneState.Paused) // language change
            && connectionState === LiveConnectionState.CLOSED) {
            console.log('connect to deepgram');
            const newLiveSchema = LiveSchemaBuilder(DEEPGRAM_DEFAULTS, languageSelected);
            connectToDeepgram(newLiveSchema);

            return;
        }

        // languageSelected

        if (microphoneState === MicrophoneState.Paused
            && connectionState === LiveConnectionState.OPEN
            && changingLanguage.current
        ) {
            changingLanguage.current = false;
            disconnectFromDeepgram();

            return;
        }

        // connectionState

        if (!connection) return;
        if (!microphone) return;

        const onTranscript = (data: LiveTranscriptionEvent) => {
            const { is_final: isFinal } = data;
            const thisTranscription = data.channel.alternatives[0].transcript;

            if (isFinal && thisTranscription !== '') {
                const spacer = transcriptionRef.current === '' ? '' : ' ';
                const newTranscription = `${transcriptionRef.current}${spacer}${thisTranscription}`;
                transcriptionRef.current = newTranscription;
                setCommunication(newTranscription);
            }
        };

        const onUtteranceEnd = () => {
            onTranscriptionComplete(transcriptionRef.current);
        };

        // TODO: Add if connectionState is Error then...

        if ((microphoneState === MicrophoneState.Ready
            || microphoneState === MicrophoneState.Paused)
            && connectionState === LiveConnectionState.OPEN) {
            connection.addListener(LiveTranscriptionEvents.Transcript, onTranscript);
            connection.addListener(LiveTranscriptionEvents.UtteranceEnd, onUtteranceEnd);
            microphone.addEventListener(MicrophoneEvents.DataAvailable, onDataSendToDeepgram);

            startMicrophone(); // does not change micState

            return () => {
                changingLanguage.current = true;
                connection.removeListener(LiveTranscriptionEvents.Transcript, onTranscript);
                connection.removeListener(LiveTranscriptionEvents.UtteranceEnd, onUtteranceEnd);
                microphone.removeEventListener(MicrophoneEvents.DataAvailable, onDataSendToDeepgram);
                stopMicrophone();
                // invalidate temporary token
                // function to invalidate temporary token
            };
        }
    // eslint-disable-next-line react-hooks/exhaustive-deps
    }, [microphoneState, connectionState, languageSelected]);
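The effect above interleaves three decisions (connect, disconnect for a language change, attach listeners) with their side effects. A minimal sketch that reproduces only the branch order as a pure function, so each state combination can be checked in isolation. The string unions are hypothetical stand-ins for the app's `MicrophoneState` and the SDK's `LiveConnectionState`, and `decideEffectAction` is not part of either.

```typescript
// Hypothetical stand-ins for MicrophoneState and LiveConnectionState;
// only the members the effect checks are modelled.
type Mic = "ready" | "paused" | "other";
type Conn = "open" | "closed" | "other";
type EffectAction = "connect" | "disconnect" | "attach" | "none";

// Reproduces the branch order of the useEffect above with no side effects.
function decideEffectAction(
    mic: Mic,
    conn: Conn,
    changingLanguage: boolean,
    hasConnection: boolean,
    hasMicrophone: boolean,
): EffectAction {
    const micUsable = mic === "ready" || mic === "paused";

    // Branch 1: initial setup, or a language change after the old socket closed.
    if (micUsable && conn === "closed") return "connect";

    // Branch 2: a language change was flagged while the old socket is still open.
    if (mic === "paused" && conn === "open" && changingLanguage) return "disconnect";

    // Guards: nothing to wire up without both objects.
    if (!hasConnection || !hasMicrophone) return "none";

    // Branch 3: socket open, so attach listeners and start the microphone.
    if (micUsable && conn === "open") return "attach";

    return "none";
}
```

Tracing a language change through it: after the cleanup sets `changingLanguage` to `true`, `("paused", "open", true, …)` yields `"disconnect"`; once the socket closes, `("paused", "closed", …)` yields `"connect"`; and when the new socket opens with the flag cleared, `("paused", "open", false, true, true)` yields `"attach"`.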

Expected behavior

After a successful reconnection, the Deepgram API should return transcription results.

Please tell us about your environment

Other information

tieje3 commented 2 months ago

Sorry, wrong SDK. I was tired from trying to solve this issue.