deepgram / deepgram-js-sdk

Official JavaScript SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

WebSocket Connection Closes Intermittently After Metadata Message - {"type": "Metadata", "transaction_key": "deprecated"} #336

Closed subodhjena closed 1 month ago

subodhjena commented 1 month ago

What is the current behavior?

The WebSocket connection to Deepgram intermittently closes after a metadata message is received while developing in the local environment. The last message received before the connection closes is:

{
  "type": "Metadata",
  "transaction_key": "deprecated",
  "request_id": "f71f632f-df82-49d4-8459-a25671fc24d9",
  "sha256": "16d20704f0d9555a56aeb0b404fcff92791c7180ff96ea7910bf6e43ac3ff06c",
  "created": "2024-10-08T04:41:55.587Z",
  "duration": 0,
  "channels": 0
}

After this message, the WebSocket connection is closed unexpectedly. This issue does not always happen but occurs frequently during local development.

Steps to reproduce:

  1. Set up a Deepgram WebSocket connection in a local development environment using the provided code (TypeScript).
  2. Start streaming audio to Deepgram using the SDK.
  3. Observe the WebSocket connection behavior—intermittently, the connection will close after receiving a metadata message.

Expected behavior:

The WebSocket connection should remain active during the entire audio transmission, and the connection should not close prematurely after receiving a metadata message.

Please tell us about your environment:

Other information:

import {
  createClient,
  DeepgramClient,
  LiveClient,
  LiveTranscriptionEvents,
} from '@deepgram/sdk';
import logger from '../../utils/logger';

export interface SpeechToTextOptions {
  utteranceEndMilliseconds?: number;
  endpointing?: number;
}

export interface SpeechToTextCallbacks {
  onSpeechToTextOpen?: (message: string) => void;
  onSpeechToTextTranscription?: (data: any) => void;
  onSpeechToTextClose?: () => void;
  onSpeechToTextError?: (error: any) => void;
  onSpeechToTextMetadata?: (metadata: any) => void;
  onSpeechToTextSpeechStarted?: (data: any) => void;
  onSpeechToTextUtteranceEnd?: (data: any) => void;
}

export class SpeechToTextService {
  private deepgramClient?: DeepgramClient;
  private liveClient?: LiveClient;
  private keepAliveInterval?: NodeJS.Timeout;
  private readonly apiKey: string;
  private readonly options: SpeechToTextOptions;

  constructor(apiKey: string, options: SpeechToTextOptions = {}) {
    if (!apiKey) {
      throw new Error('DEEPGRAM_API_KEY must be provided.');
    }
    this.apiKey = apiKey;
    this.options = options;
    this.deepgramClient = createClient(apiKey);
    logger.info('SpeechToTextService initialized with API key and options.');
  }

  public start(callbacks: SpeechToTextCallbacks): void {
    try {
      logger.info('Starting SpeechToTextService...');
      this.initializeLiveClient(callbacks);
    } catch (error) {
      logger.error('Failed to start SpeechToTextService', error);
      callbacks.onSpeechToTextError?.(error);
    }
  }

  private initializeLiveClient(callbacks: SpeechToTextCallbacks): void {
    logger.info('Initializing live client for Deepgram connection...');
    this.liveClient = this.deepgramClient?.listen.live({
      model: 'nova-2',
      smart_format: true,
      interim_results: true,
      utterance_end_ms: this.options.utteranceEndMilliseconds ?? 2000,
      vad_events: true,
      endpointing: this.options.endpointing ?? 2000,
    });

    this.liveClient?.on(LiveTranscriptionEvents.Open, () => {
      logger.info('Deepgram WebSocket connection opened.');
      callbacks.onSpeechToTextOpen?.('Deepgram connection established.');
      this.startKeepAlive();
      this.registerTranscriptionEvents(callbacks);
    });
  }

  private registerTranscriptionEvents(callbacks: SpeechToTextCallbacks): void {
    logger.info('Registering transcription events...');

    this.liveClient?.on(LiveTranscriptionEvents.Transcript, (data) => {
      logger.debug('Transcript received:', data);
      callbacks.onSpeechToTextTranscription?.(data);
    });

    this.liveClient?.on(LiveTranscriptionEvents.Close, () => {
      logger.info('Deepgram WebSocket connection closed.');
      callbacks.onSpeechToTextClose?.();
      this.clearKeepAlive();
    });

    this.liveClient?.on(LiveTranscriptionEvents.Error, (error) => {
      logger.error('Deepgram WebSocket error occurred:', error);
      callbacks.onSpeechToTextError?.(error);
    });

    this.liveClient?.on(LiveTranscriptionEvents.Metadata, (metadata) => {
      logger.debug('Metadata received:', metadata);
      callbacks.onSpeechToTextMetadata?.(metadata);
    });

    this.liveClient?.on(LiveTranscriptionEvents.SpeechStarted, (data) => {
      logger.debug('Speech started event received:', data);
      callbacks.onSpeechToTextSpeechStarted?.(data);
    });

    this.liveClient?.on(LiveTranscriptionEvents.UtteranceEnd, (data) => {
      logger.debug('Utterance end event received:', data);
      callbacks.onSpeechToTextUtteranceEnd?.(data);
    });

    this.liveClient?.on(LiveTranscriptionEvents.Unhandled, (data) => {
      logger.warn(`Unhandled transcription event received:`, data);
    });
  }

  public transcribe(data: Buffer): void {
    if (!this.liveClient) {
      logger.error(
        'Attempted to transcribe data without an active LiveClient.',
      );
      return;
    }

    try {
      const liveClientState = this.liveClient.getReadyState();
      logger.debug(`WebSocket state before sending data: ${liveClientState}`);

      if (liveClientState === 0) {
        logger.warn('WebSocket is still connecting. Unable to send data.');
        return;
      }

      if (liveClientState === 1) {
        this.liveClient.send(data);
      } else if (liveClientState >= 2) {
        logger.warn('WebSocket is closing or closed. Data cannot be sent.');
      } else {
        logger.warn(`Invalid WebSocket state: ${liveClientState}`);
      }
    } catch (error) {
      logger.error('Failed to send data to Deepgram:', error);
    }
  }

  public stop(): void {
    if (!this.liveClient) {
      logger.warn(
        'Attempted to stop transcription without an active LiveClient.',
      );
      return;
    }

    try {
      logger.info('Stopping transcription service...');
      this.liveClient.requestClose();
      this.liveClient.removeAllListeners();
      this.clearKeepAlive();
      this.liveClient = undefined;
      logger.info('Transcription service stopped.');
    } catch (error) {
      logger.error('Failed to stop transcription service:', error);
    }
  }

  private startKeepAlive(): void {
    logger.info('Starting keep-alive interval for Deepgram WebSocket...');
    this.keepAliveInterval = setInterval(() => {
      logger.debug('Sending keep-alive ping to Deepgram...');
      this.liveClient?.keepAlive();
    }, 10 * 1000);
  }

  private clearKeepAlive(): void {
    if (this.keepAliveInterval) {
      logger.info('Clearing keep-alive interval...');
      clearInterval(this.keepAliveInterval);
      this.keepAliveInterval = undefined;
    }
  }
}

export default SpeechToTextService;

Here are some logs as well:

[10:37:08.166] INFO (8700): SpeechToTextService initialized with API key and options.
[10:37:08.167] INFO (8700): Starting SpeechToTextService...
[10:37:08.167] INFO (8700): Initializing live client for Deepgram connection...
[10:37:09.179] WARN (8700): WebSocket is still connecting. Unable to send data.
[10:37:09.195] INFO (8700): Deepgram WebSocket connection opened.
[10:37:09.196] INFO (8700): UUID=9ccf52c2-1bb3-4302-8ee9-37859b60df3a STT Open: Deepgram connection established.
[10:37:09.200] INFO (8700): Starting keep-alive interval for Deepgram WebSocket...
[10:37:09.201] INFO (8700): Registering transcription events...
[10:37:18.644] INFO (8700): UUID=9ccf52c2-1bb3-4302-8ee9-37859b60df3a. STT Metadata {"type":"Metadata","transaction_key":"deprecated","request_id":"cdf77fbc-8d38-4450-94cf-68fd56ca98a7","sha256":"ae37f46ac581f07cf45013f0d5507afb9b8b85e178ba653c9484f4adf1047925","created":"2024-10-08T05:07:28.104Z","duration":0,"channels":0}
[10:37:18.649] INFO (8700): Deepgram WebSocket connection closed.
[10:37:18.649] INFO (8700): UUID=9ccf52c2-1bb3-4302-8ee9-37859b60df3a STT Closed
[10:37:18.649] INFO (8700): Clearing keep-alive interval...
[10:37:19.377] WARN (8700): WebSocket is closing or closed. Data cannot be sent.
[10:37:20.398] WARN (8700): WebSocket is closing or closed. Data cannot be sent.
[10:37:21.417] WARN (8700): WebSocket is closing or closed. Data cannot be sent.
[10:37:22.439] WARN (8700): WebSocket is closing or closed. Data cannot be sent.
[10:37:23.456] WARN (8700): WebSocket is closing or closed. Data cannot be sent.
[10:37:24.481] WARN (8700): WebSocket is closing or closed. Data cannot be sent.

Do let me know if there are any known issues with handling metadata messages or if there are any steps I can take to debug the WebSocket closure further. Any advice or potential fixes would be appreciated!

neeagl commented 1 month ago

We're facing the same issue as well here.

lukeocodes commented 1 month ago

It's actually the other way around: the metadata message doesn't cause the closure. When we close the WebSocket from the Deepgram end, we send a metadata message.

The closing of the connection is usually because no data was sent for 10 seconds, or you sent invalid data. I'd suggest using our streaming test suite to debug your data stream.
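
To make that concrete, here is a minimal sketch (not an official recommendation) of keeping an otherwise idle connection open by calling the SDK's keepAlive() well inside the ~10-second window; the interval value and environment-variable name are illustrative assumptions:

import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';

// Illustrative only: API key read from an assumed environment variable.
const deepgram = createClient(process.env.DEEPGRAM_API_KEY ?? '');
const connection = deepgram.listen.live({ model: 'nova-2', smart_format: true });

let keepAliveTimer: NodeJS.Timeout | undefined;

connection.on(LiveTranscriptionEvents.Open, () => {
  // Ping comfortably inside the ~10-second idle window so the server does not
  // close the socket while no audio frames are being sent.
  keepAliveTimer = setInterval(() => connection.keepAlive(), 5 * 1000);
});

connection.on(LiveTranscriptionEvents.Close, () => {
  if (keepAliveTimer) clearInterval(keepAliveTimer);
});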

subodhjena commented 1 month ago

@lukeocodes There are two aspects to the issue that I would like to highlight:

  1. Keep-alive Interval: The keep-alive function runs every 10 seconds to maintain the WebSocket connection. Do you think reducing the interval might help prevent the connection from closing prematurely?

  2. Browser Audio Streaming: We are directly sending browser audio to our WebSocket using the following code:

const audioTracks = mediaStreamRef.current.getAudioTracks();
if (audioTracks && audioTracks.length > 0 && isConnected && ws) {
  const audioStream = new MediaStream(audioTracks);

  try {
    const mediaRecorder = new MediaRecorder(audioStream, {
      mimeType: "audio/webm; codecs=opus",
    });

    mediaRecorder.ondataavailable = (event: BlobEvent) => {
      if (ws && event.data.size > 0 && ws.readyState === WebSocket.OPEN) {
        event.data.arrayBuffer().then((buffer) => {
          ws.send(buffer);
        });
      }
    };

    mediaRecorder.start(1000); // Records and sends audio chunks every 1 second
    setIsStreaming(true);
  } catch (error) {
    console.error("Error starting media recorder:", error);
  }
} else {
  console.error("No audio tracks available or WebSocket is not connected");
}

As I explained in my previous question, the connection does not always fail; it works fine on production and cloud servers. The issue mostly occurs when working locally, but I am concerned that it might also happen in the production/cloud environment.

lukeocodes commented 1 month ago

Was this question asked in the Discord community too?

lukeocodes commented 1 month ago

  1. Yes, reduce your interval to approximately 8 seconds to mitigate any connectivity latency.
  2. Have you recorded which systems/browsers this is occurring on?

lukeocodes commented 1 month ago

For your request ID f71f632f-df82-49d4-8459-a25671fc24d9, we received no audio at all. Please let me know if you see any pattern in which browsers/OSs you're having issues with, or whether it happens after something else has occurred in your app beforehand. It seems that we're not receiving audio in these instances.
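
One way to verify this from the browser side is to instrument the recorder before blaming the connection; a rough sketch, reusing mediaRecorder and ws from the snippet posted earlier (the declarations, byte counter, and log lines are illustrative additions):

// Assumed to exist already, as in the snippet above (browser context).
declare const mediaRecorder: MediaRecorder;
declare const ws: WebSocket;

let bytesSent = 0;

mediaRecorder.ondataavailable = (event: BlobEvent) => {
  // Log every chunk so an empty or silent recorder is visible immediately.
  console.debug(`MediaRecorder chunk: ${event.data.size} bytes`);
  if (event.data.size > 0 && ws.readyState === WebSocket.OPEN) {
    event.data.arrayBuffer().then((buffer) => {
      ws.send(buffer);
      bytesSent += buffer.byteLength;
    });
  }
};

// If this stays at 0 (lost mic permission, muted track, backgrounded tab),
// Deepgram receives no audio and will close the connection after its idle window.
setInterval(() => console.debug(`Total audio bytes sent: ${bytesSent}`), 5000);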

MD-AZMAL commented 1 month ago

Yep, I tried reducing the keep-alive timeout to 9 seconds and the issue is happening less frequently. Also, @lukeocodes, I am facing the same connection-closing issue with text-to-speech. I am using deepgram-sdk 3.8.0 in Node.js, but the live speech client does not have a keepAlive event in the SDK. Any idea how to fix that for live speech?

lukeocodes commented 1 month ago

So is this another issue now with TTS? Or the same issue with STT?

majumba commented 1 month ago

Following this, as I am also experiencing a similar issue in a local environment (but not in production).

amnetgineersde commented 1 month ago

Same issue

lukeocodes commented 1 month ago

I'm not following this; with so many accounts replying, I don't know if I'm even replying to the original poster.

Latency can be a factor with KeepAlive. You can reduce the interval without penalty to account for this; try 5 seconds if you have to.

We have a ticket open for keepalive on TTS.