deepgram / deepgram-js-sdk

Official JavaScript SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

Websocket connection closing abruptly in both Text to Speech and Speech to Text. #337

Open MD-AZMAL opened 2 hours ago

MD-AZMAL commented 2 hours ago

What is the current behavior?

The WebSocket connection to Deepgram closes abruptly when using live speak (text to speech) or live listen (speech to text) while running locally. I haven't tried this in production yet. No error event fires before the connection closes; I only receive a Metadata event after it has closed, so it is difficult to diagnose what happened. For reference, a sample Metadata event for live listen looks like this:

{
  "type": "Metadata",
  "transaction_key": "deprecated",
  "request_id": "SOME_ID",
  "sha256": "SOME_HASH",
  "created": "SOME_DATE",
  "duration": 0,
  "channels": 0
}

Looking up the request ID in my Deepgram account also shows no error.
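
Since no Error event fires before the close, one possible source of extra detail is the close code and reason that accompany a WebSocket close. The sketch below is a hypothetical helper, `describeClose`, that maps a few standard RFC 6455 close codes to readable text. It assumes the SDK's Close handler is passed an object with `code` and `reason` fields, which should be checked against the SDK version in use.

```typescript
// Hypothetical helper, not part of the Deepgram SDK.
// Assumes the Close handler receives an object with `code` and `reason`,
// as a standard WebSocket CloseEvent has.
interface CloseInfo {
  code?: number;
  reason?: string;
}

// A few standard close codes from RFC 6455, section 7.4.1.
const CLOSE_CODES: Record<number, string> = {
  1000: "normal closure",
  1001: "going away",
  1006: "abnormal closure (no close frame received)",
  1008: "policy violation",
  1011: "internal server error",
};

function describeClose(event: CloseInfo): string {
  if (event.code === undefined) {
    return "closed (no close code available)";
  }
  const meaning = CLOSE_CODES[event.code] ?? "unrecognized code";
  const reason = event.reason ? `, reason: ${event.reason}` : "";
  return `closed with code ${event.code} (${meaning})${reason}`;
}
```

If the assumption holds, the existing Close handlers could log `describeClose(event)` instead of a fixed string, which would help distinguish a normal closure from an abnormal one.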

Steps to reproduce

It happens intermittently while the server is running. Initiate a live listen and a live speak connection, and they terminate frequently.

Expected behavior

The connection should remain open until it is explicitly closed or an error occurs.

Please tell us about your environment

We want to make sure the problem isn't specific to your operating system or programming language.

Other information

I am attaching the code for both the speech-to-text and text-to-speech modules below:

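import { createClient, LiveTranscriptionEvents, LiveTTSEvents } from "@deepgram/sdk";

// `WebSocket` (from the "ws" package) and `server` (an existing HTTP server)
// are assumed to be defined elsewhere in the app.
const deepgram = createClient("MY_API_KEY");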

// custom websocket server
const wss = new WebSocket.Server({ server: server });

wss.on("connection", (ws) => {
  // create a new Live listen connection

  const liveListen = deepgram.listen.live({
    model: "nova-2",
    language: "en-US",
    smart_format: true,
    interim_results: true,
    utterance_end_ms: 1000,
    vad_events: true,
    endpointing: 300,
  });

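  // handle for the keep-alive timer so it can be stopped on close
  let keepAliveTimer: ReturnType<typeof setInterval> | undefined;

  liveListen.on(LiveTranscriptionEvents.Open, () => {
    console.log("Live Listen Open");

    // keep alive loop every 8s (the delay is the second argument to setInterval)
    keepAliveTimer = setInterval(() => {
      liveListen.keepAlive();
    }, 8 * 1000);
  });

  liveListen.on(LiveTranscriptionEvents.Close, () => {
    console.log("Live Listen Close");
    // stop sending keep-alives once the connection has closed
    clearInterval(keepAliveTimer);
  });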

  liveListen.on(LiveTranscriptionEvents.Error, (error) => {
    console.log("Live Listen ERROR====================================");
    console.log(error);
    console.log("====================================");
  });

  liveListen.on(LiveTranscriptionEvents.Metadata, (data) => {
    console.log("Live Listen METADATA====================================");
    console.log(JSON.stringify(data, null, 1));
    console.log("====================================");
  });

  liveListen.on(LiveTranscriptionEvents.Transcript, (data) => {
    console.log("Live Listen TRANSCRIPT====================================");
    console.log(data?.channel?.alternatives?.[0]?.transcript);
    console.log("====================================");
  });

  // create a new Live Speech connection

  const liveSpeak = deepgram.speak.live({
    model: "aura-asteria-en",
    encoding: "linear16",
    sample_rate: 16000,
  });

  liveSpeak.on(LiveTTSEvents.Open, () => {
    console.log("Live Speak Open");
  });

  liveSpeak.on(LiveTTSEvents.Close, () => {
    console.log("Live Speak Close");
  });

  liveSpeak.on(LiveTTSEvents.Error, (error) => {
    console.log("Live Speak ERROR====================================");
    console.log(error);
    console.log("====================================");
  });

  liveSpeak.on(LiveTTSEvents.Metadata, (data) => {
    console.log("Live Speak METADATA====================================");
    console.log(JSON.stringify(data, null, 1));
    console.log("====================================");
  });

  liveSpeak.on(LiveTTSEvents.Audio, (data) => {
    console.log("Live Speak AUDIO====================================");
    console.log(data);
    console.log("====================================");
  });

  liveSpeak.on(LiveTTSEvents.Flushed, () => {
    console.log("Live Speak Flushed");
  });

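  ws.on("message", (data: string) => {
    const jsonData = JSON.parse(data);

    if (jsonData.transcribe) {
      // speech to text: forward the audio chunk to the live listen connection
      liveListen.send(jsonData.audioChunk);
    } else {
      // text to speech: forward the text to the live speak connection
      liveSpeak.sendText(jsonData.text);
    }
  });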
});
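
The keep-alive loop in the Open handler can also be factored into a small helper that owns its timer. The following is a minimal sketch under stated assumptions: `startKeepAlive`, `KeepAliveTarget`, and `Timers` are hypothetical names, and the only SDK call relied on is `keepAlive()`, which the code above already uses. The timer functions are injectable so the helper can be exercised without waiting for real time to pass.

```typescript
// Minimal shape of the connection this sketch needs.
interface KeepAliveTarget {
  keepAlive(): void;
}

// Injectable timer functions so the helper can be tested without waiting.
interface Timers {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Timers = {
  set: (fn, ms) => setInterval(fn, ms),
  clear: (handle) => clearInterval(handle as ReturnType<typeof setInterval>),
};

// Hypothetical helper: sends a keep-alive every `intervalMs` milliseconds
// and returns a function that stops the timer.
function startKeepAlive(
  conn: KeepAliveTarget,
  intervalMs: number,
  timers: Timers = realTimers,
): () => void {
  // The delay is passed as the second argument to the timer call.
  const handle = timers.set(() => conn.keepAlive(), intervalMs);
  let stopped = false;
  return () => {
    if (stopped) return; // stopping twice is a no-op
    stopped = true;
    timers.clear(handle);
  };
}
```

In the handlers above, this could look like `const stop = startKeepAlive(liveListen, 8 * 1000)` inside the Open handler and `stop()` inside the Close handler, so no timer keeps firing against a closed connection.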

One might argue that opening a new connection to Deepgram for each client that connects to the WebSocket server is the issue, but the problem occurs even when a single client is connected locally.
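
When each client gets its own pair of Deepgram connections, it can help to tie their lifetime to the client socket so that timers and connections from earlier sessions do not linger. The sketch below is a hypothetical `SessionCleanup` class, not part of the SDK or of `ws`. It only collects cleanup callbacks and runs each of them once.

```typescript
// Hypothetical helper: collects cleanup callbacks for one client session
// and runs each of them exactly once when the session ends.
class SessionCleanup {
  private callbacks: Array<() => void> = [];
  private done = false;

  // Register a cleanup step, e.g. stopping a timer or closing a connection.
  add(callback: () => void): void {
    if (this.done) {
      // Session already ended: run immediately so nothing leaks.
      callback();
      return;
    }
    this.callbacks.push(callback);
  }

  // Run all registered steps once; later calls are no-ops.
  run(): void {
    if (this.done) return;
    this.done = true;
    for (const callback of this.callbacks) {
      try {
        callback();
      } catch (error) {
        // Keep going so one failing step does not block the others.
        console.error("cleanup step failed:", error);
      }
    }
    this.callbacks = [];
  }
}
```

Inside the `wss.on("connection", ...)` handler, one instance per client could register steps such as clearing the keep-alive timer and closing both Deepgram connections (using whichever close method the installed SDK version provides), with `ws.on("close", () => cleanup.run())` triggering them.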

lukeocodes commented 2 hours ago

Can you provide results of your debugging output? Request IDs?