deepgram / deepgram-js-sdk

Official JavaScript SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

Bug: (alpha/beta) LiveClient closing too early - lost transcript events #198

Closed. ftr-lwilson closed this issue 7 months ago

ftr-lwilson commented 7 months ago

What is the current behavior?

I noticed that when streaming transcription through the live client I was losing events towards the end of the stream. I believe this is because the new alpha/beta LiveClient explicitly closes the websocket on the client side, rather than letting the server terminate the connection here: https://github.com/deepgram/deepgram-node-sdk/blob/c0def146e0d480c9b09c5a366c8b21d67609150e/src/packages/LiveClient.ts#L152C7-L152C7

Looking at the original implementation on the main branch, it simply sends a message to the server indicating that no more data will be sent, without an explicit client-side socket.close(): https://github.com/deepgram/deepgram-node-sdk/blob/95f9291f20c92058c013923a357c5a1c6dc7f2de/src/transcription/liveTranscription.ts#L106

I have tried commenting out the socket.close() call and that does seem to fix the issue - all events come through before the connection closes. 🥳
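
If I'm reading the two implementations right, the difference boils down to something like the sketch below (simplified, not the actual SDK source; I'm assuming the ws package purely for illustration, and the zero-length frame as the end-of-stream signal):

import WebSocket from 'ws' // assumption: ws, purely for illustration

// main branch: signal end-of-stream, then let the server close the socket
function finishLegacy(socket: WebSocket) {
  socket.send(new Uint8Array(0)) // zero-length frame tells the server no more audio is coming
  // no socket.close() here - the server flushes any remaining transcript events
  // and then terminates the connection itself
}

// alpha/beta LiveClient: same signal, but it also closes from the client side
function finishBeta(socket: WebSocket) {
  socket.send(new Uint8Array(0))
  socket.close() // starts the client-side closing handshake immediately, which is
  // where the late transcript events appear to be getting dropped
}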

According to the MDN documentation on the WebSocket API:

The process of closing the connection begins with a closing handshake, and the close() method does not discard previously-sent messages before starting that closing handshake; even if the user agent is still busy sending those messages, the handshake will only start after the messages are sent.

It does stand to reason that all data is sent to the server over the socket before it is closed, but it's ambiguous whether the server has the opportunity to continue returning messages.
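
To put that ambiguity in code terms (a contrived sketch, again assuming the ws package; the endpoint is just a placeholder):

import WebSocket from 'ws' // assumption: ws, purely for illustration

const socket = new WebSocket('wss://example.com/stream') // placeholder endpoint

socket.on('open', () => {
  socket.send(new Uint8Array(0)) // per MDN, this queued message is flushed before the closing handshake starts...
  socket.close() // ...but whether messages the server sends back after receiving our close frame
  // still reach the handler below is exactly the open question here
})

socket.on('message', (data) => console.log('still receiving:', data.toString()))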

Steps to reproduce


import { createClient, LiveTranscriptionEvent, LiveTranscriptionEvents } from '@deepgram/sdk'
import { createReadStream } from 'fs'

const audio = createReadStream('test-audio-file.wav')

const client = createClient(process.env.DEEPGRAM_API_KEY as string) // however you load your API key
const connection = client.listen.live({
  interim_results: true,
})

connection
  .on(LiveTranscriptionEvents.Open, () => {
    audio
      .on('data', (data: Buffer) => {
        connection.send(data)
      })
      .on('end', () => {
        connection.finish() // here we're telling the client there is no more data as soon as the source stream ends
      })
  })
  .on(LiveTranscriptionEvents.Transcript, (event: LiveTranscriptionEvent) => {
    console.log(event)
  })
  .on(LiveTranscriptionEvents.Close, () => {
    console.log('closed!')
  })

Expected behavior

Ideally, the whole audio file is transcribed and emitted. An obvious symptom is that the last received event has is_final set to false - I would definitely expect the last event to be a final one.

Please tell us about your environment

lukeocodes commented 7 months ago

is_final refers to our interim results feature, and has nothing to do with the specifics of how the websocket is implemented.

It may be that the client close occurring when a finish is requested is not letting the final transcription events come through. I will need to check this, as I had explicitly tested that this would not be the case.

One thing I hadn't accounted for was an end event on the data/microphone, which is an oversight on my part.

ftr-lwilson commented 7 months ago

Thanks for getting back to me Luke!

is_final refers to our interim results feature, and has nothing to do with the specifics of how the websocket is implemented.

Yes yes absolutely. But if this feature is enabled, I would have thought that the last received event would be a finalised result, since there is no more work to do.

For example, for the given audio with speech: "Hello Luke, how are you?", I would expect the events to look something like:

  1. "Hello Luke" is_final=false
  2. "Hello Luke, how" is_final=false
  3. "Hello Luke, how are you?" is_final=true

But in practice, the last (or last few) events tend to be lost, yielding incomplete results and ending on a non-final event:

  1. "Hello Luke" is_final=false
  2. "Hello Luke, how" is_final=false

It may be that the client close occurring when a finish is requested is not letting the final transcription events come through. I will need to check this, as I had explicitly tested that this would not be the case.

For sure, this is my theory too. Thanks Luke!

One thing I hadn't accounted for was an end event on the data/microphone, which is an oversight on my part.

Oh okay. Is there an alternative mechanism you might suggest, or a recommended approach on how/when to call LiveClient.finish without depending on the source stream ending?
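
To make the question concrete, the kind of thing I'm imagining is roughly the following (entirely a sketch on my end; the deferral condition, the 5 second fallback and the env-var API key are just placeholders):

import { createClient, LiveTranscriptionEvent, LiveTranscriptionEvents } from '@deepgram/sdk'
import { createReadStream } from 'fs'

const connection = createClient(process.env.DEEPGRAM_API_KEY as string).listen.live({
  interim_results: true,
})
const audio = createReadStream('test-audio-file.wav')

let sourceEnded = false
let finished = false
const finishOnce = () => {
  if (!finished) {
    finished = true
    connection.finish()
  }
}

connection
  .on(LiveTranscriptionEvents.Open, () => {
    audio
      .on('data', (data: Buffer) => {
        connection.send(data)
      })
      .on('end', () => {
        sourceEnded = true
        setTimeout(finishOnce, 5000) // fallback in case no further final result arrives
      })
  })
  .on(LiveTranscriptionEvents.Transcript, (event: LiveTranscriptionEvent) => {
    console.log(event)
    if (sourceEnded && event.is_final) {
      finishOnce() // only finish once a final result has arrived after the source ended
    }
  })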

lukeocodes commented 7 months ago

fixed in v3.0.0-beta.4!