deepgram / deepgram-js-sdk

Official JavaScript SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

Sending MP4 files with createReadStream fails if MP4 file has metadata header at end of file #153

Closed lox closed 1 year ago

lox commented 1 year ago

What is the current behavior?

Calling deepgram.transcription.preRecorded with a ReadStream fails for MP4 files that have their metadata header at the end of the file, but only when the file is streamed from a remote source (Google Cloud Storage) rather than from the local filesystem.

I get:

DG: {"err_code":"Bad Request","err_msg":"Bad Request: failed to process audio: corrupt or unsupported data","request_id":"30a56692-f105-4529-a723-9bad8462de0b"}

Steps to reproduce

import fs from 'fs'
import deepgramSdk from '@deepgram/sdk'
import { Storage } from '@google-cloud/storage'

const Deepgram = deepgramSdk.Deepgram // Handle CommonJS module

async function transcribe() {
  const deepgram = new Deepgram(process.env.DEEPGRAM_APIKEY)

  const storage = new Storage() // Assumes application default credentials
  const bucketName = 'your-bucket-name'
  const fileName = 'your-file-name.mp4'

  const file = storage.bucket(bucketName).file(fileName)
  const stream = file.createReadStream()

  const response = await deepgram.transcription.preRecorded(
    {
      stream: stream,
      mimetype: 'audio/mpeg',
    },
    {
      utterances: true,
      utt_split: 1.0,
      diarize: true,
      punctuate: true,
      model: 'nova',
      language: 'en-AU',
    }
  )

  console.log(response)
}

transcribe().catch(console.error)

Expected behavior

It should work. This isn't actually being streamed; the API endpoint should only start processing once it has received the whole stream.
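One way to check whether the stream handling is involved at all is to read the remote object fully into memory and send it as a buffer instead of a stream. The sketch below is only a diagnostic aid under stated assumptions: streamToBuffer is a hypothetical helper, the commented usage assumes the SDK's { buffer, mimetype } source shape, and the 'video/mp4' MIME type is a guess for this file. Whether this avoids the error has not been verified here.

```javascript
// Collect a Readable (or any async iterable of chunks) into a single Buffer.
// Hypothetical helper, not part of the Deepgram SDK.
async function streamToBuffer(readable) {
  const chunks = []
  for await (const chunk of readable) {
    // Streams may yield Buffers, Uint8Arrays, or strings; normalise to Buffer.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk))
  }
  return Buffer.concat(chunks)
}

// Assumed usage with the reproduction above (untested against the API):
// const buffer = await streamToBuffer(file.createReadStream())
// const response = await deepgram.transcription.preRecorded(
//   { buffer, mimetype: 'video/mp4' }, // MIME type is an assumption
//   { punctuate: true }
// )
```

This holds the whole file in memory, so it is only reasonable for files of modest size.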

lox commented 1 year ago

Happy to provide example files if that helps!

lukeocodes commented 1 year ago

Thanks Lox. Just to check, does file.createReadStream() return a stream object or a promise? You may have to await file.createReadStream().

lox commented 1 year ago

It's a stream, as requested by the SDK. It would be up to the SDK to call await, right?

This works fine with webm files, or m4a files with the metadata at the start, so it's absolutely working in some cases.
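To tell the two kinds of file apart, the top-level boxes of an MP4/M4A file can be listed in order. This is a minimal sketch that assumes the "metadata header" discussed in this thread is the moov box of the ISO base media file format and that the audio data lives in an mdat box; the function names are hypothetical.

```javascript
// List the top-level boxes of an MP4 (ISO base media) file held in a Buffer.
// Each box starts with a 4-byte big-endian size and a 4-byte type.
function listTopLevelBoxes(buffer) {
  const boxes = []
  let offset = 0
  while (offset + 8 <= buffer.length) {
    let size = buffer.readUInt32BE(offset)
    const type = buffer.toString('latin1', offset + 4, offset + 8)
    if (size === 1) {
      // A size of 1 means a 64-bit size follows the type field.
      if (offset + 16 > buffer.length) break
      size = Number(buffer.readBigUInt64BE(offset + 8))
    } else if (size === 0) {
      // A size of 0 means the box runs to the end of the file.
      size = buffer.length - offset
    }
    if (size < 8) break // malformed box, stop rather than loop forever
    boxes.push({ type, offset, size })
    offset += size
  }
  return boxes
}

// True when a moov box exists and appears after the mdat box.
function hasTrailingMoov(buffer) {
  const types = listTopLevelBoxes(buffer).map((box) => box.type)
  const moov = types.indexOf('moov')
  const mdat = types.indexOf('mdat')
  return moov !== -1 && mdat !== -1 && moov > mdat
}
```

The sketch expects the complete file in the buffer; given only the first part of a large file, it cannot see a moov box that sits after the audio data.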

lukeocodes commented 1 year ago

> It's a stream, as requested by the SDK. It would be up to the SDK to call await, right?

Yeah, we just need a Readable. I needed to ask because I've never used the gcloud Node SDK.

If you want to share a link to the audio here, you can. If you'd rather keep it private, you can share it with me on our Discord: https://dpgr.am/discord

lox commented 1 year ago

Yeah, apologies, it's a very reasonable question! There must be something about the gcloud implementation that is causing the issue. I thought I'd raise it in case there is something in how the stream is handled on the Deepgram side.

lukeocodes commented 1 year ago

> Yeah, apologies, it's a very reasonable question! There must be something about the gcloud implementation that is causing the issue. I thought I'd raise it in case there is something in how the stream is handled on the Deepgram side.

Yeah, indeed, and happy to try and help. If you could share the file, I'll see what I can recommend. Metadata at the end of the file seems to have been a known issue before. I've asked for a general way to normalize this in the meantime; I'll let you know what I hear.
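The thread does not say which normalization Deepgram recommends. One common approach, shown here purely as an assumption, is to remux the file with ffmpeg's faststart option, which moves the moov box to the start of the file without re-encoding the audio. The sketch assumes ffmpeg is installed and on the PATH; the function names and file paths are hypothetical.

```javascript
// Build ffmpeg arguments that copy the streams unchanged and relocate the
// moov box to the front of the output file (-movflags +faststart).
// '-y' overwrites the output file if it already exists.
function faststartArgs(inputPath, outputPath) {
  return ['-y', '-i', inputPath, '-c', 'copy', '-movflags', '+faststart', outputPath]
}

// Run ffmpeg (assumed to be on PATH) and resolve with the output path on success.
async function remuxWithFaststart(inputPath, outputPath) {
  // Dynamic import keeps the snippet usable from CommonJS and ES modules alike.
  const { spawn } = await import('node:child_process')
  return new Promise((resolve, reject) => {
    const child = spawn('ffmpeg', faststartArgs(inputPath, outputPath), { stdio: 'ignore' })
    child.on('error', reject) // e.g. ffmpeg is not installed
    child.on('close', (code) =>
      code === 0 ? resolve(outputPath) : reject(new Error(`ffmpeg exited with code ${code}`))
    )
  })
}

// Assumed usage: await remuxWithFaststart('input.mp4', 'normalized.mp4')
```

The output has to be a regular file rather than a pipe, because ffmpeg rewrites the file in a second pass to move the index.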

lukeocodes commented 1 year ago

Closing this issue as it is likely an issue with our API and not the SDK.

We're interested in helping resolve it, so if you're still working on this, please email the file to devrel@deepgram.com