aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

Transcribe error "Your stream is too big" while using RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz #6338

Closed davibaldin closed 1 month ago

davibaldin commented 1 month ago

Describe the bug

Hi team,

The code below works fine for the OGG file format. However, when processing a WAV file, I get "Your stream is too big", even with highWaterMark lowered to 1024 or below (4, 8, 16, ...).

async transcribeFile(filename: string, language: LanguageCode, sampleRate: number, encoding: MediaEncoding): Promise<string> {

      try {

        const audioSource = createReadStream(filename);
        const audioPayloadStream = new PassThrough({ highWaterMark: 1024 });
        audioSource.pipe(audioPayloadStream);

        const audioStream = async function* () {
          for await (const payloadChunk of audioPayloadStream) {
            yield { AudioEvent: { AudioChunk: payloadChunk } };
          }
        };

        const command = new StartStreamTranscriptionCommand({
            LanguageCode: language,
            MediaEncoding: encoding,
            MediaSampleRateHertz: sampleRate,
            AudioStream: audioStream(),
        });

        const response = await this.createTranscribeClient().send(command);

        let transcript = "";

        for await (const event of response?.TranscriptResultStream!) {
          if (event.TranscriptEvent) {
            const message = event.TranscriptEvent;
            const results = event?.TranscriptEvent?.Transcript?.Results;
            results!.map((result) => {
              (result.Alternatives || []).map((alternative) => {
                transcript = alternative.Items!.map((item) => item.Content).join(" ");
              });
            });
          }
        }

        return transcript.trim();

      }catch(e:any) {
        throw e;
      }
    }
sample3.wav:     RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
sample.ogg: Ogg data, Opus audio, version 0.1, mono, 16000 Hz (Input Sample Rate)

Exception is:

BadRequestException: Your stream is too big. Reduce the frame size and try your request again.
    at de_BadRequestExceptionRes (/Users/davi/Development/Workspaces/anext/llm/twilio-service/node_modules/@aws-sdk/client-transcribe-streaming/dist-cjs/index.js:715:21)
    at de_BadRequestException_event (/Users/davi/Development/Workspaces/anext/llm/twilio-service/node_modules/@aws-sdk/client-transcribe-streaming/dist-cjs/index.js:928:10)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /Users/davi/Development/Workspaces/anext/llm/twilio-service/node_modules/@aws-sdk/client-transcribe-streaming/dist-cjs/index.js:894:30
    at async Object.deserializer (/Users/davi/Development/Workspaces/anext/llm/twilio-service/node_modules/@smithy/eventstream-serde-universal/dist-cjs/index.js:127:37)
    at async _SmithyMessageDecoderStream.asyncIterator (/Users/davi/Development/Workspaces/anext/llm/twilio-service/node_modules/@smithy/eventstream-codec/dist-cjs/index.js:430:28)
    at async Transcribe.transcribeFile (/Users/davi/Development/Workspaces/anext/llm/twilio-service/src/transcribe.ts:64:26)
    at async Transcribe.transcribe (/Users/davi/Development/Workspaces/anext/llm/twilio-service/src/transcribe.ts:111:25)
    at async /Users/davi/Development/Workspaces/anext/llm/twilio-service/src/main.ts:42:14 {
  '$fault': 'client',
  '$metadata': {
    httpStatusCode: undefined,
    requestId: undefined,
    extendedRequestId: undefined,
    cfId: undefined
  }
}

SDK version number

"@aws-sdk/client-transcribe-streaming": "^3.620.0"

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v20.16.0

Reproduction Steps

Provided sample code.

Observed Behavior

BadRequestException: Your stream is too big. Reduce the frame size and try your request again.

Expected Behavior

Streaming transcription works for the WAV file, as it does for the OGG file.

zshzbh commented 1 month ago

Hey @davibaldin ,

Thanks for contacting us and sorry for this experience.

I can reproduce this error. My reproduction steps:

  1. Create a Node.js project.
  2. Install the dependency @aws-sdk/client-transcribe-streaming.
  3. Write the following code into index.ts:

    import { TranscribeStreamingClient, StartStreamTranscriptionCommand, LanguageCode, MediaEncoding } from "@aws-sdk/client-transcribe-streaming";
    import { PassThrough } from "stream";
    import { createReadStream } from "fs";

    const transcribeFile = async (filename: string, language: LanguageCode, sampleRate: number, encoding: MediaEncoding) => {
      try {
        const client = new TranscribeStreamingClient({ region: "us-west-2" });
        const audioSource = createReadStream(filename);
        const audioPayloadStream = new PassThrough({ highWaterMark: 1 * 1024 });
        audioSource.pipe(audioPayloadStream);

        const audioStream = async function* () {
          for await (const payloadChunk of audioPayloadStream) {
            yield { AudioEvent: { AudioChunk: payloadChunk } };
          }
        };

        const command = new StartStreamTranscriptionCommand({
          LanguageCode: language,
          MediaEncoding: encoding,
          MediaSampleRateHertz: sampleRate,
          AudioStream: audioStream(),
        });

        const response = await client.send(command);
        console.log("result of client command: ", response.TranscriptResultStream);
        let transcript;

        for await (const event of response?.TranscriptResultStream!) {
          if (event.TranscriptEvent) {
            const message = event.TranscriptEvent;
            const results = event?.TranscriptEvent?.Transcript?.Results;
            results!.map((result) => {
              (result.Alternatives || []).map((alternative) => {
                transcript = alternative.Items!.map((item) => item.Content).join(" ");
              });
            });
          }
        }

        return transcript;
      } catch (e: any) {
        throw e;
      }
    };

    transcribeFile("./taunt.wav", LanguageCode.EN_US, 8001, MediaEncoding.PCM).then((result) => {
      console.log("result of transcribe file: ", result);
    });


The taunt.wav file is 3 s long. 

  4. Run the file with `ts-node index.ts`. I got the following error message:

result of client command:  SmithyMessageDecoderStream {
  options: {
    messageStream: MessageDecoderStream { options: [Object] },
    deserializer: [AsyncFunction (anonymous)]
  }
}
/Users/zshzbh/Desktop/newProj/6338/node_modules/@aws-sdk/client-transcribe-streaming/dist-cjs/index.js:715
  const exception = new BadRequestException({
                    ^
BadRequestException: Your stream is too big. Reduce the frame size and try your request again.
    at de_BadRequestExceptionRes (/Users/zshzbh/Desktop/newProj/6338/node_modules/@aws-sdk/client-transcribe-streaming/dist-cjs/index.js:715:21)
    at de_BadRequestException_event (/Users/zshzbh/Desktop/newProj/6338/node_modules/@aws-sdk/client-transcribe-streaming/dist-cjs/index.js:928:10)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /Users/zshzbh/Desktop/newProj/6338/node_modules/@aws-sdk/client-transcribe-streaming/dist-cjs/index.js:894:30
    at async Object.deserializer (/Users/zshzbh/Desktop/newProj/6338/node_modules/@smithy/eventstream-serde-universal/dist-cjs/index.js:127:37)
    at async _SmithyMessageDecoderStream.asyncIterator (/Users/zshzbh/Desktop/newProj/6338/node_modules/@smithy/eventstream-codec/dist-cjs/index.js:430:28)
    at async transcribeFile (/Users/zshzbh/Desktop/newProj/6338/index.ts:33:21) {
  '$fault': 'client',
  '$metadata': {
    httpStatusCode: undefined,
    requestId: undefined,
    extendedRequestId: undefined,
    cfId: undefined
  }
}


I also tried smaller `highWaterMark` values, but the error persists.
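A likely reason the `highWaterMark` setting makes no difference: on a `PassThrough`, `highWaterMark` only controls how much data is buffered before backpressure is applied. It does not split chunks that are written to it, and `fs.createReadStream` reads up to 64 KiB per chunk by default. The sketch below illustrates this with an in-memory buffer instead of a file; `chunkSizesThrough` is a hypothetical helper name, not something from the SDK or from the code above.

```typescript
import { PassThrough } from "stream";

// Hypothetical helper: writes one buffer into a PassThrough created with the
// given highWaterMark and reports the sizes of the chunks that come out.
async function chunkSizesThrough(
  highWaterMark: number,
  input: Buffer
): Promise<number[]> {
  const passThrough = new PassThrough({ highWaterMark });
  passThrough.end(input); // write the whole buffer, then close the stream

  const sizes: number[] = [];
  for await (const chunk of passThrough) {
    sizes.push((chunk as Buffer).length);
  }
  return sizes;
}

// A 64 KiB input stands in for one default-sized read from createReadStream.
chunkSizesThrough(1024, Buffer.alloc(64 * 1024)).then((sizes) => {
  console.log("chunk sizes seen by the consumer:", sizes);
});
```

If the sizes printed are far above 1024, the audio events built from those chunks are equally large, whatever `highWaterMark` is set to.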

I reported this issue to service team and I will let you know if we have any updates! 

Thanks!
Maggie

zshzbh commented 1 month ago

Hey @davibaldin ,

I got updates from the service team. It seems that your chunk size in the audioStream is too big.

      const audioStream = async function* () {
          for await (const payloadChunk of audioPayloadStream) {
            yield { AudioEvent: { AudioChunk: payloadChunk } };
          }
        };

Please refer to the best practices guide and optimize the chunk size.
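As a rough illustration of what "chunk size" means for uncompressed PCM, the sketch below converts between duration and byte count. It assumes 16-bit (2-byte) mono samples, as in the WAV file from this issue; the function names `pcmChunkBytes` and `pcmDurationSeconds` are made up for this example.

```typescript
// Bytes needed to hold `durationMs` of uncompressed PCM audio.
// Rounds down to a whole number of samples so a chunk never splits a sample.
function pcmChunkBytes(
  sampleRateHz: number,
  durationMs: number,
  bytesPerSample = 2, // assumption: 16-bit samples
  channels = 1 // assumption: mono
): number {
  const samples = Math.floor((sampleRateHz * durationMs) / 1000);
  return samples * bytesPerSample * channels;
}

// Seconds of audio contained in `byteLength` bytes of PCM data.
function pcmDurationSeconds(
  byteLength: number,
  sampleRateHz: number,
  bytesPerSample = 2,
  channels = 1
): number {
  return byteLength / (sampleRateHz * bytesPerSample * channels);
}

console.log(pcmChunkBytes(8000, 100)); // 1600 bytes for 100 ms at 8000 Hz
console.log(pcmDurationSeconds(64 * 1024, 8000)); // 4.096 s in one 64 KiB chunk
```

Under these assumptions, a single 64 KiB read from the file carries about four seconds of audio, roughly 40 times the 100 ms target.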

To address this, here is a working JavaScript snippet you can adapt to your use case. Its audioStream function splits the audio into 100 ms chunks (1,600 bytes for 16-bit mono PCM at 8000 Hz), in line with the recommended best practices, and paces the stream to roughly real time.

import {
  TranscribeStreamingClient,
  StartStreamTranscriptionCommand,
  LanguageCode,
  MediaEncoding,
} from "@aws-sdk/client-transcribe-streaming";
import { PassThrough } from "stream";
import { createReadStream } from "fs";

const client = new TranscribeStreamingClient({ region: "us-west-2" });
const audioSource = createReadStream("./audio.wav");
const audioPayloadStream = new PassThrough({ highWaterMark: 1 * 1024 });
audioSource.pipe(audioPayloadStream);

const sampleRate = 8000; // sample rate of the audio stream, in Hz
const bytesPerSample = 2; // 16-bit PCM, mono
// 100 ms of audio per chunk, see https://docs.aws.amazon.com/transcribe/latest/dg/streaming.html#best-practices
const chunkSize = (bytesPerSample * sampleRate * 100) / 1000; // 1600 bytes
const audioStream = async function* () {
  for await (const payloadChunk of audioPayloadStream) {
    let total_bytes_sent = 0;
    // Split the payload into pieces of at most chunkSize bytes.
    // A payload that is already small enough yields a single piece.
    for (let i = 0; i < payloadChunk.length; i += chunkSize) {
      const chunk = payloadChunk.subarray(i, i + chunkSize);
      total_bytes_sent += chunk.byteLength;
      yield { AudioEvent: { AudioChunk: chunk } };
    }
    // console.log("total_bytes_sent: ", total_bytes_sent);

    // Pace the stream: wait for roughly the playback time (in ms) of the audio just sent.
    await new Promise((r) =>
      setTimeout(r, total_bytes_sent / (bytesPerSample * (sampleRate / 1000)))
    );
  }
};

const command = new StartStreamTranscriptionCommand({
  LanguageCode: LanguageCode.EN_US,
  MediaEncoding: MediaEncoding.PCM,
  MediaSampleRateHertz: sampleRate,
  AudioStream: audioStream(),
});

const response = await client.send(command);
console.log("result of client command: ", response.TranscriptResultStream);

let transcript;

for await (const event of response.TranscriptResultStream) {
  if (event.TranscriptEvent) {
    // Results, Alternatives and Items may be missing, so default to empty arrays.
    const results = event.TranscriptEvent.Transcript?.Results ?? [];
    //console.log("results", results);
    for (const result of results) {
      for (const alternative of result.Alternatives ?? []) {
        const items = alternative.Items ?? [];
        items.forEach((item) => console.log("item content", item.Content));
        // Each event overwrites the transcript with the latest alternative.
        transcript = items.map((item) => item.Content).join(" ");
      }
    }
  }
}
console.log("transcript", transcript);
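A possible variation on the chunking step, offered as a sketch and not as the service team's recommendation: a generator that re-slices any sequence of byte chunks into frames of a fixed size and carries leftover bytes over to the next chunk, so that frame size no longer depends on how the source stream happens to buffer. `toFrames` is a hypothetical helper, not part of the AWS SDK, and it leaves out the real-time pacing shown above.

```typescript
// Hypothetical helper: re-slices byte chunks into frames of exactly
// `frameBytes` bytes, except for a possibly shorter final frame.
async function* toFrames(
  source: AsyncIterable<Uint8Array>,
  frameBytes: number
): AsyncGenerator<Uint8Array> {
  if (!Number.isInteger(frameBytes) || frameBytes <= 0) {
    throw new RangeError("frameBytes must be a positive integer");
  }
  let pending: Uint8Array = new Uint8Array(0); // leftover from the previous chunk

  for await (const chunk of source) {
    // Prepend the leftover so frames stay full across chunk boundaries.
    const data = new Uint8Array(pending.length + chunk.length);
    data.set(pending, 0);
    data.set(chunk, pending.length);

    let offset = 0;
    while (data.length - offset >= frameBytes) {
      yield data.subarray(offset, offset + frameBytes);
      offset += frameBytes;
    }
    pending = data.subarray(offset);
  }

  if (pending.length > 0) {
    yield pending; // final short frame
  }
}

// Usage sketch with in-memory data: 10 bytes arriving as 5 + 2 + 3,
// re-sliced into frames of 4 bytes.
async function* demoSource(): AsyncGenerator<Uint8Array> {
  yield new Uint8Array([1, 2, 3, 4, 5]);
  yield new Uint8Array([6, 7]);
  yield new Uint8Array([8, 9, 10]);
}

(async () => {
  for await (const frame of toFrames(demoSource(), 4)) {
    console.log(Array.from(frame));
  }
})();
```

In the snippet above, each frame would be wrapped as `{ AudioEvent: { AudioChunk: frame } }` before being yielded to `AudioStream`, with `frameBytes` set to the 100 ms chunk size.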

Please let me know if you have any questions!

Thanks! Maggie

github-actions[bot] commented 3 weeks ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.