Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Incomplete audio (.mp3) output file with speakTextAsync #2609

Open thphuccoder opened 4 weeks ago

thphuccoder commented 4 weeks ago

Hello,

I'm using the JavaScript SDK to convert text to speech (.mp3 files). The text consists of multiple short sentences, and I make one API call per sentence. Here is the code:

import sdk, { SpeechSynthesisOutputFormat } from 'microsoft-cognitiveservices-speech-sdk';
import 'dotenv/config';
import path from 'path';
import { tmpdir } from 'node:os';

const subscriptionKey = process.env.AZURE_SUBSCRIPTION_KEY;
const serviceRegion = process.env.AZURE_REGION;

export async function convertTextToSpeech(text, outputFileName, voice) {
  const speechConfig = sdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
  speechConfig.speechSynthesisOutputFormat = SpeechSynthesisOutputFormat.Audio24Khz160KBitRateMonoMp3;
  let voiceCode = voice?.get('code');
  if (!voiceCode) {
    voiceCode = 'en-US-AvaMultilingualNeural';
  }
  speechConfig.speechSynthesisVoiceName = voiceCode;
  const filePath = path.join(tmpdir(), outputFileName);
  const audioConfig = sdk.AudioConfig.fromAudioFileOutput(filePath);

  let synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

  return new Promise((resolve, reject) => {
    synthesizer.speakTextAsync(
      text,
      result => {
        if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
          console.log(`Synthesis succeeded for sentence: ${text}`);
          resolve(filePath);
        } else {
          console.error(`Synthesis failed: ${result.errorDetails}`);
          reject(new Error(result.errorDetails));
        }
        synthesizer.close();
        synthesizer = null;
      },
      err => {
        console.error(`Error: ${err}`);
        synthesizer.close();
        synthesizer = null;
        reject(err);
      },
    );
  });
}

Then I call it as follows:

let filePath;
try {
  // Each lesson has a default voice (defaultVoice under Lesson table).
  // Generate the audio using this default voice.
  const lesson = sentence.get('lesson');
  const voice = await getVoiceByLesson(lesson);

  filePath = await convertTextToSpeech(content, fileName, voice);
  // Code to upload the file to my server. I don't think it is related to the bug
  // because the local files (before uploading) are cut off themselves
  ...
  console.log('Audio file successfully saved to the server.');
} catch (e) {
  console.error('An error occurred:', e);
  throw e; // Re-throw the error after logging
} finally {
  // Ensure the temporary file is deleted
  if (filePath) {
    try {
      await fs.unlink(filePath);
      console.log(`Temporary file ${filePath} deleted.`);
    } catch (unlinkErr) {
      console.error(`Failed to delete temporary file: ${unlinkErr}`);
    }
  }
}

Note:

Expected behavior

The output files (.mp3 files) should not be cut off.
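
One possible way to check whether the truncation comes from the SDK's file output path is to skip AudioConfig.fromAudioFileOutput and write the returned bytes to disk directly. The sketch below is not a confirmed fix from this thread. It assumes the synthesizer was created with a null audio config (new sdk.SpeechSynthesizer(speechConfig, null)) and that result.audioData holds the complete audio as an ArrayBuffer. The helper name synthesizeToFile is hypothetical.

```javascript
// Hypothetical helper: resolves only after the audio bytes are fully written.
// `synthesizer` is assumed to be an sdk.SpeechSynthesizer built with a null
// AudioConfig, so the SDK itself does not write any file.
async function synthesizeToFile(synthesizer, text, filePath) {
  // Dynamic import keeps this snippet usable from both ESM and CommonJS files.
  const { writeFile } = await import('node:fs/promises');

  // Wrap the callback-style speakTextAsync in a promise and close the
  // synthesizer on both the success and the error path.
  const result = await new Promise((resolve, reject) => {
    synthesizer.speakTextAsync(
      text,
      res => {
        synthesizer.close();
        resolve(res);
      },
      err => {
        synthesizer.close();
        reject(err instanceof Error ? err : new Error(String(err)));
      },
    );
  });

  // Treat a missing or empty buffer as a failed synthesis.
  const audio = result.audioData;
  if (!audio || audio.byteLength === 0) {
    throw new Error(result.errorDetails || 'Synthesis returned no audio data');
  }

  // The promise returned by writeFile settles once the write has finished,
  // so the caller can safely upload or delete the file afterwards.
  await writeFile(filePath, Buffer.from(audio));
  return filePath;
}
```

If files written this way are complete while the fromAudioFileOutput files are not, that would suggest the promise in convertTextToSpeech resolves before the SDK has finished writing the file.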

Version of the Cognitive Services Speech SDK

"microsoft-cognitiveservices-speech-sdk": "^1.40.0"

Platform, Operating System, and Programming Language

github-actions[bot] commented 1 week ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.