Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Saving audio to file #355

Closed jeffhee closed 5 years ago

jeffhee commented 5 years ago

Hello, I've wrapped the C# SDK text-to-speech sample code in a Web API proxy method. My goal is to convert text to an audio file and either save the AudioData to a file or return the AudioData directly.

In the code below, it seems like I should be able to return result.AudioData with a corresponding MIME type and file name. However, the combinations I've tried produce unreadable files. I've spent the afternoon researching and trying different combinations without success.

Can you suggest some best practices or point me to an example?

Thank you!


```csharp
var config = SpeechConfig.FromSubscription("XXXXXXXXXXXXXX", "westus2");
config.OutputFormat = OutputFormat.Simple;

// Creates a speech synthesizer using the default speaker as audio output.
using (var synthesizer = new SpeechSynthesizer(config))
{
    var text = textToSynthesize;
    using (var result = await synthesizer.SpeakTextAsync(text))
    {
        if (result.Reason == ResultReason.SynthesizingAudioCompleted)
        {
            Console.WriteLine($"Speech synthesized to speaker for text [{text}]");

            // These are the MIME type / file name combinations I'm unsure about.
            return File(result.AudioData, "application/octet-stream", "audio.");
        }
        else if (result.Reason == ResultReason.Canceled)
        {
            var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
            Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

            if (cancellation.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                Console.WriteLine($"CANCELED: Did you update the subscription info?");
            }
        }
    }
}
```
rhurey commented 5 years ago

How are you trying to use the file?

The audio returned is a raw stream; to write it to a file (a WAV file, for example) you'll need to write the correct WAV header before the raw data.
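
For reference, a minimal sketch of prepending such a header, assuming the raw output is the default 16 kHz, 16-bit, mono PCM (adjust the parameters if you request a different format):

```csharp
using System.IO;
using System.Text;

// Prepends a standard 44-byte RIFF/WAV header to raw PCM bytes.
// Assumes 16 kHz, 16-bit, mono PCM unless other values are passed in.
static byte[] AddWavHeader(byte[] pcm, int sampleRate = 16000, short bitsPerSample = 16, short channels = 1)
{
    short blockAlign = (short)(channels * bitsPerSample / 8);
    int byteRate = sampleRate * blockAlign;

    using var ms = new MemoryStream();
    using var writer = new BinaryWriter(ms);
    writer.Write(Encoding.ASCII.GetBytes("RIFF"));
    writer.Write(36 + pcm.Length);          // RIFF chunk size
    writer.Write(Encoding.ASCII.GetBytes("WAVE"));
    writer.Write(Encoding.ASCII.GetBytes("fmt "));
    writer.Write(16);                       // fmt chunk size for PCM
    writer.Write((short)1);                 // audio format 1 = PCM
    writer.Write(channels);
    writer.Write(sampleRate);
    writer.Write(byteRate);
    writer.Write(blockAlign);
    writer.Write(bitsPerSample);
    writer.Write(Encoding.ASCII.GetBytes("data"));
    writer.Write(pcm.Length);               // data chunk size
    writer.Write(pcm);
    writer.Flush();
    return ms.ToArray();
}
```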

You could also use the AudioDataStream.SaveToWaveFileAsync method to handle this for you.

See SynthesisToAudioDataStreamAsync in https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_synthesis_samples.cs
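
Something along these lines (assuming the synthesizer is created with a null AudioConfig so the audio stays in memory rather than playing on the speaker; the output file name is just illustrative):

```csharp
using (var result = await synthesizer.SpeakTextAsync(text))
using (var audioDataStream = AudioDataStream.FromResult(result))
{
    // Writes a complete WAV file, header included.
    await audioDataStream.SaveToWaveFileAsync("outputaudio.wav");
}
```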

jeffhee commented 5 years ago

@rhurey, thanks for the link!

Ideally, I'd like my endpoint to act as a proxy: a React app posts text, the proxy web service submits it to the Speech API through the SDK and returns the audio data, and the React app plays the sound.

EDIT: What is the correct wave header to use before writing the raw data?

yulin-li commented 5 years ago

@jeffhee

Thanks for pointing out this issue. In SDK version 1.6 and earlier there was a bug where the audio data in the synthesis result had no header; this was fixed in version 1.7. You can update your SDK version and try again.

I have two tips, 1) you should use config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3) to set output format. The config.OutputFormat is for recoginition result format. 2) If you just want to synthesize a audio to file, maybe AudioDataStream.SaveToWaveFileAsync is a better choice. See sample here