How long can audio be and Acronyms utterance

MicrosoftDocs / azure-docs

How long can audio be and Acronyms utterance #28642

Closed wilsonchua20 closed 5 years ago

wilsonchua20 commented 5 years ago

Should I use Custom Speech with the REST API? When I speak acronyms, the converted text tends to spell them out (e.g. BDO -> be dio). Also, in which recognition mode (interactive, conversation) can an audio file be longer than 15 seconds?

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: 11673bc9-9062-278d-c5b8-84b05f2b81be
Version Independent ID: 29347a63-de63-4ff0-5a62-149ab31d1b7e
Content: Speech-to-text API reference (REST) - Speech Services - Azure Cognitive Services
Content Source: articles/cognitive-services/Speech-Service/rest-speech-to-text.md
Service: cognitive-services
Sub-service: speech-service
GitHub Login: @erhopf
Microsoft Alias: erhopf

wilsonchua20 commented 5 years ago

Should I upload an acoustic data set for acronyms, or would a pronunciation data set suffice? Thank you.

RohitMungi-MSFT commented 5 years ago

@wilsonchua20 Thanks for the feedback. We are investigating the issue and will update you shortly.

wilsonchua20 commented 5 years ago

Hi, may I know why the status is "failed" when I upload an acoustic data set?

RohitMungi-MSFT commented 5 years ago

@wilsonchua20 If you are using the REST API, only the first 15 seconds of audio are recognized and returned as text. I tested this with an audio file that was longer than 1 minute, and it returned text for only the first 15 seconds. If you plan to use longer audio files, please use one of the SDKs.
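
To decide between REST and an SDK for a given file, one could check its duration up front. The following is a minimal sketch using only Python's standard `wave` module. The 15-second threshold is the figure reported in this thread, not a value read from the service, and the helper names (`wav_duration_seconds`, `exceeds_rest_limit`, `make_silent_wav`) are hypothetical and not part of any Azure SDK.

```python
import io
import wave

# Limit reported in this thread for REST recognition (an assumption, not queried from the service).
REST_LIMIT_SECONDS = 15.0


def wav_duration_seconds(source) -> float:
    """Return the duration in seconds of a WAV file given as a path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def exceeds_rest_limit(source, limit: float = REST_LIMIT_SECONDS) -> bool:
    """True if the WAV audio is longer than the limit, i.e. REST would likely truncate it."""
    return wav_duration_seconds(source) > limit


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory 16-bit mono WAV of silence, used here only for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for length in (5, 20):
        too_long = exceeds_rest_limit(make_silent_wav(length))
        print(f"{length}s clip exceeds REST limit: {too_long}")
```

A file flagged by `exceeds_rest_limit` would be a candidate for one of the SDKs rather than the REST endpoint.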

You can try the 'dictation' mode and enable custom pronunciation if your use case involves acronyms.
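
For an acronym such as BDO, a pronunciation entry pairs the display form with a spoken form. The sketch below assumes a tab-separated `display<TAB>spoken` layout with a lowercase, letter-by-letter spoken form; that layout is an assumption here and should be checked against the Custom Speech data guidelines for your locale. The function name `pronunciation_entry` is hypothetical.

```python
def pronunciation_entry(acronym: str) -> str:
    """Build one pronunciation line: display form, a tab, then the letters spelled out.

    Assumed format (verify against the Custom Speech guidelines): "BDO<TAB>b d o".
    """
    if not (acronym.isascii() and acronym.isalpha()):
        raise ValueError(f"expected a purely alphabetic ASCII acronym, got {acronym!r}")
    spoken = " ".join(acronym.lower())  # "BDO" -> "b d o"
    return f"{acronym}\t{spoken}"


if __name__ == "__main__":
    # Write one entry per line, as a plain-text data file would contain.
    for name in ("BDO", "IEEE"):
        print(pronunciation_entry(name))
```

Acronyms that are not read letter by letter would need a hand-written spoken form instead of this automatic spelling.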

Could you please elaborate on the error you see when importing the acoustic data? Could you also check whether the audio files and transcripts follow the required guidelines?

wilsonchua20 commented 5 years ago

Thank you for that. Should I also import an acoustic data set even if I have already imported the words as language data? (In the acoustic data set I just pronounce the words, since they are not all in English.)

RohitMungi-MSFT commented 5 years ago

For a language model, an acoustic data set is not required, but you need to add language data first.

RohitMungi-MSFT commented 5 years ago

@wilsonchua20 If you do not have any other questions, we will proceed to close this thread. If you have further questions on this matter, please tag @RohitMungi-MSFT in your reply. We will gladly reopen the issue and continue the discussion.