[Cris.AI]: Custom Speech to text training

chinmayembedded commented 5 years ago

I am trying to add custom audio files with transcription for custom speech to text training.

When I add an acoustic audio file with its transcription, I get the following error: _Acoustic data import failed: Process ended with exit code 255 for command D:\batch\tasks\shared\princeton\16.1.0\wav-validator\validate-wav-scp.exe D:\Users_azbatchtask2069\AppData\Local\Temp\tmpz2tb3ig\audio_files_to_validate.scp D:\Users_azbatchtask2069\AppData\Local\Temp\tmpz2tb3ig\wav_validatoroutput.scp._

Speech service pricing tier - F0 (free)
Audio duration - 4 minutes
Sample rate - 8000 Hz

I am also attaching the input training data provided for the acoustic model: https://drive.google.com/drive/folders/1TDK00P6e-TYtS3YrkBJ-vAopo8qmrGJP?usp=sharing

chlandsi commented 5 years ago

Hi @chinmayembedded, thanks for reporting the issue. Training a custom model is not supported under the free subscription. We are looking into whether there is any additional problem.

chinmayembedded commented 5 years ago

Hi @chlandsi, I can train a model with language data on the above configuration. If training a custom model is not supported under the free subscription, I would expect that not to work either.

mdoulaty commented 5 years ago

@chinmayembedded can you share the files differently? I tried accessing them using the link you shared, but I got a permission denied error.

chinmayembedded commented 5 years ago

@mdoulaty Can you check again? I made the link public.

chinmayembedded commented 5 years ago

I figured out two issues with the audio files:

Audio duration for training is limited to 60 seconds, and my file is 4 minutes long.
The audio file I added has a sample encoding of 8-bit unsigned PCM, whereas the acoustic model needs 16-bit PCM.

Please share your feedback.
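For anyone hitting the same encoding problem, here is a minimal sketch of converting an 8-bit unsigned PCM WAV file to 16-bit PCM using only the Python standard library. The function names `pcm8_to_pcm16` and `convert_wav_to_16bit` are made up for illustration, and this is not the tool the service uses. It does not shorten the audio, so files longer than the limit would still need to be split separately.

```python
import struct
import wave


def pcm8_to_pcm16(frames: bytes) -> bytes:
    """Convert unsigned 8-bit PCM samples to signed 16-bit little-endian PCM."""
    # 8-bit WAV samples are unsigned with silence at 128; 16-bit samples are
    # signed with silence at 0, so re-centre each sample and scale it by 256.
    samples = [(b - 128) << 8 for b in frames]
    return struct.pack("<%dh" % len(samples), *samples)


def convert_wav_to_16bit(src_path: str, dst_path: str) -> None:
    """Rewrite an 8-bit PCM WAV file as 16-bit PCM, keeping rate and channels."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 1:
            raise ValueError("expected 8-bit input, got %d-bit" % (8 * src.getsampwidth()))
        channels = src.getnchannels()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(2)  # 2 bytes per sample = 16-bit
        dst.setframerate(rate)
        dst.writeframes(pcm8_to_pcm16(frames))
```

Calling `convert_wav_to_16bit("in.wav", "out.wav")` (hypothetical file names) writes a new file and leaves the original untouched.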

mdoulaty commented 5 years ago

Yes, the sample encoding of the audio file was not correct: "8-bit Unsigned Integer PCM" is not supported. Please check the format specs here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-test-data#audio--human-labeled-transcript-data-for-testingtraining

As you mentioned, the duration exceeded the maximum duration as well.
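Both problems can be caught locally before uploading. The following is a rough sketch using Python's standard `wave` module. The two limits (16-bit samples, 60 seconds per training file) are taken from this thread and should be confirmed against the linked format specs. `check_wav` is a hypothetical helper, not part of any SDK.

```python
import wave

# Assumed limits, taken from this thread; confirm against the linked docs.
REQUIRED_SAMPLE_WIDTH_BYTES = 2    # 16-bit PCM
MAX_TRAINING_DURATION_SEC = 60.0   # per-file limit for training audio


def check_wav(path: str) -> list:
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()
        duration = wav.getnframes() / float(wav.getframerate())
    if width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append("sample width is %d-bit, expected 16-bit" % (8 * width))
    if duration > MAX_TRAINING_DURATION_SEC:
        problems.append(
            "duration is %.1f s, limit is %.0f s" % (duration, MAX_TRAINING_DURATION_SEC)
        )
    return problems
```

An empty list only means these two checks passed; it does not guarantee the import will succeed.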

chinmayembedded commented 5 years ago

Another query:

The transcription file I used is reported as ASCII-encoded.

As per the documentation, the transcription should be encoded as UTF-8 with a byte order mark (BOM).

Is it necessary to convert the transcription file to UTF-8 encoding?

The error is shown below:

_Number of success: 0
Number of failure: 0
Acoustic data import failed: Zero transcriptions could be parsed from the given input.
Error: invalid input line format: 64_chunk1.wav for mission number one is esophageal cancer, status post chemotherapy, x-ray therapy, and esophagectomy with pull through gastric anastomosis. Next is severe protein-calorie malnutrition, on G-tube and oropharyngeal dysphagia. Next is vocal cord paralysis. Next is pneumonia slash pneumonitis
Error: invalid input line format: 64_chunk4.wav in weeks. Next is due esophageal cancer in months to years. Next is consultants on the case are from Pulmonary Medicine. Next is The Physician from Cardiology. Next is
Error: invalid input line format: 72_chunk4.wav Assessment number one poor dentition with fractured tooth. Next number probable dental caries. Next number history of mechanical heart valve. Next number history of depression. Next number chronic pain. Plan we will start empiric antibiotics._
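On the encoding question: ASCII is a strict subset of UTF-8, so a pure-ASCII file is already valid UTF-8 and only lacks the BOM. If the BOM turns out to be required, a minimal sketch like the one below re-saves the file with one, using Python's built-in `utf-8-sig` codec. The helper names are hypothetical. Whether a missing BOM actually causes an import failure is not established in this thread.

```python
import codecs


def add_utf8_bom(src_path: str, dst_path: str) -> None:
    """Re-save a text file as UTF-8 with a leading byte order mark."""
    # "utf-8-sig" drops an existing BOM when reading and writes one when
    # writing, so running this twice does not produce a double BOM.
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(text)


def has_utf8_bom(path: str) -> bool:
    """Report whether the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

Tabs and line endings pass through unchanged, so this step does not affect the filename/transcript separator.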

mahilleb-msft commented 5 years ago

Can you double check that the offending lines contain exactly a single tab character following the filename?
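A quick way to check this without relying on an editor is a small script along these lines. It is only a sketch: `find_bad_lines` is a hypothetical helper, and it interprets the rule as "each non-blank line has exactly one tab, with non-empty text on both sides", which is an assumption about what the importer accepts.

```python
def find_bad_lines(path: str) -> list:
    """Return (line_number, line) pairs that do not contain exactly one tab."""
    bad = []
    # "utf-8-sig" reads plain ASCII/UTF-8 and also tolerates a leading BOM.
    with open(path, "r", encoding="utf-8-sig") as f:
        for number, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # assumption: blank lines are ignored
            parts = line.split("\t")
            # Expect exactly: <filename> TAB <transcript>
            if len(parts) != 2 or not parts[0].strip() or not parts[1].strip():
                bad.append((number, line))
    return bad
```

Lines where the editor replaced the tab with spaces show up in the result with their line number, which matches the "invalid input line format" entries in the error above.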

chinmayembedded commented 5 years ago

Apparently, that was an issue with my editor. I got it resolved with gedit.

xtianus79 commented 4 years ago

Is there a good editor with a spell checker to do this? I went through the same issue. @mabasile-MSFT, thanks for the help.

Azure-Samples / cognitive-services-speech-sdk

[Cris.AI]: Custom Speech to text training #405