IBM / Train-Custom-Speech-Model

Create a custom Watson Speech to Text model using specialized domain data
https://developer.ibm.com/patterns/customize-and-continuously-train-your-own-watson-speech-service/
Apache License 2.0

Call recognize via WebSocket #55

Closed yhwang closed 5 years ago

yhwang commented 5 years ago

After the client posts the audio file to the server, the server calls the recognize API via WebSocket, receives interim results, and sends them back to the client via WebSocket.

Signed-off-by: Yihong Wang yh.wang@ibm.com
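The relay described above can be sketched as follows. This is a minimal sketch of how the server might translate Watson Speech to Text WebSocket messages into payloads for the browser client; it assumes the standard shape of Watson's result JSON (`results`, `alternatives`, `final`, `result_index`) and is not the PR's actual code.

```javascript
// Convert a raw Watson STT WebSocket message into the payload the
// server relays to the browser client. Watson sends JSON like:
//   { "results": [ { "alternatives": [ { "transcript": "..." } ],
//                    "final": false } ], "result_index": 0 }
// State messages such as { "state": "listening" } carry no transcript.
function toClientMessage(rawMessage) {
  const msg = JSON.parse(rawMessage);
  if (msg.error) {
    // Forward recognition errors so the client can surface them.
    return { type: 'error', error: msg.error };
  }
  if (!Array.isArray(msg.results) || msg.results.length === 0) {
    return null; // nothing to relay for state-only messages
  }
  const result = msg.results[0];
  return {
    type: result.final ? 'final' : 'interim',
    transcript: result.alternatives[0].transcript,
    index: msg.result_index || 0,
  };
}
```

Interim results (`final: false`) let the client update the transcript live while the audio is still being processed; final results replace the interim text for that result index.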

yhwang commented 5 years ago

Added audio playback in the second push. It's handy when transcribing a long audio file.

yhwang commented 5 years ago

@pvaneck Actually, @tedhtchang thought that we could provide a pause/resume button somewhere so users can correct part of the transcript right after hearing a specific sentence. Once they've modified the transcript, they click the resume button to play and transcribe the rest of the audio. On top of that, maybe a finish button to stop the audio playback and render all remaining transcripts.

But maybe it's too much for a code pattern.
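The pause/resume/finish flow sketched above can be modeled as a small state machine; the class name, state names, and the `onChange` callback below are hypothetical illustrations, not the PR's actual implementation.

```javascript
// Minimal sketch of the playback control flow: while 'paused' the
// user edits the transcript heard so far; 'finished' stops playback
// and renders everything remaining at once.
class PlaybackController {
  constructor(onChange) {
    this.state = 'playing'; // 'playing' | 'paused' | 'finished'
    this.onChange = onChange || (() => {});
  }
  pause() {
    // Only a playing session can be paused for editing.
    if (this.state === 'playing') this._set('paused');
  }
  resume() {
    // Resume playback and transcription of the rest of the audio.
    if (this.state === 'paused') this._set('playing');
  }
  finish() {
    // Terminal state: further pause/resume calls are ignored.
    if (this.state !== 'finished') this._set('finished');
  }
  _set(next) {
    this.state = next;
    this.onChange(next);
  }
}
```

Keeping the transitions explicit like this makes it easy to disable the edit box except in the `paused` state.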

yhwang commented 5 years ago

Implemented the pause/resume/stop functions in the second force push. Users can now pause the audio and edit the transcripts.

rhagarty commented 5 years ago

@yhwang - I tested it and I really like the new interface. Hearing the transcription is a great feature. BUT, I didn't see the fully transcribed text. I tried 249.wav, which is a good 5-minute recording, and I only got the following text: "XXX preoperative diagnoses correction chief complaint is abdominal pain history present illness patient is a very pleasant 14 year old patient who developed abdominal pain yesterday during the middle of the day."

Do I need any additional setup steps, other than npm install?

yhwang commented 5 years ago

@rhagarty I just fixed the linter errors and started to trace the issue you mentioned. You just need to run npm install and npm run dev. Let me try 249.wav and get back to you later.

yhwang commented 5 years ago

@rhagarty Thanks for helping with the debugging. There are 2 issues:

For the second part: even if I run the transcription while the language model is training, I still get results. So I can only add code to handle the error path, but I can't verify it (yet).
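One way to handle that error path is to check the custom model's status before building the recognize request. This is a hedged sketch: the status values (`pending`, `ready`, `training`, `available`, `upgrading`, `failed`) follow the Watson STT customization API, but the helper name and parameter shape below are hypothetical, not the PR's actual code.

```javascript
// Build recognize parameters for a custom language model, taking the
// error path when the model is not in the 'available' state (e.g.
// still 'training'), instead of silently returning partial results.
function recognizeParams(base, modelStatus, customizationId) {
  if (modelStatus !== 'available') {
    // Error path: surface a clear message the server can relay
    // to the client over the WebSocket.
    throw new Error(`custom model not ready (status: ${modelStatus})`);
  }
  // Merge the customization id into the base recognize options.
  return Object.assign({}, base, { customization_id: customizationId });
}
```

The status itself would come from the customization API (GET /v1/customizations/{customization_id}) before opening the recognize WebSocket.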

rhagarty commented 5 years ago

@yhwang - just to confirm, my custom acoustic model training finally completed and I was able to see transcription text when using the model. I can add this potential problem scenario to the troubleshooting section of the readme.

yhwang commented 5 years ago

@rhagarty @pvaneck new findings!

I think this PR is in good shape now!

rhagarty commented 5 years ago

@yhwang - I verified this works. Thanks for working through this issue.