Open dylman123 opened 4 years ago
https://cloud.google.com/speech-to-text/docs/best-practices?hl=en_US
Google Speech to Text API seems very capable: https://www.youtube.com/watch?v=jOYzvq5dBrQ
May be possible to train your own STT model with IBM Watson... interesting: https://medium.com/ibm-watson/watson-speech-to-text-how-to-train-your-own-speech-dragon-part-1-data-collection-and-fdd8cea4f4b8
Maybe longer clips give more accurate results?
Rev API is apparently better than Google at Speech to Text?
Rev API also returns timestamp data :)
13 April 2020
My use case requires reasonable diarization accuracy. I am currently comparing the Google Cloud Speech API with the Rev AI API.
In my sample audio clip, there are 2 speakers (1 male and 1 female). However, the output from Rev AI detected only 1 speaker.
My audio file only has a single channel.
I have made sure to pass in the option `skip_diarization = false`, which is the default value anyway. Referring to schema: https://www.rev.ai/docs#operation/SubmitTranscriptionJob
Is this expected performance? Or am I doing something wrong?
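For reference, a minimal sketch of how such a job submission might be built, with `skip_diarization` set explicitly. The endpoint and the `media_url` / `skip_diarization` field names are assumed from the v1 schema linked above and should be checked against it; `build_job_request`, the example URL, and the token are placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed v1 job-submission endpoint; verify against the linked schema.
REV_JOBS_URL = "https://api.rev.ai/speechtotext/v1/jobs"


def build_job_request(media_url, access_token, skip_diarization=False):
    """Build (but do not send) a job-submission request (hypothetical helper)."""
    payload = {
        "media_url": media_url,                # assumed field name
        "skip_diarization": skip_diarization,  # False keeps diarization on
    }
    return urllib.request.Request(
        REV_JOBS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_job_request("https://example.com/sample.mp3", "YOUR_ACCESS_TOKEN")
print(json.loads(request.data))
```

Passing the request to `urllib.request.urlopen(request)` would submit it; printing the body first makes it easy to confirm the option is really being sent as `false`.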
17 April 2020
After some investigation by our engineering team, we found that diarization failed on this file because of the short length and fast speaker switches. This is something we are actively trying to improve. Do you have more files like this? If so, would you be able to share them with us?
Note: the file in question has a duration of 43 seconds.
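To check other clips for the same problem, something like the following could count detected speakers and measure how long each speaker turn lasts. This is only a sketch: it assumes the transcript JSON has a top-level `monologues` list whose entries carry a `speaker` id and `elements` with `ts` / `end_ts` timestamps, and the `sample` data is made up for illustration.

```python
def summarize_speakers(transcript):
    """Return {speaker_id: number_of_turns} for a transcript dict."""
    counts = {}
    for monologue in transcript.get("monologues", []):
        speaker = monologue.get("speaker")
        counts[speaker] = counts.get(speaker, 0) + 1
    return counts


def turn_durations(transcript):
    """Return a list of (speaker_id, seconds) for each turn, in order."""
    durations = []
    for monologue in transcript.get("monologues", []):
        # Punctuation elements carry no timestamps, so skip them.
        timed = [e for e in monologue.get("elements", [])
                 if "ts" in e and "end_ts" in e]
        if timed:
            length = timed[-1]["end_ts"] - timed[0]["ts"]
            durations.append((monologue.get("speaker"), length))
    return durations


# Made-up transcript in the assumed shape, for illustration only.
sample = {
    "monologues": [
        {"speaker": 0, "elements": [
            {"type": "text", "value": "Hi", "ts": 0.5, "end_ts": 1.0},
            {"type": "punct", "value": " "},
            {"type": "text", "value": "there", "ts": 1.0, "end_ts": 2.0},
        ]},
        {"speaker": 1, "elements": [
            {"type": "text", "value": "Hello", "ts": 2.0, "end_ts": 2.5},
        ]},
        {"speaker": 0, "elements": [
            {"type": "text", "value": "Bye", "ts": 2.5, "end_ts": 3.0},
        ]},
    ]
}

print(summarize_speakers(sample))  # {0: 2, 1: 1}
print(turn_durations(sample))
```

A result with a single speaker key on a clip known to contain two voices would reproduce the failure described above, and very short turn durations in the clips that do work would hint at how fast the switches can be before diarization breaks down.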
The import wizard should display optional user settings which then get sent to the transcription service for processing.
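One possible shape for those settings, as a rough sketch: a small settings object that the wizard fills in and that is then merged into the job payload. `TranscriptionSettings` and `to_job_options` are hypothetical names, and the option names are assumed to match the transcription service's schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class TranscriptionSettings:
    """Optional user settings collected by the import wizard (hypothetical)."""
    skip_diarization: bool = False  # keep speaker labels by default
    skip_punctuation: bool = False  # keep punctuation by default

    def to_job_options(self):
        """Return the settings as a dict ready to merge into a job payload."""
        return asdict(self)


settings = TranscriptionSettings(skip_punctuation=True)
payload = {"media_url": "https://example.com/sample.mp3", **settings.to_job_options()}
print(payload)
```

Keeping the defaults in one place means the wizard only has to send the fields a user actually changed, and a second service can get its own `to_..._options` mapping later.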
Try to improve:
Things to try: