Open charlie-guan opened 4 years ago
The file is not WAV PCM; it is in MP4 format:
file initial8691708938939979847.wav
initial8691708938939979847.wav: ISO Media, MP4 v2 [ISO 14496-14]
Is VOSK only able to parse audio in wav pcm format?
Yes, you have to convert the audio with another tool such as ffmpeg before submitting it to the recognizer.
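For anyone hitting the same thing: a file extension says nothing about the container, so it can help to check the first bytes before handing a file to the recognizer. Below is a minimal sketch in plain Java, assuming you can read the first 12 bytes of the file yourself. The class name `ContainerSniffer` and the returned labels are made up for illustration; they are not part of VOSK or Android.

```java
import java.nio.charset.StandardCharsets;

/** Guesses the audio container from the first 12 bytes of a file (illustrative helper). */
final class ContainerSniffer {

    /** Returns "wav", "mp4", or "unknown" based on well-known magic bytes. */
    static String detect(byte[] head) {
        if (head == null || head.length < 12) {
            return "unknown";
        }
        String first = new String(head, 0, 4, StandardCharsets.US_ASCII);
        String second = new String(head, 4, 4, StandardCharsets.US_ASCII);
        String third = new String(head, 8, 4, StandardCharsets.US_ASCII);

        // RIFF/WAVE files start with "RIFF", a 4-byte size, then "WAVE".
        if (first.equals("RIFF") && third.equals("WAVE")) {
            return "wav";
        }
        // ISO media (MP4) files start with a 4-byte box size, then "ftyp".
        if (second.equals("ftyp")) {
            return "mp4";
        }
        return "unknown";
    }
}
```

A result of "mp4" for a file named .wav would match what the `file` command reported above, and means the audio still needs converting.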
Thanks for the tip! I used ffmpeg to convert the audio to .wav, but for some reason the transcription still comes out empty. Did I do something wrong in how I loaded the recognizer? Here are the lines:
rec = new KaldiRecognizer(model, 16000.f);
InputStream ais = new FileInputStream(outputFile);
outputFile is the path to the .wav file I want to transcribe. Here's the audio file too, if that helps.
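Since the recognizer above is created with a fixed rate of 16000, one thing worth ruling out is a mismatch between that value and the converted file. The sketch below reads the `fmt ` chunk of a WAV file held in memory and exposes the sample rate, channel count and bit depth. `WavFormat` and `isMonoPcm16` are hypothetical names for this example, and the parser only handles the plain RIFF/WAVE layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Fields of a WAV "fmt " chunk (illustrative helper, not a VOSK class). */
final class WavFormat {
    final int audioFormat;   // 1 means uncompressed PCM
    final int channels;
    final int sampleRate;
    final int bitsPerSample;

    WavFormat(int audioFormat, int channels, int sampleRate, int bitsPerSample) {
        this.audioFormat = audioFormat;
        this.channels = channels;
        this.sampleRate = sampleRate;
        this.bitsPerSample = bitsPerSample;
    }

    /** True for mono 16-bit PCM. */
    boolean isMonoPcm16() {
        return audioFormat == 1 && channels == 1 && bitsPerSample == 16;
    }

    /** Parses the format chunk from the bytes of a RIFF/WAVE file. */
    static WavFormat parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        // Little-endian ints for the ASCII tags "RIFF" and "WAVE".
        if (data.length < 12 || buf.getInt(0) != 0x46464952 || buf.getInt(8) != 0x45564157) {
            throw new IllegalArgumentException("not a RIFF/WAVE file");
        }
        int pos = 12;
        while (pos + 8 <= data.length) {
            int id = buf.getInt(pos);
            int size = buf.getInt(pos + 4);
            if (size < 0 || size > data.length) {
                break; // corrupt chunk size
            }
            if (id == 0x20746d66) { // "fmt "
                if (size < 16 || pos + 24 > data.length) {
                    throw new IllegalArgumentException("truncated fmt chunk");
                }
                return new WavFormat(
                        buf.getShort(pos + 8) & 0xFFFF,
                        buf.getShort(pos + 10) & 0xFFFF,
                        buf.getInt(pos + 12),
                        buf.getShort(pos + 22) & 0xFFFF);
            }
            pos += 8 + size + (size & 1); // chunks are padded to an even length
        }
        throw new IllegalArgumentException("no fmt chunk found");
    }
}
```

With something like this, the rate passed to the recognizer could be taken from `WavFormat.parse(bytes).sampleRate` instead of being hard-coded.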
converted.zip
We depend on the audio level somewhat. In your file the level is too low. If you need to process such quiet files, you can normalize the volume before recognition. For example, you can run
sox converted.wav converted1.wav vol 5.0
and then try converted1.wav.
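If calling sox is not an option (for example on a phone), roughly the same idea can be sketched directly on the decoded samples: measure the peak level, then multiply every sample by a gain and clip to the 16-bit range. This is a simplified stand-in, not sox's actual implementation, and the class and method names are made up for the example. It assumes the audio is already 16-bit PCM in a `short[]`.

```java
/** Peak measurement and fixed gain for 16-bit PCM samples (illustrative helper). */
final class PcmGain {

    /** Largest absolute sample value as a fraction of full scale, from 0.0 to 1.0. */
    static double peak(short[] samples) {
        int max = 0;
        for (short s : samples) {
            max = Math.max(max, Math.abs((int) s));
        }
        return max / 32768.0;
    }

    /** Multiplies each sample by gain, clipping to the signed 16-bit range. */
    static short[] applyGain(short[] samples, double gain) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            long scaled = Math.round(samples[i] * gain);
            long clipped = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
            out[i] = (short) clipped;
        }
        return out;
    }
}
```

A very small `peak` value on a recording would be a hint that the level is the problem. Note that a fixed gain like 5.0 can clip louder recordings, so choosing the gain from the measured peak may be safer.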
Thank you so much! It worked.
Let's keep it open for now.
I have this audio file that I want to transcribe: initial8691708938939979847.zip
VOSK transcribes the audio as just "oh." When I instantiated KaldiRecognizer, I set the sampling rate to match the sampling rate of the audio file, so I'm not sure why it is not transcribing the sentence. Is it not loud enough? Google Cloud's speech-to-text transcribed the file properly, so I'm wondering why this issue happens on mobile.
I used Android's MediaRecorder to record the voice clip like this:
recorder = new MediaRecorder();
recorder.setAudioSource(MediaRecorder.AudioSource.VOICE_RECOGNITION);
recorder.setOutputFormat(AudioFormat.ENCODING_PCM_16BIT);
recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC);
recorder.setAudioChannels(1);
recorder.setAudioEncodingBitRate(128000);
recorder.setAudioSamplingRate(48000);