alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

KaldiRecognizer doesn't decode quiet sounds #168

Open · charlie-guan opened this issue 4 years ago

charlie-guan commented 4 years ago

I have this audio file that I want to transcribe: initial8691708938939979847.zip

VOSK transcribes the audio as just "oh." When I instantiated KaldiRecognizer, I set its sample rate to match the audio file's sample rate, so I'm not sure why it isn't transcribing the sentence. Is the audio not loud enough? Google Cloud Speech-to-Text transcribed the file correctly, so I'm wondering why this happens on mobile.

I used Android's MediaRecorder to record the voice clip like this:

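recorder = new MediaRecorder();
recorder.setAudioSource(MediaRecorder.AudioSource.VOICE_RECOGNITION);
// AudioFormat.ENCODING_PCM_16BIT is an AudioFormat constant, not a MediaRecorder.OutputFormat value.
// Its numeric value (2) equals MediaRecorder.OutputFormat.MPEG_4, so this call selects an MP4 container.
recorder.setOutputFormat(AudioFormat.ENCODING_PCM_16BIT);
// AAC is compressed audio; MediaRecorder has no raw PCM/WAV encoder (AudioRecord delivers raw PCM).
recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC);
recorder.setAudioChannels(1);
recorder.setAudioEncodingBitRate(128000);
recorder.setAudioSamplingRate(48000);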

nshmyrev commented 4 years ago

The file is not WAV PCM; it is in MP4 format:

file initial8691708938939979847.wav 
initial8691708938939979847.wav: ISO Media, MP4 v2 [ISO 14496-14]
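
The file check above works from the file's leading bytes, not its extension. A rough Java equivalent, which could reject non-WAV input before it reaches the recognizer, is sketched below. The class and method names are hypothetical, and it only distinguishes RIFF/WAVE files from ISO base media files (MP4/M4A/3GP), which typically begin with an "ftyp" box at offset 4.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: classifies a file by its first 12 bytes.
class ContainerSniffer {

    /** Returns "wav", "mp4" or "unknown". */
    static String sniff(byte[] header) {
        if (header.length < 12) {
            return "unknown";
        }
        String riffId = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String boxType = new String(header, 4, 4, StandardCharsets.US_ASCII);
        String waveId = new String(header, 8, 4, StandardCharsets.US_ASCII);
        // WAV: "RIFF", a 4-byte little-endian size, then "WAVE".
        if (riffId.equals("RIFF") && waveId.equals("WAVE")) {
            return "wav";
        }
        // ISO base media (MP4/M4A/3GP): a 4-byte box size, then the box type "ftyp".
        if (boxType.equals("ftyp")) {
            return "mp4";
        }
        return "unknown";
    }

    static String sniffFile(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            byte[] header = new byte[12];
            int n = 0;
            // read() may return fewer bytes than requested, so loop until 12 bytes or EOF.
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
            return sniff(Arrays.copyOf(header, n));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + ": " + sniffFile(path));
        }
    }
}
```

For the file in this thread, the sketch would report "mp4" despite the .wav extension.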
charlie-guan commented 4 years ago

Can VOSK only parse audio in WAV PCM format?

nshmyrev commented 4 years ago

Yes, you have to convert the audio with another tool such as ffmpeg before passing it to the recognizer.
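On a desktop or server, the conversion can be scripted from Java by invoking the ffmpeg command-line tool. This is a sketch that assumes an ffmpeg binary on the PATH; stock Android devices do not ship one, and there, recording raw PCM with AudioRecord avoids the conversion step entirely. It targets 16 kHz, mono, 16-bit PCM to match the KaldiRecognizer(model, 16000.f) call that appears later in this thread. The class name FfmpegConvert is hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the ffmpeg CLI (assumes ffmpeg is on the PATH).
class FfmpegConvert {

    /** Builds a command that converts any input ffmpeg can decode into 16 kHz mono 16-bit PCM WAV. */
    static List<String> buildCommand(String input, String output) {
        return Arrays.asList(
                "ffmpeg",
                "-y",                 // overwrite the output file if it already exists
                "-i", input,          // input file, e.g. an MP4/AAC recording
                "-ar", "16000",       // resample to 16 kHz to match the recognizer's sample rate
                "-ac", "1",           // downmix to mono
                "-c:a", "pcm_s16le",  // 16-bit signed little-endian PCM
                output);              // the .wav extension selects the WAV container
    }

    /** Runs ffmpeg and throws if it exits with a non-zero status. */
    static void convert(String input, String output) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(input, output))
                .inheritIO()          // show ffmpeg's own progress and error output
                .start();
        int status = p.waitFor();
        if (status != 0) {
            throw new IOException("ffmpeg exited with status " + status);
        }
    }

    public static void main(String[] args) throws Exception {
        convert(args[0], args[1]);
    }
}
```

Fixing the rate and channel count at conversion time means the recognizer's sample rate argument can stay a constant.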

charlie-guan commented 4 years ago

Thanks for the tip! I used ffmpeg to convert the audio to .wav, but the transcription still comes out empty. Did I do something wrong in how I loaded the recognizer? Here are the relevant lines:

rec = new KaldiRecognizer(model, 16000.f);
InputStream ais = new FileInputStream(outputFile);

outputFile is the path to the .wav file I want to transcribe. Here's the audio file too, if that helps: converted.zip
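Because the recognizer is created with 16000.f, it is worth confirming that the converted file really is 16 kHz before decoding. The vosk Python example likewise rejects files that are not mono 16-bit PCM. Below is a pure-Java sketch with no Vosk dependency that parses the RIFF header and stops at the start of the "data" chunk. The class WavInfo and its methods are hypothetical. Parsing, rather than assuming a fixed-size header, also handles extra chunks such as "LIST" metadata, which some encoders add.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical WAV header reader; leaves the stream at the first audio byte.
class WavInfo {
    final int audioFormat;   // 1 = integer PCM
    final int channels;
    final int sampleRate;
    final int bitsPerSample;

    WavInfo(int audioFormat, int channels, int sampleRate, int bitsPerSample) {
        this.audioFormat = audioFormat;
        this.channels = channels;
        this.sampleRate = sampleRate;
        this.bitsPerSample = bitsPerSample;
    }

    /** True for mono 16-bit integer PCM at the rate the recognizer was created with. */
    boolean matches(int recognizerRate) {
        return audioFormat == 1 && channels == 1 && bitsPerSample == 16
                && sampleRate == recognizerRate;
    }

    static WavInfo read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] riff = new byte[12];
        in.readFully(riff);
        if (!tag(riff, 0).equals("RIFF") || !tag(riff, 8).equals("WAVE")) {
            throw new IOException("not a RIFF/WAVE file");
        }
        WavInfo fmt = null;
        byte[] chunk = new byte[8];
        while (true) {
            in.readFully(chunk); // throws EOFException if there is no "data" chunk
            String id = tag(chunk, 0);
            long size = Integer.toUnsignedLong(
                    ByteBuffer.wrap(chunk).order(ByteOrder.LITTLE_ENDIAN).getInt(4));
            if (id.equals("data")) {
                if (fmt == null) {
                    throw new IOException("\"data\" chunk before \"fmt \" chunk");
                }
                return fmt; // the stream is now positioned at the first PCM byte
            }
            if (id.equals("fmt ")) {
                if (size < 16) {
                    throw new IOException("\"fmt \" chunk too short");
                }
                byte[] body = new byte[(int) size];
                in.readFully(body);
                ByteBuffer b = ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN);
                fmt = new WavInfo(b.getShort(0) & 0xFFFF, b.getShort(2) & 0xFFFF,
                        b.getInt(4), b.getShort(14) & 0xFFFF);
            } else {
                skipFully(in, size); // any other chunk, e.g. "LIST" metadata
            }
            if ((size & 1) == 1) {
                skipFully(in, 1); // RIFF chunks are padded to an even length
            }
        }
    }

    private static String tag(byte[] b, int off) {
        return new String(b, off, 4, StandardCharsets.US_ASCII);
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    throw new EOFException();
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            WavInfo w = read(in);
            System.out.println(w.sampleRate + " Hz, " + w.channels + " ch, "
                    + w.bitsPerSample + "-bit, format " + w.audioFormat
                    + ", matches 16000: " + w.matches(16000));
        }
    }
}
```

Since read() leaves the stream at the start of the audio data, the same stream can then be read in chunks for the recognizer without feeding the header bytes to it as if they were audio.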

nshmyrev commented 4 years ago

Recognition depends somewhat on the audio level, and in your file the level is too low. If you need to process such quiet files, you can normalize the volume before recognition.
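One way to check whether a file is quiet, and to peak-normalize it, is to work on the decoded 16-bit samples. This is a minimal sketch that assumes the audio is already mono 16-bit PCM in a short[] array. The class name and the targetPeak value are illustrative only; neither this thread nor the sketch establishes what level the recognizer actually needs.

```java
// Hypothetical level helpers for mono 16-bit PCM samples.
class PcmLevel {

    /** Peak absolute sample value as a fraction of full scale (0.0 to 1.0). */
    static double peak(short[] samples) {
        int max = 0;
        for (short s : samples) {
            max = Math.max(max, Math.abs((int) s));
        }
        return max / 32768.0;
    }

    /** Root-mean-square level in dBFS; negative infinity for silence or empty input. */
    static double rmsDbfs(short[] samples) {
        if (samples.length == 0) {
            return Double.NEGATIVE_INFINITY;
        }
        double sum = 0;
        for (short s : samples) {
            double x = s / 32768.0;
            sum += x * x;
        }
        return 20 * Math.log10(Math.sqrt(sum / samples.length));
    }

    /** Peak normalization: returns a copy scaled so the loudest sample reaches targetPeak. */
    static short[] normalize(short[] samples, double targetPeak) {
        short[] out = samples.clone();
        double p = peak(samples);
        if (p == 0) {
            return out; // silence: nothing to scale
        }
        double gain = targetPeak / p;
        for (int i = 0; i < out.length; i++) {
            long v = Math.round(out[i] * gain);
            // Clamp in case targetPeak > 1.0 pushes samples past the 16-bit range.
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
        }
        return out;
    }
}
```

Peak normalization computes the gain from the file itself, so a single loud click limits how much the rest of the signal is raised; RMS- or loudness-based normalization is an alternative when that matters.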

nshmyrev commented 4 years ago

For example, you can run

sox converted.wav converted1.wav vol 5.0

and then try converted1.wav.
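The sox command above applies a fixed linear amplitude gain of 5.0. The same idea can be applied in Java directly to the little-endian byte chunks read from a 16-bit mono WAV stream, before they are passed to the recognizer. This is a sketch under that assumption, and the class name is hypothetical. Unlike peak normalization, a fixed gain can push loud samples past full scale, so they are hard-clipped and counted here.

```java
// Hypothetical in-place gain for 16-bit little-endian mono PCM bytes.
class PcmGain {

    /**
     * Multiplies the first len bytes of buf (16-bit little-endian samples) by gain,
     * clamping to the 16-bit range. gain = 5.0 mirrors "vol 5.0" in the sox command.
     * Assumes len is even; a trailing odd byte is left unchanged.
     * Returns the number of samples that had to be clipped.
     */
    static int applyGain(byte[] buf, int len, double gain) {
        int clipped = 0;
        for (int i = 0; i + 1 < len; i += 2) {
            // Low byte unsigned, high byte sign-extended: reassemble the signed sample.
            int sample = (short) ((buf[i] & 0xFF) | (buf[i + 1] << 8));
            long scaled = Math.round(sample * gain);
            if (scaled > Short.MAX_VALUE) {
                scaled = Short.MAX_VALUE;
                clipped++;
            } else if (scaled < Short.MIN_VALUE) {
                scaled = Short.MIN_VALUE;
                clipped++;
            }
            buf[i] = (byte) scaled;            // low byte
            buf[i + 1] = (byte) (scaled >> 8); // high byte
        }
        return clipped;
    }
}
```

Because a sample split across two reads would be skipped, the chunks should be read with an even length (for example with DataInputStream.readFully). A large clipped count suggests the gain is too high for that file.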

charlie-guan commented 4 years ago

Thank you so much! It worked.

nshmyrev commented 4 years ago

Let's keep it open for now.