KaldiRecognizer doesn't decode quiet sounds

alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Apache License 2.0

KaldiRecognizer doesn't decode quiet sounds #168

Open charlie-guan opened 4 years ago

charlie-guan commented 4 years ago

I have an audio file that I want to transcribe: initial8691708938939979847.zip

VOSK transcribes the audio as just "oh." When I instantiated KaldiRecognizer, I set the sampling rate to match that of the audio file, so I'm not sure why it isn't transcribing the whole sentence. Is the audio not loud enough? Google Cloud's speech-to-text transcribed the file properly, so I'm wondering why this issue happens on mobile.

I used Android's MediaRecorder to record the voice clip like this:

    recorder = new MediaRecorder();
    recorder.setAudioSource(MediaRecorder.AudioSource.VOICE_RECOGNITION);
    recorder.setOutputFormat(AudioFormat.ENCODING_PCM_16BIT);
    recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC);
    recorder.setAudioChannels(1);
    recorder.setAudioEncodingBitRate(128000);
    recorder.setAudioSamplingRate(48000);
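
One way to test the "not loud enough" guess is to measure the peak and RMS level of the samples before handing them to the recognizer. Below is a minimal sketch, assuming the clip has already been decoded to raw 16-bit little-endian mono PCM. The recorder setup above selects an AAC encoder, so its output would need decoding first. The class `PcmLevel` and its methods are hypothetical helpers for illustration and are not part of VOSK or Android.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: level statistics for raw 16-bit little-endian PCM.
class PcmLevel {

    /** Largest absolute sample value, normalized to the range [0, 1]. */
    static double peak(byte[] pcm) {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        int max = 0;
        while (buf.remaining() >= 2) {
            // Widen to int before abs so that -32768 does not overflow.
            int s = Math.abs((int) buf.getShort());
            if (s > max) {
                max = s;
            }
        }
        return max / 32768.0;
    }

    /** RMS level in dB relative to full scale; negative infinity for silence or empty input. */
    static double rmsDbfs(byte[] pcm) {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        double sumSquares = 0.0;
        long count = 0;
        while (buf.remaining() >= 2) {
            double s = buf.getShort() / 32768.0;
            sumSquares += s * s;
            count++;
        }
        if (count == 0 || sumSquares == 0.0) {
            return Double.NEGATIVE_INFINITY;
        }
        // 10 * log10(mean square) equals 20 * log10(RMS).
        return 10.0 * Math.log10(sumSquares / count);
    }
}
```

Calling `PcmLevel.peak(bytes)` and `PcmLevel.rmsDbfs(bytes)` on the decoded clip gives a rough idea of how much headroom the recording has. A peak far below 1.0 together with a very low RMS value would support the idea that the clip is quiet, while normal levels would point to something else, such as the data format.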