Closed MuhammadRashid closed 5 years ago
Does accent play a role? How can a specific accent, such as North American English, be targeted?
Accuracy tuning for mobile devices is complicated and might require analyzing the data, training the model, and so on. The current model is pretty basic and optimized for real-time use. More advanced models could be more accurate. Also, real-time conversations are hard to transcribe, much harder than broadcast speech or dictation.
Is it possible to use audio input without a microphone? Suppose we capture audio from other apps internally, without the microphone: how can we feed this captured data (audio buffers) to the Kaldi Android app to get the transcription back?
Yes, it is demonstrated in the code, see
Thank you very much for your kind response.
Yes, I found the code you referred to. Going through it again, I see it is more general than I first thought (I had assumed it only works for audio files already kept in the raw/assets folder).
KaldiRecognizer rec = new KaldiRecognizer(activityReference.get().model);
// Can be any audio source: audio captured from another app such as YouTube,
// a file on the SD card or in the gallery, or a file in the app's assets/raw directory.
InputStream ais = ...
// Skip the 44-byte WAV header.
if (ais.skip(44) != 44) {
    return "";
}
byte[] b = new byte[4096];
int nbytes;
while ((nbytes = ais.read(b)) >= 0) {
    rec.AcceptWaveform(b, nbytes);
}
So this means we can pass in the input stream buffer of any audio, whether it is playing inside the current app or captured silently from other apps (YouTube etc.) without the microphone, using Android's playback capture APIs (available from Android 10) or third-party APIs.
Kindly confirm?
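A side note on the header check in the snippet above: `InputStream.skip` is allowed to skip fewer bytes than requested even when more data is available, so `ais.skip(44) != 44` can fail on some stream types (for example network or buffered streams) although the header is present. A minimal sketch of a more tolerant helper follows. The class and method names are hypothetical, and it assumes the input really starts with the plain 44-byte WAV header that the snippet skips.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class WavHeaderSkipper {
    /** Header size skipped by the snippet above (assumed: plain PCM WAV). */
    static final int WAV_HEADER_BYTES = 44;

    /**
     * Skips exactly n bytes, or returns false if the stream ends first.
     * InputStream.skip may skip fewer bytes than requested, so we loop.
     */
    static boolean skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() >= 0) {
                remaining--; // skip() made no progress; consume one byte instead
            } else {
                return false; // end of stream before n bytes were skipped
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // 100 zero bytes stand in for a real WAV stream.
        InputStream ais = new ByteArrayInputStream(new byte[100]);
        System.out.println(skipFully(ais, WAV_HEADER_BYTES));
        System.out.println(ais.available());
    }
}
```

In the snippet above, the check would then read `if (!WavHeaderSkipper.skipFully(ais, 44)) return "";`. Audio captured live from another app is raw PCM with no WAV header, so in that case nothing should be skipped at all.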
Correct, but processing longer files should work a bit differently: it should use the voice activity detection, as in the Python API:
while ((nbytes = ais.read(b)) >= 0) {
    if (rec.AcceptWaveform(b, nbytes))
        System.out.println(rec.Result());        // an utterance has ended
    else
        System.out.println(rec.PartialResult()); // still inside an utterance
}
System.out.println(rec.FinalResult());
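The loop above prints every result as it arrives. For a long recording it is often handier to collect one entry per finished utterance. The following is only a sketch of that pattern: the `Recognizer` interface and `FakeRecognizer` are hypothetical stand-ins so the example runs without the Kaldi library, and they merely mirror the `AcceptWaveform` / `Result` / `FinalResult` calls used above. Partial results are left out for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class StreamingSketch {
    /** Hypothetical stand-in mirroring the recognizer calls used above. */
    interface Recognizer {
        boolean acceptWaveform(byte[] data, int len); // true when an utterance ended
        String result();
        String finalResult();
    }

    /** Feeds the stream chunk by chunk; returns one entry per finished utterance. */
    static List<String> transcribe(InputStream in, Recognizer rec, int chunkSize)
            throws IOException {
        List<String> utterances = new ArrayList<>();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) >= 0) {
            if (rec.acceptWaveform(buf, n)) {
                utterances.add(rec.result());
            }
        }
        utterances.add(rec.finalResult()); // flush whatever is left at end of stream
        return utterances;
    }

    /** Toy recognizer for the demo only: pretends an utterance ends every third chunk. */
    static final class FakeRecognizer implements Recognizer {
        private int chunks = 0;
        private int utterance = 0;
        public boolean acceptWaveform(byte[] data, int len) { return ++chunks % 3 == 0; }
        public String result() { return "utterance " + (++utterance); }
        public String finalResult() { return "final"; }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[4096 * 7]);
        System.out.println(transcribe(in, new FakeRecognizer(), 4096));
    }
}
```

With the real library, a thin adapter around `KaldiRecognizer` would take the place of `FakeRecognizer`.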
Okay, thanks a lot.
Feel free to reopen
I am stuck with a scenario and need your help. I can get a byte array of audio data continuously from playback; each time I receive 1024 bytes. How can I pass this data continuously to the Kaldi Android speech APIs in order to transcribe it? Suppose I am getting the audio data inside the app by using the Android Visualizer class.
In the constructor or some other init code:
Model model = new Model(assetDir.toString() + "/model-android");
KaldiRecognizer rec = new KaldiRecognizer(model);
When you receive bytes:
if (rec.AcceptWaveform(b, nbytes)) {
    Log.d(TAG, rec.Result());        // an utterance has ended
} else {
    Log.d(TAG, rec.PartialResult()); // still inside an utterance
}
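One caveat for the Visualizer case in particular: the Android documentation describes Visualizer waveform captures as 8-bit unsigned mono PCM, while the WAV-based demo feeds the recognizer what is assumed here to be 16-bit little-endian PCM. Under those two assumptions, each chunk would need widening before it is passed to `AcceptWaveform`. A minimal sketch follows; the class and method names are hypothetical, and sample-rate conversion is not handled.

```java
import java.util.Arrays;

final class PcmWidening {
    /**
     * Converts 8-bit unsigned PCM samples to 16-bit signed little-endian PCM.
     * Assumption: the recognizer expects 16-bit little-endian mono input.
     */
    static byte[] unsigned8ToSigned16Le(byte[] in, int len) {
        byte[] out = new byte[len * 2];
        for (int i = 0; i < len; i++) {
            // Re-center around zero (128 is silence), then scale to the 16-bit range.
            int sample = ((in[i] & 0xFF) - 128) << 8;
            out[2 * i] = (byte) (sample & 0xFF);            // low byte first
            out[2 * i + 1] = (byte) ((sample >> 8) & 0xFF); // then high byte
        }
        return out;
    }

    public static void main(String[] args) {
        // Silence, maximum, and minimum sample values.
        byte[] chunk = {(byte) 0x80, (byte) 0xFF, 0x00};
        System.out.println(Arrays.toString(unsigned8ToSigned16Le(chunk, chunk.length)));
    }
}
```

The widened buffer and its length (`2 * nbytes`) would then go to `rec.AcceptWaveform`. Widening adds no detail to the 8-bit samples, so a 16-bit capture path, where one is available, is likely to transcribe better.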
Hi Nickolay, first of all, I really appreciate your great effort.
I am using your demo app for continuous speech recognition with the microphone. I get about 80% accuracy when transcribing audio from a video or audio clip playing on the Android device or on an external source. For in-person conversation, however, the accuracy is much poorer. How can I overcome this problem?
I have a few other questions. Could you please answer them?
Kind regards, Muhammad Rashid