Closed CaydenPierce closed 2 years ago
ASR takes a lot of RAM, compute, and battery, so it's not realistic to do on the ASG.
It can be optimized. What are the available resources on ASG?
Hey @nshmyrev , thanks for checking this out.
The current ASG hardware is a Vuzix Blade with specs:
Well, you definitely can run at least keyword activation on that. Something like https://github.com/ARM-software/ML-KWS-for-MCU should help and take very few resources. The rest depends on the app if you want to recognize just a few commands or more serious queries.
On ASR accuracy/WER: this is used for live conversations in noisy environments. We were using Google because DeepSpeech just wasn't accurate enough. We realize that Vosk achieves significantly better performance with larger models, and we're hoping to use larger models on the ASP (Android Smart Phone).
Ok, if you need help on this let me know. I wanted to work with Vuzix on that but they never responded to my queries somehow.
Great, thanks, we could certainly use some help in terms of getting highest possible accuracy/WER.
Since we will be streaming audio from ASG to ASP either way, it makes sense battery-wise and compute-wise to do ASR on ASP.
I've seen incredible results with vosk-model-en-us-0.22 and good results with vosk-model-small-en-us-0.15. How reasonable would it be to get the larger model (or something in between) with better accuracy running on a modern Android smartphone?
We're also looking into better sensors - the microphone used has a drastic effect on the WER. Have you tried different mics and found any that are ideal?
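For comparing models or microphones on the same recordings, WER can be computed as the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch in Python, not code from this project; the function name and the simple lowercase/whitespace normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("turn on the lights", "turn off the light"))  # 0.5
```

Running the same reference recordings through each microphone or model and comparing the resulting WER values gives a like-for-like number. Note that WER can exceed 1.0 when the output contains many inserted words.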
Happy to move this to an issue on Vosk repo. First priority is getting the whole pipeline working, but we'll soon want to optimize.
Thanks @nshmyrev
Tested both vosk-model-en-us-0.22 and vosk-model-en-us-0.22-lgraph from https://alphacephei.com/vosk/models on Android, and both work! This is on my private fork, which will be merged in the next few days.
The larger model vosk-model-en-us-0.22 wouldn't build with Gradle (OOM, even with 8gb build allowance). But build works in Bazel. It takes 10 minutes for vosk-model-en-us-0.22 though, so will want to make this something to download separate from the APK.
@nshmyrev
The larger model vosk-model-en-us-0.22 wouldn't build with Gradle (OOM, even with 8gb build allowance). But build works in Bazel. It takes 10 minutes for vosk-model-en-us-0.22 though, so will want to make this something to download separate from the APK.
Big model is certainly not for Android. The lgraph version should be ok.
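The idea of keeping the model out of the APK and fetching it separately can be sketched as "unpack once, reuse afterwards". This is a language-neutral illustration written in Python, not the project's Android code; `ensure_model`, `fetch_from_url`, and the `.complete` marker file are hypothetical names, and the archive URL is left as a parameter rather than assumed.

```python
import io
import urllib.request
import zipfile
from pathlib import Path
from typing import Callable


def fetch_from_url(url: str) -> bytes:
    """Download an archive fully into memory (fine for a sketch only)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def ensure_model(model_dir: Path, fetch_zip: Callable[[], bytes]) -> bool:
    """Unpack the model into model_dir unless a previous run already did.

    Returns True if the archive was fetched and unpacked, False if the
    existing copy was reused.
    """
    # Hypothetical marker file written only after a successful unpack,
    # so an interrupted extraction is retried on the next launch.
    marker = model_dir / ".complete"
    if marker.exists():
        return False

    model_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(fetch_zip())) as archive:
        archive.extractall(model_dir)
    marker.touch()
    return True
```

A caller would pass something like `lambda: fetch_from_url(model_zip_url)` as `fetch_zip`. A real app would more likely stream the archive to disk instead of holding it in memory, especially for the larger models.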
Successful for all steps. Vosk is very high quality ASR, even with the small model.
A few things we need and will follow up with in future issues:
Part of the move away from streaming sensor data over the internet.
Relies on implementation of #10
ASR on Android
We need to be able to transcribe speech locally. ASR takes a lot of RAM, compute, and battery, so it's not realistic to do on the ASG. Streaming audio 8 hours a day every day to the internet takes too much data. This means it must happen on the ASP.
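To put a rough number on the data cost, here is a back-of-the-envelope calculation in Python. The audio format (16 kHz, 16-bit, mono, uncompressed PCM) is an assumption, not something stated in this issue; a compressed codec would lower the figures.

```python
def daily_audio_bytes(sample_rate_hz: int = 16_000,
                      bytes_per_sample: int = 2,
                      channels: int = 1,
                      hours_per_day: int = 8) -> int:
    """Bytes of raw PCM audio produced per day at the given settings."""
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return bytes_per_second * hours_per_day * 3600


daily = daily_audio_bytes()   # assumed format: 16 kHz, 16-bit, mono
monthly = daily * 30          # assumed 30-day month
print(f"{daily / 1e6:.1f} MB per day, {monthly / 1e9:.1f} GB per 30 days")
# prints "921.6 MB per day, 27.6 GB per 30 days"
```

Under these assumptions, raw streaming comes to nearly a gigabyte per day, which is the kind of volume that makes on-device transcription on the ASP attractive.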
After considerable research by yours truly it seems that the best option to do this in Android is Vosk: https://github.com/alphacep/vosk-api
TODO (after completion of #10)