stt server with vosk-de big model + how to start app from client

royrogermcfreely commented 2 years ago

i installed the docker stt server. i can reach it and the small vosk-de model is working. then i mapped the big vosk-de model, configured the server.conf and started the server with:

sudo docker run --rm --name=sepia-stt -p 20741:20741 -it \
  -v /home/sepia/sepia-stt/models/my:/home/admin/sepia-stt/models/my \
  --env SEPIA_STT_SETTINGS=/home/admin/sepia-stt/models/my/server.conf \
  sepia/stt-server:vosk_amd64

i can select the new model, but when i press the mic i get "error: 'E0? - unknown'" -> doing this on my phone

did i miss something? i followed these instructions:

To add new ASR models create a shared volume for your container, place your model inside and update the server config file. The "adapt" section below has a more detailed example, but basically you can:

- Add a volume to your container, e.g. use run flag: -v [host-models-folder]:/home/admin/sepia-stt/models/my (Note: use absolute path!)
- Copy your model folder (e.g. 'vosk-model-small-es') and the server config file to your new folder
- Add model path and language code to the "[asr_models]" section in your config, e.g.: path3=vosk-model-small-es and lang3=es-ES
- Tell the server to use your new config via the flag: --env SEPIA_STT_SETTINGS=/home/admin/sepia-stt/models/my/server.conf

second: how can i teach sepia to open an app from the client? i tried to configure the "android intent/url" field but couldn't get it working.

my app id is "com.vanced.android.youtube"

/roy

fquirin commented 2 years ago

Hi,

i can select the new model but then when i press the mic i get "error: 'E0? - unknown'" -> doing this on my phone

did i miss something?

Could you post your [asr_models] section of the config file please? Did you see any errors in the terminal when you run the STT server?

how i can teach sepia to open an app from the client. i tried to config the "android intent/url" field but couldnt get it working

It depends a bit on your app. Do you have any broadcast intent listeners? Or maybe a URL scheme? Android 11 has become more restrictive regarding direct access to app activities, but these restrictions shouldn't apply to SEPIA v0.24.0 yet.

I haven't thoroughly tested this, but if you have a URL scheme you can try this via the platform controls service:

Android Intent: {"value": {"type": "androidActivity", "data": {"action": "android.intent.action.VIEW", "url":"myapp://example.com"} } }

[EDIT] Maybe add package, but this will most likely break in v0.24.1 anyway because the app has to respect Android 11 settings then :-/:

Android Intent: {"value": {"type": "androidActivity", "data": {"action": "android.intent.action.VIEW", "url":"myapp://example.com", "package": "com.vanced.android.youtube"} } }

Or if you have a broadcast intent listener (aka BroadcastReceiver) registered for let's say 'com.vanced.android.youtube.MY_ACTION' you can try:

Android Intent: {"value": {"type": "androidBroadcast", "data": {"action": "com.vanced.android.youtube.MY_ACTION", "extras": {"my_info": "my_text"} } } }
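Hand-written JSON one-liners like these are easy to get wrong (an unbalanced brace is enough to break them), so it can help to generate them instead. The following is only a minimal sketch: the field names ("value", "type", "data", "action", "url", "package", "extras") are copied from the examples above, while the helper names `android_activity` and `android_broadcast` are hypothetical. Whether SEPIA accepts the output unchanged has not been verified here.

```python
import json
from typing import Optional


def android_activity(action: str, url: str, package: Optional[str] = None) -> str:
    """Build an 'androidActivity' payload shaped like the examples above."""
    data = {"action": action, "url": url}
    if package is not None:
        # Optional: restrict the intent to one app (may break on Android 11+).
        data["package"] = package
    return json.dumps({"value": {"type": "androidActivity", "data": data}})


def android_broadcast(action: str, extras: dict) -> str:
    """Build an 'androidBroadcast' payload shaped like the example above."""
    data = {"action": action, "extras": extras}
    return json.dumps({"value": {"type": "androidBroadcast", "data": data}})


if __name__ == "__main__":
    # Paste the printed JSON into the "Android Intent" field.
    print(android_activity("android.intent.action.VIEW", "myapp://example.com",
                           package="com.vanced.android.youtube"))
    print(android_broadcast("com.vanced.android.youtube.MY_ACTION",
                            {"my_info": "my_text"}))
```

Because `json.dumps` serializes a real dictionary, the result always has balanced braces and quotes.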

royrogermcfreely commented 2 years ago

This is my asr config (/home/sepia/sepia-stt/models/my/server.conf):

[asr_models]
base_folder=../models/
path1=vosk-model-small-de
lang1=de-DE
path2=vosk-model-small-en-us
lang2=en-US
path3=vosk-model-de
lang3=de-DE

i also tried it the python way - same result. i can use the small models but not the big one.

also, the command python -m launch is not working, you have to use python3 -m launch

https://github.com/SEPIA-Framework/sepia-stt-server/blob/master/src/README.md

fquirin commented 2 years ago

I think I messed something up in the readme when I changed the paths, can you try:

path3=my/vosk-model-de
lang3=de-DE
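A quick way to catch a wrong path entry before starting the server is to check that every pathN in [asr_models] points to an existing folder. This is a rough sketch, not part of the STT server: `check_asr_models` is a hypothetical helper, and it assumes that each pathN is resolved relative to the models folder (inside the container that would be /home/admin/sepia-stt/models, with the shared volume mounted at models/my). How the server really resolves base_folder may differ.

```python
import configparser
from pathlib import Path


def check_asr_models(conf_text: str, models_root: Path) -> dict:
    """Map each pathN key in [asr_models] to True if its folder exists."""
    # Disable interpolation so a '%' in a value cannot break parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    results = {}
    for key, value in parser["asr_models"].items():
        # Only pathN keys name model folders; skip base_folder and langN.
        if key.startswith("path") and key[4:].isdigit():
            results[key] = (models_root / value).is_dir()
    return results


if __name__ == "__main__":
    example = """
[asr_models]
base_folder=../models/
path1=vosk-model-small-de
lang1=de-DE
path3=my/vosk-model-de
lang3=de-DE
"""
    # Assumed container layout; adjust to where your models really are.
    for key, ok in check_asr_models(example, Path("/home/admin/sepia-stt/models")).items():
        print(key, "found" if ok else "MISSING")
```

Run it inside the container (or against the matching host folder) so the paths line up with what the server sees.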

royrogermcfreely commented 2 years ago

when i try "my/vosk-model-de" i can select it, but after the wakeword is detected the recording symbol never stops and no words are detected.

fquirin commented 2 years ago

Can you give me the specs again please: Which Docker container (Amd64, Aarch64, Armv7)? What hardware do you use for the STT server (platform, CPU, RAM)? What client do you use for testing (Android App, Desktop browser, DIY)? If you are using the Android App, can you try the Desktop browser client as well?

royrogermcfreely commented 2 years ago

i run the amd64 docker container and the sepia server on an ubuntu vm (same machine) on proxmox with 4x3.5ghz and 8 gb ram

i use the android app (v24 and v23, because on my tablet the recording is not working with v24, but that's something else) and the desktop browser (chrome with treat unsecure origin)

it's always the same. the default models work but the big german one doesn't. i didn't try a different model. maybe i can test it over the weekend with a different english model

i want to try installing the diy client over christmas.

fquirin commented 2 years ago

I realized that I had the older 0.6 German large model and now after the update to 0.21 I can confirm that there is definitely something wrong :grimacing: I've tried to update Vosk from 0.3.30 to 0.3.32 but it didn't help :-/. Gotta check the code tomorrow and see if it's a problem with the model, Vosk or the Vosk interface :-/

fquirin commented 2 years ago

Update: I realized that the model itself is actually working but painfully slow (1:40 min transcription time instead of 8 s with the older v0.6, on an 8GB RAM machine) and had a quick discussion with Nickolay from Vosk about it. He said that the large DE model v0.21 requires at least 16GB RAM due to the RNN language model. If you check the model you will see a folder called 'rnnlm'; you can delete this folder to disable RNNLM. This will greatly improve speed at the cost of a slightly worse WER.
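If you would rather not delete anything, renaming the folder should have the same effect, since the model then no longer contains a folder called 'rnnlm'. Note that this is a swap for the delete step described above and an untested assumption about how Vosk looks up the folder. `disable_rnnlm` is a hypothetical helper name, and the path in the usage comment is only an example.

```python
from pathlib import Path


def disable_rnnlm(model_dir: Path) -> bool:
    """Rename <model_dir>/rnnlm to rnnlm.disabled; return True if renamed."""
    rnnlm = model_dir / "rnnlm"
    if not rnnlm.is_dir():
        # Nothing to do: the model has no RNNLM folder (or it is already disabled).
        return False
    target = model_dir / "rnnlm.disabled"
    if target.exists():
        raise FileExistsError(f"{target} already exists, remove it first")
    rnnlm.rename(target)
    return True


# Example call (path is an assumption, adjust to your setup):
# disable_rnnlm(Path("/home/sepia/sepia-stt/models/my/vosk-model-de"))
```

To re-enable RNNLM later, rename 'rnnlm.disabled' back to 'rnnlm' and restart the server.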

royrogermcfreely commented 2 years ago

i saw your discussion and tried the 0.6 model from there. it's pretty good, but most of the time it adds the word "einen" at the end of my sentences...

what is WER? for what do i need it?

fquirin commented 2 years ago

most of the time it adds the word "einen" at the end of my sentences

yeah, I have the same issue :-/

what is WER? for what do i need it?

Word error rate ... it's basically a measure of the model's accuracy (lower is better). Since it is calculated on test data it's often not very representative but still the best metric we currently have.
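To make that concrete: WER is usually computed as the word-level edit distance (substitutions + deletions + insertions) between what was said and what was recognized, divided by the number of words that were said. Here is a small self-contained sketch; `word_error_rate` is just an illustrative helper and the example sentence is made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One extra word ("einen") on a four-word sentence is one insertion.
    print(word_error_rate("schalte das licht an", "schalte das licht an einen"))  # 0.25
```

So a trailing "einen" on a four-word command already counts as 25% WER for that sentence, even though every word that was actually spoken was recognized correctly.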

Depending on what you want to do it might be useful to train your own LM. I'm planning to build a SEPIA specific corpus soon >here< ... if you have some suggestions ;)

royrogermcfreely commented 2 years ago

hey.

thanks, i will look into how to train my own LM

the v0.21 is not really working with 16gb ram.

but i will try it later again. meanwhile the v0.6 is enough for me

fquirin commented 2 years ago

the v0.21 is not really working with 16gb ram

Did you try to remove the 'rnnlm' folder? The 0.21 might still be better than the 0.6 even without RNNLM rescoring ... this is just a wild guess though ^^. It still seems to be a bit slower.

SEPIA-Framework / sepia-docs

stt server with vosk-de big model + how to start app from client #139