royrogermcfreely closed 2 years ago
Hi,
I can select the new model, but when I press the mic I get "error: 'E0? - unknown'" -> doing this on my phone. Did I miss something?
Could you post the [asr_models] section of your config file, please?
Did you see any errors in the terminal when you run the STT server?
How can I teach SEPIA to open an app from the client? I tried to configure the "android intent/url" field but couldn't get it working.
It depends a bit on your app. Do you have any broadcast intent listeners? Or maybe a URL scheme? Android 11 has become more restrictive regarding direct access to app activities, but these restrictions shouldn't apply to SEPIA v0.24.0 yet.
I haven't thoroughly tested this, but if you have a URL scheme you can try this via the platform controls service:
Android Intent: {"value": {"type": "androidActivity", "data": {"action": "android.intent.action.VIEW", "url":"myapp://example.com"} } }
[EDIT] Maybe add package, but this will most likely break in v0.24.1 anyway because the app has to respect Android 11 settings then :-/:
Android Intent: {"value": {"type": "androidActivity", "data": {"action": "android.intent.action.VIEW", "url":"myapp://example.com", "package": "com.vanced.android.youtube"} } }
Or if you have a broadcast intent listener (aka BroadcastReceiver) registered for let's say 'com.vanced.android.youtube.MY_ACTION' you can try:
Android Intent: {"value": {"type": "androidBroadcast", "data": {"action": "com.vanced.android.youtube.MY_ACTION", "extras": {"my_info": "my_text"} } } }
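The payloads above are easy to get wrong by hand (one missing brace makes the JSON invalid), so here is a minimal sketch that generates them instead. The helper names are made up for illustration; only the JSON keys ("value", "type", "data", "action", "url", "package", "extras") are taken from the examples in this thread, and whether SEPIA accepts further fields is not covered here.

```python
import json


def activity_intent(action, url, package=None):
    """Build an 'androidActivity' payload (hypothetical helper, keys from the thread)."""
    data = {"action": action, "url": url}
    if package is not None:
        # Optional: restrict the intent to one app package.
        data["package"] = package
    return {"value": {"type": "androidActivity", "data": data}}


def broadcast_intent(action, extras=None):
    """Build an 'androidBroadcast' payload (hypothetical helper, keys from the thread)."""
    data = {"action": action}
    if extras:
        data["extras"] = extras
    return {"value": {"type": "androidBroadcast", "data": data}}


if __name__ == "__main__":
    # json.dumps guarantees well-formed JSON with balanced braces.
    print(json.dumps(activity_intent(
        "android.intent.action.VIEW",
        "myapp://example.com",
        package="com.vanced.android.youtube",
    )))
    print(json.dumps(broadcast_intent(
        "com.vanced.android.youtube.MY_ACTION",
        extras={"my_info": "my_text"},
    )))
```

The printed strings can be pasted into the "Android Intent" field as-is.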
This is my ASR config (/home/sepia/sepia-stt/models/my/server.conf):
[asr_models]
base_folder=../models/
path1=vosk-model-small-de
lang1=de-DE
path2=vosk-model-small-en-us
lang2=en-US
path3=vosk-model-de
lang3=de-DE
I also tried it the Python way, with the same result. I can use the small models but not the big one.
Also, the command python -m launch is not working; you have to use python3 -m launch
https://github.com/SEPIA-Framework/sepia-stt-server/blob/master/src/README.md
I think I messed something up in the readme when I changed the paths, can you try:
path3=my/vosk-model-de
lang3=de-DE
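A quick way to catch path mix-ups like this one is to parse the [asr_models] section and check that every model folder actually exists. This is only a sketch: the function name is invented, and which directory base_folder is resolved against is an assumption here (it is passed in as relative_to), so adjust it to wherever the server is started from.

```python
import configparser
from pathlib import Path


def list_asr_models(conf_text, relative_to="."):
    """Return (language, folder, exists) for every pathN/langN pair.

    'relative_to' is an assumption: the directory that a relative
    base_folder is resolved against.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["asr_models"]
    base = Path(relative_to) / section.get("base_folder", "")
    models = []
    index = 1
    # Walk path1, path2, ... until the first missing index.
    while f"path{index}" in section:
        folder = base / section[f"path{index}"]
        language = section.get(f"lang{index}", "?")
        models.append((language, folder, folder.is_dir()))
        index += 1
    return models


if __name__ == "__main__":
    example = """
[asr_models]
base_folder=../models/
path1=vosk-model-small-de
lang1=de-DE
path2=my/vosk-model-de
lang2=de-DE
"""
    for language, folder, exists in list_asr_models(example):
        print(language, folder, "OK" if exists else "MISSING")
```

A folder reported as MISSING usually means the pathN entry lacks a sub-folder prefix such as my/.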
When I try "my/vosk-model-de" I can select it, but after the wake word is detected the recording symbol never stops and no words are detected.
Can you give me the specs again please: Which Docker container (Amd64, Aarch64, Armv7)? What hardware do you use for the STT server (platform, CPU, RAM)? What client do you use for testing? (Android App, Desktop browser, DIY?). If you are using the Android App can you try the Desktop browser client as well?
I run the amd64 Docker container and the SEPIA server on an Ubuntu VM (same machine) on Proxmox with 4x3.5 GHz and 8 GB RAM.
I use the Android app (v24 and v23, because on my tablet the recording is not working with v24, but that's something else) and the desktop browser (Chrome with the insecure origin treated as secure).
It's always the same: the default models work but the big German one does not. I didn't try a different model. Maybe I can test it over the weekend with a different English model.
I want to try installing the DIY client over Christmas.
I realized that I had the older 0.6 German large model and now after the update to 0.21 I can confirm that there is definitely something wrong :grimacing: I've tried to update Vosk from 0.3.30 to 0.3.32 but it didn't help :-/. Gotta check the code tomorrow and see if it's a problem with the model, Vosk or Vosk interface :-/
Update: I realized that the model itself is actually working but painfully slow (1:40min instead of 8s transcription time compared to the older v0.6 on an 8GB RAM machine) and had a quick discussion with Nickolay from Vosk about it. He said that the large DE model v0.21 requires at least 16GB RAM due to the RNN language model. If you check the model you will see a folder called 'rnnlm', you can delete this folder to disable RNNLM. This will greatly enhance speed at the cost of a slightly worse WER.
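If you would rather not delete the folder outright, a reversible variant is to rename it. The sketch below does that; note that this swaps deletion for renaming, and it assumes that a folder no longer named 'rnnlm' is ignored in the same way as a deleted one. The function name and the ".disabled" suffix are made up for illustration.

```python
from pathlib import Path


def disable_rnnlm(model_dir, suffix=".disabled"):
    """Rename <model_dir>/rnnlm so it is no longer found under that name.

    Returns the new path, or None if there was no 'rnnlm' folder.
    Assumption: renaming has the same effect as deleting the folder.
    """
    rnnlm = Path(model_dir) / "rnnlm"
    if not rnnlm.is_dir():
        return None
    target = rnnlm.with_name(rnnlm.name + suffix)
    rnnlm.rename(target)
    return target
```

Calling disable_rnnlm on the model folder (for example the vosk-model-de folder from this thread) and restarting the STT server should then load the model without RNNLM rescoring; renaming the folder back restores the original behaviour.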
I saw your discussion and tried the 0.6 model from there. It's pretty good, but most of the time it adds the word "einen" at the end of my sentences...
What is WER? What do I need it for?
most of the time it adds the word "einen" at the end of my sentences
yeah, I have the same issue :-/
What is WER? What do I need it for?
Word error rate ... it's basically a measure of the model's accuracy (the lower, the better). Since it is calculated on test data it's often not very representative, but it's still the best metric we currently have.
Depending on what you want to do it might be useful to train your own LM. I'm planning to build a SEPIA specific corpus soon >here< ... if you have some suggestions ;)
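To make the WER answer concrete: it is usually computed as the word-level edit distance (substitutions, deletions and insertions) between the reference text and the recognized text, divided by the number of words in the reference. Below is a minimal sketch of that standard formula; the function name and the example sentence are made up, and real evaluations typically normalize case and punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One extra word at the end of a four-word sentence.
    print(word_error_rate("schalte das licht an", "schalte das licht an einen"))  # 0.25
```

So a stray "einen" appended to a four-word command already counts as a 25% word error rate for that sentence.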
Hey.
Thanks, I will look into how to train my own LM.
The v0.21 is not really working with 16 GB RAM.
But I will try it again later. Meanwhile the v0.6 is enough for me.
The v0.21 is not really working with 16 GB RAM
Did you try to remove the 'rnnlm' folder? The 0.21 might still be better than the 0.6 even without RNNLM rescoring ... this is just a wild guess though ^^. It still seems to be a bit slower.
I installed the Docker STT server. I can reach it and the small vosk-de model is working. Then I mapped the big vosk-de model, configured the server.conf and started the server with:
sudo docker run --rm --name=sepia-stt -p 20741:20741 -it \
  -v /home/sepia/sepia-stt/models/my:/home/admin/sepia-stt/models/my \
  --env SEPIA_STT_SETTINGS=/home/admin/sepia-stt/models/my/server.conf \
  sepia/stt-server:vosk_amd64
I can select the new model, but when I press the mic I get "error: 'E0? - unknown'" -> doing this on my phone.
Did I miss something? I followed these instructions:
Second: how can I teach SEPIA to open an app from the client? I tried to configure the "android intent/url" field but couldn't get it working.
My app ID is "com.vanced.android.youtube"
/roy