vosk-model-en-us-0.22-lgraph and vosk-model-en-us-0.22 model both are not performing well

alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Apache License 2.0


vosk-model-en-us-0.22-lgraph and vosk-model-en-us-0.22 model both are not performing well #1508

Closed by HakaishinShwet 5 months ago

HakaishinShwet commented 5 months ago

I am getting random text output for both an audio file and a music file. At times it produces nothing at all and gets stuck partway through generating the result. Compared to OpenAI Whisper, it performs far worse.

nshmyrev commented 5 months ago

Share the file

nshmyrev commented 5 months ago

And the way you run the decoding

HakaishinShwet commented 5 months ago

[Attached screenshots: 20240127_17h25m02s_grim, 20240127_17h32m18s_grim, 20240129_12h54m41s_grim, 20240129_12h54m47s_grim, 20240129_12h54m59s_grim, 20240129_12h55m07s_grim, 20240129_12h55m29s_grim]

HakaishinShwet commented 5 months ago

I have attached the Whisper model results in the first two images. The images after that show how the Vosk model processes the music file; the generated file contains nothing useful, as you can see in the last image. In the CLI, too, it does not detect or show the lines of the song correctly at all. I tried different files and got the same result. @nshmyrev

HakaishinShwet commented 5 months ago

If you want to test it yourself, you can download a Linkin Park song from the internet, or use any other file, and tell me if I am doing something wrong. I got this command from the documentation and guides.

nshmyrev commented 5 months ago

Ok, that kind of task is beyond our current capabilities. We might look into it in the future.

https://www.youtube.com/watch?v=1bQXcxZeits

HakaishinShwet commented 5 months ago

@nshmyrev OK. I hope it at least reaches the level of the OpenAI Whisper models, because even their small models work very well on my potato laptop, haha. I was not expecting that level, but I was expecting it to generate some decent lines, and it failed completely. So I wondered whether I was doing something wrong, whether development of the project had stopped, or whether it is just not capable enough yet.