alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Vosk Versus Pico Voice - A quick comparison. #909

Open ls-milkyway opened 2 years ago

ls-milkyway commented 2 years ago

Vosk vs Picovoice (Leopard)

Models used:

Vosk model: En-US 0.22

Picovoice model: built-in English model (needs an access key, which can be obtained by logging in to the Picovoice Console).

Procedure: I wanted to compare Vosk with another highly touted ASR project, Picovoice (Leopard), but this time using a simpler file with less audio complexity than in my earlier comparison in #892. The video is also short: trailer 5 of Batman 2022, which has better stereo audio in PCM format (2300 kb/s at 48 kHz).

The procedure was the same as in #892, except that spleeter was not used and the audio file was simple (no US slang, bad words, etc.), although it does contain low voices and varied voice pitches.

Results:

Pico unprocessed

WER: 62.162% (161/259), WRR: 39.382% (102/259)

Pico processed

WER: 61.776% (160/259), WRR: 40.154% (104/259)

Vosk unprocessed

WER: 111.446% (185/166), WRR: 4.819% (8/166)

Vosk processed

WER: 62.348% (154/247), WRR: 37.652% (93/247)

SER (sentence error rate) was again 100% in both cases.
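For anyone who wants to reproduce scores like these, here is a minimal sketch of how WER and WRR are commonly computed from a word-level edit-distance alignment. It assumes the usual definitions, WER = (S + D + I) / N and WRR = H / N, where N is the number of reference words. The scoring tool actually used here is the one from #892 and is not shown in this post, so the function names below (`align_counts`, `wer`, `wrr`) are purely illustrative and the tie-breaking may differ from that tool.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    hits: int
    substitutions: int
    deletions: int
    insertions: int


def align_counts(reference: str, hypothesis: str) -> Counts:
    """Align two transcripts word by word with minimum edit cost.

    Ties between equally cheap alignments are broken in favour of more hits
    (an assumption of this sketch, other tools may break ties differently).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Each cell is (cost, -hits, substitutions, deletions, insertions);
    # tuple comparison then prefers lower cost, then more hits.
    dp = [[(0, 0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, 0, i, 0)  # only deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, 0, j)  # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c[0], c[1] - 1, c[2], c[3], c[4])      # hit
            else:
                diag = (c[0] + 1, c[1], c[2] + 1, c[3], c[4])  # substitution
            u = dp[i - 1][j]
            dele = (u[0] + 1, u[1], u[2], u[3] + 1, u[4])      # deletion
            l = dp[i][j - 1]
            ins = (l[0] + 1, l[1], l[2], l[3], l[4] + 1)       # insertion
            dp[i][j] = min(diag, dele, ins)
    _, neg_hits, subs, dels, inss = dp[len(ref)][len(hyp)]
    return Counts(-neg_hits, subs, dels, inss)


def wer(c: Counts) -> float:
    """Word error rate: all errors divided by the reference length."""
    n = c.hits + c.substitutions + c.deletions
    if n == 0:
        raise ValueError("reference is empty")
    return (c.substitutions + c.deletions + c.insertions) / n


def wrr(c: Counts) -> float:
    """Word recognition rate: correctly recognized words over reference length."""
    n = c.hits + c.substitutions + c.deletions
    if n == 0:
        raise ValueError("reference is empty")
    return c.hits / n
```

Because insertions count as errors but do not add to N, WER can exceed 100%, which is how a figure like 111.446% (185/166) for unprocessed Vosk is possible. It also means WER and WRR need not sum to 100%.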

Conclusion: Picovoice does outperform Vosk on these scores, but other important factors should be considered:

1) Picovoice only allows 360000 seconds (100 hours) of free usage per month, and an access key must be obtained online. Although the processing seems to be offline, the key needs to be authenticated online.
2) Only an English model is available.
3) Picovoice also supports speech-to-text models with custom vocabularies: you can add new words with custom pronunciations to fine-tune the model (a smart and practical way to improve accuracy).
4) Post-processing Vosk output with a spell checker brings its accuracy on par with Picovoice.
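To illustrate the kind of spell-check post-processing meant in point 4, here is a toy sketch using only Python's standard `difflib`. This is not the corrector used for the results above (that one is described in #892); the vocabulary, the `correct_words` name, and the 0.8 similarity cutoff are all assumptions made for the example.

```python
import difflib


def correct_words(text: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Replace each out-of-vocabulary word with its closest vocabulary entry.

    Words with no sufficiently similar entry are left unchanged.
    """
    vocab = sorted({w.lower() for w in vocabulary})
    known = set(vocab)
    corrected = []
    for word in text.lower().split():
        if word in known:
            corrected.append(word)
            continue
        # get_close_matches returns the best matches above the cutoff, best first
        match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)
```

A real corrector would use context and a much larger dictionary, so this only shows the shape of the step: recognizer output goes in, corrected text comes out, and the corrected text is what gets scored.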

Files:

Originals: The original trailer can be downloaded from the link described in the procedure for your own analysis. The original SRT was obtained from YouTube, with basic processing carried out in Notepad++ (as in #892).

1) base.txt
2) pico.txt
3) vosk.txt

Processed (spell correction):

1) base.txt
2) pico.txt
3) vosk.txt

Enjoy!

base21 commented 2 years ago

Hi @ls-milkyway, which project was used for the spell correction?

ls-milkyway commented 2 years ago

> Hi @ls-milkyway, which project was used for the spell correction?

Read https://github.com/alphacep/vosk-api/issues/892; it's mentioned there. In fact, there are many AI-based spell correctors. Try a new one to see if you get better results in post-processing.