Open ls-milkyway opened 2 years ago
Hi @ls-milkyway, which project was used for the spell correction?
Read https://github.com/alphacep/vosk-api/issues/892, it's mentioned there. In fact there are many AI-based spell correctors; try a newer one to see if you get better results in post-processing.
Vosk Vs Pico Voice (leopard)
Models used:
Vosk model: En-US 0.22
Picovoice model: En, inbuilt (needs an access key, which can be obtained by logging in to the Picovoice Console).
Procedure: I wanted to compare Vosk with another highly touted ASR project, Picovoice (Leopard), but this time with a simpler file with less audio complexity than in my earlier comparison in #892. The video is also short: trailer 5 of Batman 2022, which has better stereo audio in PCM format (2300 kb/s at 48 kHz).
The procedure was the same as in #892, except that spleeter was not used and the audio file was simple (no US slang, bad words, etc.), although it definitely contains low voices and varying voice pitches.
Results: Pico unprocessed
WER: 62.162% ( 161 / 259) WRR: 39.382% ( 102 / 259)
Pico processed
WER: 61.776% ( 160 / 259) WRR: 40.154% ( 104 / 259)
Vosk unprocessed
WER: 111.446% ( 185 / 166) WRR: 4.819% ( 8 / 166)
Vosk processed
WER: 62.348% ( 154 / 247) WRR: 37.652% ( 93 / 247)
SER (sentence error rate) was again 100% in both cases.
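For anyone who wants to reproduce scores like these, here is a minimal sketch of how WER and WRR are commonly computed from a word-level edit-distance alignment. It assumes the usual definitions, WER = (S + D + I) / N and WRR = H / N, where N is the number of reference words; the scoring tool actually used for the numbers above may tokenize or align differently. Under these definitions insertions count towards WER but not WRR, which would explain why WER can exceed 100% (Vosk unprocessed) and why WER + WRR can add up to slightly more than 100% (Pico). The function names are my own.

```python
def align_counts(ref, hyp):
    """Word-level Levenshtein alignment.

    Returns (cost, hits, substitutions, deletions, insertions) for the
    cheapest alignment of the hypothesis words against the reference words.
    """
    n, m = len(ref), len(hyp)
    # Each cell holds (cost, hits, subs, dels, ins) for ref[:i] vs hyp[:j].
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, 0, i, 0)      # only deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, 0, j)      # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, h, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, h + 1, s, d, ins)        # hit
            else:
                diag = (c + 1, h, s + 1, d, ins)    # substitution
            c, h, s, d, ins = dp[i - 1][j]
            delete = (c + 1, h, s, d + 1, ins)
            c, h, s, d, ins = dp[i][j - 1]
            insert = (c + 1, h, s, d, ins + 1)
            # Pick the cheapest option; ties prefer the diagonal move.
            dp[i][j] = min(diag, delete, insert, key=lambda t: t[0])
    return dp[n][m]


def wer_wrr(ref_text, hyp_text):
    """Return (WER, WRR) as fractions of the reference word count."""
    ref = ref_text.lower().split()
    hyp = hyp_text.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    cost, hits, _subs, _dels, _ins = align_counts(ref, hyp)
    return cost / len(ref), hits / len(ref)


if __name__ == "__main__":
    # Made-up sentences, not taken from the trailer transcripts.
    wer, wrr = wer_wrr("the cat sat on the mat", "the cat sat on a mat please")
    print(f"WER: {wer:.3%}  WRR: {wrr:.3%}")
```

In the made-up example there is one substitution and one insertion against six reference words, so WER and WRR together come to more than 100%, the same pattern as in the Pico rows.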
Conclusion: Picovoice does outperform Vosk on these scores, but there are other important factors to consider:
1) Picovoice only allows free usage of 360000 seconds per month, and an access key has to be obtained online. Although the processing seems to be offline, the key needs to be authenticated online.
2) Only an English model is present.
3) Picovoice also allows speech-to-text models with custom vocabularies: you can add new words with custom pronunciations to fine-tune the model (a smart and practical way to improve results).
4) Post-processing Vosk output with a spell checker brings its scores on par with Picovoice.
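To give an idea of what a spell-correction pass over a transcript looks like, here is a rough stand-in using only Python's standard library. This is not the corrector used for the numbers above (that one is named in #892); it simply snaps each unknown word to the closest entry of a word list with `difflib.get_close_matches`. The function name, the `cutoff` value and the tiny vocabulary are assumptions for illustration only.

```python
import difflib


def correct_words(text, vocabulary, cutoff=0.8):
    """Replace each out-of-vocabulary word with its closest vocabulary entry.

    Words that have no sufficiently similar match are left unchanged.
    """
    vocab = sorted({w.lower() for w in vocabulary})
    vocab_set = set(vocab)
    corrected = []
    for word in text.lower().split():
        if word in vocab_set:
            corrected.append(word)
            continue
        # n=1 keeps only the best match; cutoff is a similarity ratio in [0, 1].
        match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)


if __name__ == "__main__":
    # Hypothetical vocabulary and input, chosen only to show the mechanics.
    vocab = ["batman", "gotham", "riddler"]
    print(correct_words("the batmam of gothem", vocab))
```

A real corrector would use a full dictionary or a language model with context; a plain closest-match lookup like this one can just as easily turn a correct rare word into a wrong common one, so it is worth re-scoring before and after.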
Files:
Originals: The original trailer can be downloaded from the link described in the procedure for your own analysis. The original SRT was obtained from YouTube, with basic processing carried out using Notepad++ (as in #892).
1) base.txt
2) pico.txt
3) vosk.txt
Processed (spell correction):
1) base.txt
2) pico.txt
3) vosk.txt
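If you would rather script the SRT clean-up than do it by hand in Notepad++, something along these lines should get a subtitle file into plain reference text. It is only a sketch of what I assume the "basic processing" amounts to (dropping cue numbers and timestamps, stripping simple tags and punctuation, lowercasing); the exact steps from #892 may differ, and the function name is my own.

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt):
    """Flatten SRT subtitle content into one lowercase line of words."""
    words = []
    for line in srt.splitlines():
        line = line.strip()
        # Skip blank lines, cue numbers and timing lines.
        # Caveat: a subtitle line made only of digits is dropped as well.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        line = re.sub(r"<[^>]+>", "", line)      # simple tags like <i>...</i>
        line = re.sub(r"[^\w\s']", " ", line)    # punctuation except apostrophes
        words.extend(line.lower().split())
    return " ".join(words)


if __name__ == "__main__":
    # Made-up sample cues, not from the actual trailer subtitles.
    sample = (
        "1\n00:00:01,000 --> 00:00:03,500\n<i>Hello there.</i>\n\n"
        "2\n00:00:04,000 --> 00:00:06,000\nIt's a test, right?\n"
    )
    print(srt_to_text(sample))
```

Applying the same normalization to the reference and to both ASR outputs keeps the comparison fair, since differences in casing or punctuation would otherwise be counted as word errors.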
Enjoy!