Closed: francqz31 closed this issue 10 months ago
I also know that Coqui shut down. There is a really new TTS model here https://github.com/PolyAI-LDN/pheme that claims to be really fast too. If both this and Parakeet got integrated into https://github.com/KoljaB/LocalAIVoiceChat, I believe it would be a big boost: better performance with faster speed. You can also apply some tricks to them to make them even faster!
The NVIDIA STT looks very promising. The word error rate is better than Whisper's, and if it's also faster it's for sure a great candidate. I hope it handles all languages well and not only English. I think it currently does not scale down to low-VRAM systems, whereas Whisper offers a tiny model...
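To make the low-VRAM concern concrete, here is a minimal, hypothetical selection helper. The VRAM thresholds and the fallback order are my assumptions for illustration, not measured requirements:

```python
# Hypothetical helper: pick an STT model that fits the available VRAM.
# Thresholds are rough assumptions for illustration, not benchmarks.

def pick_stt_model(vram_gb: float) -> str:
    """Return a model identifier that should fit in `vram_gb` of VRAM."""
    if vram_gb >= 8:
        # Parakeet RNNT 1.1B ships no smaller variants, so assume a larger card.
        return "nvidia/parakeet-rnnt-1.1b"
    if vram_gb >= 2:
        return "whisper-small"
    # Whisper scales down to tiny for very constrained systems.
    return "whisper-tiny"

print(pick_stt_model(12))  # nvidia/parakeet-rnnt-1.1b
print(pick_stt_model(1))   # whisper-tiny
```

The point of the sketch is just that Whisper gives you a graceful degradation path on small GPUs, while a single 1.1B checkpoint does not.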
Pheme looks good, but to be honest so do a lot of engines currently. For pure speed, for example, StyleTTS2 is a really great engine: 6-7x faster than XTTS.
Ok, got it 👍 I just wanted to notify you. There is also a really new MIT-licensed model that claims to be better than Mistral 7B, so it should mostly be compatible with Zephyr! It is only 2.7B parameters, so I bet it will be really fast: https://huggingface.co/microsoft/phi-2. You might want to integrate it into LocalAIVoiceChat for better speed while keeping the same accuracy!
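For anyone who wants to try phi-2 as a drop-in LLM, a minimal sketch using the Hugging Face transformers API might look like this. The generation parameters are assumptions, and the imports live inside the function so the file still loads when transformers/torch are not installed:

```python
def generate_with_phi2(prompt: str, max_new_tokens: int = 64) -> str:
    """Sketch: run microsoft/phi-2 through Hugging Face transformers.

    Downloads a ~2.7B-parameter checkpoint on first call, so this is
    illustrative rather than something to drop into a hot path untested.
    """
    # Lazy imports: keep the module importable without transformers/torch.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
    model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Swapping the model in LocalAIVoiceChat would of course also mean adapting its prompt template, since phi-2 is a base model rather than a chat-tuned one.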
Now I will close the issue.
@KoljaB Are you planning to try one of these? It would be awesome to get faster results, since even with CUDA I receive the fullSentence event after 3-4 seconds for 3 sentences of text, which is not ideal.
There are new fast STT models from NVIDIA that claim to be better than Whisper v3 on the leaderboard here: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard. Some of them, namely the Parakeet family, are really, really fast with almost the same accuracy as the large Whisper models. Even this one https://huggingface.co/spaces/nvidia/parakeet-rnnt-1.1b is way faster than Whisper v3!
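For anyone who wants to try Parakeet directly, the checkpoints load through NVIDIA's NeMo toolkit. A minimal sketch, assuming the `nemo_toolkit[asr]` package is installed (the API names reflect NeMo's documented `from_pretrained`/`transcribe` interface, but treat the details as an assumption):

```python
def transcribe_with_parakeet(wav_paths: list[str]) -> list[str]:
    """Sketch: transcribe WAV files with nvidia/parakeet-rnnt-1.1b via NeMo.

    The import is inside the function so the file still loads on systems
    without NeMo installed.
    """
    import nemo.collections.asr as nemo_asr

    # from_pretrained pulls the checkpoint from NGC / Hugging Face.
    model = nemo_asr.models.ASRModel.from_pretrained(
        model_name="nvidia/parakeet-rnnt-1.1b"
    )
    # transcribe() takes a list of audio file paths and returns text.
    return model.transcribe(wav_paths)
```

Note the 1.1B checkpoint is GPU-hungry compared to the smaller Whisper variants, which matters for the low-VRAM concern raised above.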