Possible to include more languages? and add custom models like WhisperAI?

abb128 / LiveCaptions

Linux Desktop application that provides live captioning

GNU General Public License v3.0

1.26k stars, 32 forks

Possible to include more languages? and add custom models like WhisperAI? #17

Open NayamAmarshe opened 1 year ago

NayamAmarshe commented 1 year ago

It would be amazing if we could do that. To be honest, I wouldn't mind the latency if it worked with other languages.

abb128 commented 1 year ago

I won't be adding Whisper support for now because it requires really powerful hardware and consumes too many system resources. My main PC can't even handle it, so it would be a real challenge to implement and test. Adding extra languages is definitely something I want to do at some point, but it would require training new models.

kha84 commented 1 year ago

@abb128 you could consider the whisper.cpp project, which uses the same trained models as OpenAI's Whisper but is implemented in C++ and claims to run on a potato (rpi3/4).

abb128 commented 1 year ago

@kha84 That's the one I tried on my desktop, but even with the tiny model it consumed 100% of my CPU and didn't run fast enough for realtime. I believe "run on a potato (rpi3/4)" may mean that it runs at all, not that it runs faster than realtime, unless I did something horribly wrong. If it does indeed run in realtime on a Pi, please let me know.
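To make "faster than realtime" concrete, one common way to express it is the real-time factor: processing time divided by audio duration, where a value below 1.0 means the recognizer keeps up with live audio. The snippet below is only a minimal sketch of that arithmetic. The `transcribe` callable and both function names are hypothetical placeholders, not part of LiveCaptions, aprilasr, or whisper.cpp.

```python
import time
from typing import Callable


def real_time_factor(
    transcribe: Callable[[bytes], str],  # hypothetical: any function that transcribes raw audio
    audio: bytes,
    audio_seconds: float,
) -> float:
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe(audio)
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def keeps_up(rtf: float) -> bool:
    """A recognizer can follow live audio only if it processes faster than the audio plays."""
    return rtf < 1.0
```

Under these assumptions, a model that needs 12 seconds to process 10 seconds of audio has a factor of 1.2 and falls steadily behind, even though it "runs" on the hardware.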

kha84 commented 1 year ago

Yeah, sorry, I take my words back. I played with whisper.cpp for a while, and aprilasr provides much more consistent results at a fraction of the CPU cost.

abb128 commented 1 year ago

I did some more reading and found that it can indeed run in realtime on a Pi 4, and on my computer as well, by adjusting some parameters in the stream example program: https://github.com/ggerganov/whisper.cpp/discussions/166

So maybe this could be viable after all, but I do find the latency a bit lacking.

petterreinholdtsen commented 1 year ago

What amount of training data is needed to add a new language? Would love to see support for Norwegian Bokmål (nb) and Norwegian Nynorsk (nn).