Optimized ASR models for weaker devices

maymage commented 1 year ago

I was wondering if you could also publish a twin app for transcription that focuses on lower CPU load, to make transcription available on potato laptops too?

abb128 commented 1 year ago

I hope to make smaller and more optimized models at some point that can run on weaker devices such as a Raspberry Pi.

maymage commented 1 year ago

Ah.. that's cool! Better some LiveCaptions than none!!

But what about an app just for transcription, where the processing time doesn't have to be near zero? Use case: dictating a letter, email, message for your favourite messenger client, etc.

Kind of like this:

====================================================================
| Clear | Clipboard | ----  App Title  ------| Mic | Hamburger |(X)|
====================================================================
|                                                                  |
|                                                                  |
|                                                                  |
|                                                                  |
|   Text Editor for Postprocessing perhaps with Markdown support   |
|                                                                  |
|                                                                  |
|                                                                  |
|                                                                  |
|                                                                  |
====================================================================

I know this is off topic. Related:

https://gitlab.gnome.org/bertob/app-ideas/-/issues/124

Oh.. I saw you already changed the issue title for the potato case!

abb128 commented 1 year ago

I see. If you just want non-live transcription, I suggest trying OpenAI Whisper, because it will generally give better results. It may not run well on slow laptops, though; possibly only the tiny.en model would be usable.
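
For anyone who wants to try that route, here is a minimal sketch of non-live transcription with Whisper's Python package. It assumes the `openai-whisper` package and ffmpeg are installed; the function name `transcribe_file` and the file name `letter.wav` are made up for illustration, and whether tiny.en is fast enough on a given laptop is something you would have to measure.

```python
def transcribe_file(path, model_name="tiny.en"):
    """Transcribe one audio file offline and return the recognized text."""
    # Imported lazily so the module loads even where Whisper is not installed.
    # Assumption: `pip install openai-whisper` has been run and ffmpeg is on PATH.
    import whisper

    # tiny.en is the smallest English-only model, the one suggested for slow machines.
    model = whisper.load_model(model_name)

    # fp16=False keeps inference in 32-bit floats, which is what CPUs support.
    result = model.transcribe(path, fp16=False)
    return result["text"].strip()


# Hypothetical usage:
#   print(transcribe_file("letter.wav"))
```

Because nothing has to appear in real time here, a slow machine only costs you waiting time rather than dropped captions.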

Voice dictation is an entire thing that requires careful design (commands such as "delete that", "capitalize (word)", "exclamation mark"). I'm not too sure if I'm prepared to tackle that at this time.
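
To illustrate why this needs design work, here is a toy sketch of a command layer sitting on top of recognized utterances. Everything in it is hypothetical (the function name, the command table, the rule that "delete that" removes the whole previous utterance); it is not how LiveCaptions works, and a real design would need to handle far more cases.

```python
# Hypothetical spoken commands that insert punctuation instead of text.
PUNCTUATION = {"exclamation mark": "!", "question mark": "?", "full stop": "."}


def apply_dictation(utterances):
    """Turn recognized utterances into text, interpreting a few spoken commands."""
    chunks = []  # one entry per utterance that produced text, so it can be undone
    for raw in utterances:
        text = raw.strip()
        phrase = text.lower()
        if phrase == "delete that":
            # Assumption: "that" means the whole previous utterance, punctuation included.
            if chunks:
                chunks.pop()
        elif phrase in PUNCTUATION:
            # Attach punctuation to the previous chunk; ignore it if nothing was said yet.
            if chunks:
                chunks[-1] += PUNCTUATION[phrase]
        elif phrase.startswith("capitalize "):
            word = text[len("capitalize "):]
            chunks.append(word.capitalize())
        else:
            chunks.append(text)
    return " ".join(chunks)


spoken = [
    "dear", "capitalize anna", "thanks for the letter", "exclamation mark",
    "see you sone", "delete that", "see you soon",
]
print(apply_dictation(spoken))
```

Even this toy version has to decide what "that" refers to and cannot tell a command from someone who really wants the words "exclamation mark" in their letter, which is the kind of ambiguity that makes dictation its own project.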

abb128 / LiveCaptions

Optimized ASR models for weaker devices #10