futo-org / voice-input

Issue tracker for Voice Input

Using non-English with a single language model #53

Open dktzde opened 5 months ago

dktzde commented 5 months ago

I would be really happy if I could use German (or any other language) with a single-language model. I suppose this would be faster than the multilingual model.

At the moment it's like this: as soon as I choose a language other than English (even if it's the only language I use), the multilanguage model is used.

This issue is somewhat related to https://github.com/futo-org/voice-input/issues/15.

abb128 commented 5 months ago

Can you clarify what you mean exactly by single language model?

dktzde commented 5 months ago

If I use English as the language, I can use the English model.

For all other languages, a multilingual model is used. I would like to be able to use a German-only model for German (this is what I meant by "single-language model"), and the same would apply to other languages.

abb128 commented 5 months ago

The official pretrained Whisper checkpoints we use as a base come in only two variants: multilingual (e.g. tiny, base, small) and English-only (e.g. tiny.en, base.en, small.en). There are no separate models for other languages. It is possible to finetune the multilingual model on a specific language, which may make it more accurate. Some people have done this, but primarily with the large model, which is too large to run on a phone. We are working on allowing arbitrary models to be imported, so that smaller finetuned models can be supported.
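
As a rough illustration of the naming scheme described above, the choice between the two variants could be sketched in Python as follows. This is only a sketch: `pick_checkpoint` and `ENGLISH_ONLY_SIZES` are hypothetical names, not part of Voice Input or Whisper, and only the checkpoint names themselves come from this thread.

```python
# Hypothetical helper, not taken from Voice Input: maps a model size and a
# language code to a Whisper checkpoint name, following the two-variant
# scheme described in this thread.

# Sizes mentioned in this thread that have an English-only ".en" variant.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small"}


def pick_checkpoint(size: str, language: str) -> str:
    """Return the checkpoint name for a model size and language code."""
    if language == "en" and size in ENGLISH_ONLY_SIZES:
        # English has a dedicated English-only checkpoint.
        return f"{size}.en"
    # Every other language falls back to the multilingual checkpoint,
    # because no per-language checkpoints exist.
    return size


print(pick_checkpoint("tiny", "en"))  # tiny.en
print(pick_checkpoint("tiny", "de"))  # tiny
```

Under this scheme, selecting German always resolves to the multilingual checkpoint, which matches the behaviour reported at the top of this issue.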