JuliaNeuralGraphics / Whisper.jl

MIT License

Multilingual support #1

Closed Benoit9 closed 5 months ago

Benoit9 commented 5 months ago

Nice work! What is needed to support the multilingual models? I might give it a try.

pxl-th commented 5 months ago

Hi! If I recall correctly, we just need support for the multilingual tokenizer. You can look at how the original multilingual tokenizer works and how it is used.

I think once encoding/decoding works for multiple languages it should be enough to just load the respective model.

Benoit9 commented 5 months ago

Thanks! Indeed, there wasn't much to do. I only had to handle the last token of the multilingual vocab, which the Base64 decoder didn't like. I also added support for the new large-v3 model.
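
For readers wondering what "handling" a problematic vocab entry might look like, here is a minimal sketch in Python (the language of the original tokenizer), not the actual Whisper.jl change. It assumes the vocab file consists of lines of the form `<base64 token> <rank>`, and it guesses at two possible causes the thread does not confirm: a blank or malformed trailing line, and a token missing its `=` padding. The function name `parse_vocab` and the sample data are hypothetical.

```python
import base64
import binascii


def parse_vocab(text):
    """Parse lines of '<base64 token> <rank>' into a {bytes: int} mapping.

    Hypothetical helper: the failure modes handled here are assumptions,
    not the confirmed cause of the issue discussed in this thread.
    """
    ranks = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            # Skip blank or malformed lines instead of failing on them.
            continue
        token_b64, rank = parts
        # Restore missing '=' padding so the length is a multiple of 4.
        token_b64 += "=" * (-len(token_b64) % 4)
        try:
            token = base64.b64decode(token_b64, validate=True)
        except binascii.Error:
            # Still undecodable: skip the entry rather than abort the load.
            continue
        ranks[token] = int(rank)
    return ranks


# Made-up sample: a padded token, an unpadded token, and a blank last line.
sample = "aGVsbG8= 0\nd29ybGQ 1\n\n"
vocab = parse_vocab(sample)
```

Silently skipping an entry is a trade-off: it keeps loading robust, but a real loader would probably want to log or special-case the skipped token so that its rank is not lost.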
