FunAudioLLM / CosyVoice

Multilingual large voice generation model with full-stack support for inference, training, and deployment.
https://funaudiollm.github.io/
Apache License 2.0
5.77k stars, 617 forks

Other languages #202

Open cseti007 opened 3 months ago

cseti007 commented 3 months ago

Thanks for your great work! I'm just wondering how large a dataset is recommended for training from scratch on other languages?

Thank you!

rlenain commented 3 months ago

I've had success training in Spanish with ~70 hours of data, but proper nouns aren't being pronounced properly, and the pronunciation in general isn't always ideal.

aluminumbox commented 3 months ago

Of course you can. Check the Whisper tokenizer and add <|your language|> at the start of each sentence.
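A minimal sketch of the tagging step described above, assuming the tag follows Whisper's language-token format (for example `<|es|>` for Spanish). The helper name `add_language_tag` is hypothetical and not part of CosyVoice, and whether the training pipeline expects the tag to be prepended to the raw transcript in exactly this way is an assumption.

```python
def add_language_tag(text: str, lang_code: str) -> str:
    """Prefix a transcript with a Whisper-style language token, e.g. <|es|>.

    Hypothetical helper for illustration; not part of CosyVoice.
    """
    tag = f"<|{lang_code}|>"
    text = text.strip()
    # Avoid double-tagging transcripts that already carry the tag.
    if text.startswith(tag):
        return text
    return tag + text


# Example transcripts; the second one is already tagged.
samples = ["Hola, ¿cómo estás?", "<|es|>Buenos días."]
tagged = [add_language_tag(s, "es") for s in samples]

for line in tagged:
    print(line)
```

Applying the tag once during data preparation, rather than at several points in the pipeline, keeps the transcripts consistent between training and inference.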

rlenain commented 3 months ago

@aluminumbox I'm getting a weird issue in Spanish where proper nouns and uncommon words aren't being pronounced properly. I think it might be a tokenizer issue. Do you have any idea how the BPE tokenizer would react to a new language, and why it would struggle with proper nouns and uncommon words?

aluminumbox commented 3 months ago

We use the Whisper tokenizer; check cosyvoice.yaml. We also do not have much experience with Spanish tokenization.

drlor2k commented 1 month ago

Hello @rlenain, are you training only the LLM model or also the flow model? And how much GPU resources did you use for the Spanish training?

justinatbahasa commented 2 weeks ago

Hi @aluminumbox, do you think it's better to train CosyVoice from scratch or just fine-tune the CosyVoice-300M base model if I want to train on a new language? Also, should I train both the LLM and the flow model if I fine-tune?