MultiLanguage and Single Language Train Cross Lingual

osmankrblt commented 1 month ago

Hi. I am training the LLM model on Turkish and I have a question. How does the LLM model know which language my dataset is in? I have a text file in my converted dataset folder, as the training stages require. Should I add a <|language|> code at the start of every text, or should I edit the whisper tokenizer language in cosyvoice.yaml?

If I have to edit cosyvoice.yaml, that means I can train only one language per training run. But I want to add many languages to the model, so I want to train on many languages in a single run. What should I do? For example, this is the text file in my dataset:

uuid1 textexttext
uuid2 textexttext2

This is my text file. Should it look like the following?

uuid1 <|tr|> textexttext
uuid2 <|tr|> textexttext2

If the above is allowed, can I also mix languages like this?

uuid1 <|tr|> textexttext
uuid2 <|it|> textexttext2
uuid3 <|fr|> textexttext3
uuid4 <|de|> textexttext4
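For illustration, here is a minimal sketch of how a tagged text file like the ones above could be generated from an untagged one. It is not taken from the CosyVoice repository: the helpers `tag_line` and `tag_lines` and the `utt2lang` mapping are hypothetical names, and the sketch assumes one "id transcript" pair per line, separated by whitespace.

```python
def tag_line(line: str, lang: str) -> str:
    """Insert a <|lang|> tag between the utterance id and its transcript."""
    utt_id, _, text = line.strip().partition(" ")
    return f"{utt_id} <|{lang}|> {text.strip()}"


def tag_lines(lines, utt2lang):
    """Tag every non-empty line using a per-utterance language mapping.

    utt2lang is a hypothetical dict such as {"uuid1": "tr", "uuid2": "it"};
    a missing id raises KeyError so that untagged lines are not written silently.
    """
    tagged = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        utt_id = line.split(maxsplit=1)[0]
        tagged.append(tag_line(line, utt2lang[utt_id]))
    return tagged


# Example with the placeholder ids and transcripts from this issue.
lines = ["uuid1 textexttext", "uuid2 textexttext2"]
utt2lang = {"uuid1": "tr", "uuid2": "it"}
tagged = tag_lines(lines, utt2lang)
print("\n".join(tagged))
```

Using the same language code for every id gives the single-language layout, and a mixed mapping gives the multi-language layout. Whether the tags are treated as special tokens depends on the tokenizer configured for training, which this sketch does not check.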

aluminumbox commented 1 month ago

both are ok

github-actions[bot] commented 1 week ago

This issue is stale because it has been open for 30 days with no activity.

FunAudioLLM / CosyVoice

MultiLanguage and Single Language Train Cross Lingual #483