fishaudio / fish-speech

Brand new TTS solution
https://speech.fish.audio

New language training - Thai [Feature] #207

Open TanZhili opened 4 months ago

TanZhili commented 4 months ago

I am training the model for a new language, Thai.

  1. Should I only modify fish_speech/text/clean.py to allow Thai Unicode characters? Is there any other code I should modify?
  2. I used about 200 hours of data for full fine-tuning on the medium model/codebook and for LoRA on the large model/codebook. After thousands of steps, neither model can generate correct Thai sounds. Should I train for more steps?

Finetune logs: Epoch 0: | | 5267/? [16:46:54<00:00, 0.09it/s, v_num=2, train/loss=7.090, train/top_5_accuracy=0.209, val/loss=7.640, val/top_5_accuracy=0.172]

Lora logs: Epoch 0: | | 1242/? [16:41:25<00:00, 0.02it/s, v_num=5, train/loss=9.500, train/top_5_accuracy=0.154, val/loss=10.60, val/top_5_accuracy=0.115]
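
For readers comparing runs, the two progress-bar lines above can be reduced to a few numbers: steps completed, real throughput, and the gap between validation and training loss. The snippet below is a minimal sketch, not part of fish-speech; `parse_progress_line` is a hypothetical helper, and it assumes the log line has the same shape as the ones pasted here.

```python
import re


def parse_progress_line(line: str) -> dict:
    """Extract step count, elapsed seconds, and metrics from a progress-bar line.

    Hypothetical helper: assumes the 'N/? [H:MM:SS<...' layout shown in the logs above.
    """
    step = int(re.search(r"(\d+)/\?", line).group(1))
    hours, minutes, seconds = map(int, re.search(r"\[(\d+):(\d+):(\d+)<", line).groups())
    # Collect every 'name=value' pair, e.g. train/loss=7.090
    metrics = {name: float(value) for name, value in re.findall(r"([\w/]+)=([\d.]+)", line)}
    return {"step": step, "elapsed_s": hours * 3600 + minutes * 60 + seconds, **metrics}


finetune_line = (
    "Epoch 0: | | 5267/? [16:46:54<00:00, 0.09it/s, v_num=2, train/loss=7.090, "
    "train/top_5_accuracy=0.209, val/loss=7.640, val/top_5_accuracy=0.172]"
)
run = parse_progress_line(finetune_line)
steps_per_second = run["step"] / run["elapsed_s"]
loss_gap = run["val/loss"] - run["train/loss"]
print(f"{steps_per_second:.4f} it/s, val-train loss gap {loss_gap:.2f}")
```

A single snapshot like this cannot show overfitting on its own. The usual sign is a trend in which val/loss rises while train/loss keeps falling, so the same parsing would need to be applied to several checkpoints.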

leng-yue commented 4 months ago

I believe you only need to update clean.py. Did you observe overfitting during training? If not, you probably need to train for more steps.
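
As an illustration of the kind of change meant by "allow Thai characters", here is a minimal sketch of a character whitelist that keeps the Thai Unicode block (U+0E00 to U+0E7F). It is not the actual contents of fish_speech/text/clean.py, which are not shown in this thread; the function name `clean_text_sketch` and the exact set of allowed punctuation are assumptions made for the example.

```python
import re
import unicodedata

# Assumption: keep ASCII letters, digits, whitespace, basic punctuation,
# and the Thai Unicode block (U+0E00..U+0E7F). Everything else is dropped.
_DISALLOWED = re.compile(r"[^a-zA-Z0-9\u0E00-\u0E7F\s.,!?'-]")


def clean_text_sketch(text: str) -> str:
    """Hypothetical cleaner: normalize, drop disallowed characters, collapse spaces."""
    text = unicodedata.normalize("NFC", text)
    text = _DISALLOWED.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_text_sketch("สวัสดี  world™!"))
```

If the real cleaner uses a similar whitelist, Thai text would otherwise be stripped before it reaches the model, so it is worth checking a few cleaned training samples by eye.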

TanZhili commented 4 months ago

> You only need to update the clean.py I believe. Did you observe overfitting during training? If not, you probably need to train for more steps.

How many steps would be enough? Could you give a rough estimate?

leng-yue commented 4 months ago

Ideally 100k, but I am not sure what will happen with a 200-hour dataset... We haven't explored pretraining with a small dataset yet. I would suggest setting a lower learning rate (e.g., 4e-5 or 2e-5) and enabling weight decay and dropout to counter overfitting.
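
To put the 100k figure in perspective, the sketch below collects the suggested settings in one place and estimates wall-clock time from the throughput in the logs posted earlier. Everything here is illustrative: `FinetuneSettings` and `estimated_hours` are hypothetical names, not fish-speech config keys, and the weight decay and dropout values are assumptions, since the suggestion above only says to enable them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneSettings:
    """Hypothetical container for the settings suggested in this thread."""
    learning_rate: float = 2e-5  # from the thread (4e-5 is the other suggested value)
    weight_decay: float = 0.01   # assumption: the thread only says "enable weight decay"
    dropout: float = 0.1         # assumption: the thread only says "enable dropout"
    max_steps: int = 100_000     # "ideally 100k"


def estimated_hours(max_steps: int, steps_per_second: float) -> float:
    """Wall-clock hours needed to reach max_steps at a constant throughput."""
    return max_steps / steps_per_second / 3600


settings = FinetuneSettings()
# Throughput taken from the posted logs:
# 5267 steps in 16:46:54 (60414 s) and 1242 steps in 16:41:25 (60085 s).
full_ft_hours = estimated_hours(settings.max_steps, 5267 / 60414)
lora_hours = estimated_hours(settings.max_steps, 1242 / 60085)
print(f"fine-tune: {full_ft_hours:.0f} h, LoRA: {lora_hours:.0f} h")
```

At the throughput in those logs, 100k steps would take on the order of two weeks for the fine-tuning run and considerably longer for the LoRA run, so faster hardware or a larger effective batch may matter as much as the step count.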

maxbizz commented 4 months ago

@TanZhili Please let us know how many steps and what settings you used if you successfully train your Thai model.

yy524 commented 3 months ago

@TanZhili Did you successfully train your Thai model?

github-actions[bot] commented 22 hours ago

This issue is stale because it has been open for 30 days with no activity.