My initial model is one fine-tuned from the F5-TTS base.
Use the same vocab.txt as the fine-tuned model, or extend the embed weights leveraging finetune_gradio; see the training README.
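For the second option, new symbols have to be appended to the end of the existing vocab.txt (not inserted or re-sorted), so that existing token indices keep pointing at their original embedding rows. A minimal sketch of such a merge, with placeholder file names:

```python
# Append only unseen symbols from the new dataset's vocab to the old vocab.
# File names here are placeholders, not actual F5-TTS paths.
with open("vocab_old.txt", encoding="utf-8") as f:
    merged = f.read().splitlines()
with open("vocab_new.txt", encoding="utf-8") as f:
    candidates = f.read().splitlines()

seen = set(merged)
for tok in candidates:
    if tok not in seen:  # keep original order, skip duplicates
        merged.append(tok)
        seen.add(tok)

with open("vocab_extended.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(merged) + "\n")
```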
Thank you for your reply.
Is the process shown in the image above correct?
I'm following those steps, but I still get the same error as below.
RuntimeError: Error(s) in loading state_dict for EMA: size mismatch for ema_model.transformer.text_embed.text_embed.weight: copying a param with shape torch.Size
What am I missing?
I'd really appreciate any guidance.
Thank you again!
I initially fine-tuned the F5-TTS base model with 50 hours of data in my language. I plan to add another 1000 hours of training data. Due to GPU limitations, I will train in increments of 50 hours. Ultimately, my goal is to create a fully fine-tuned model_last.pt after completing the entire 1000 hours of training.
The pretrained model needs to be extended along with the vocab; see https://github.com/SWivid/F5-TTS/blob/3fcdbc70b4a9d4299e1ecd0b5a1c35209f23fd69/src/f5_tts/train/finetune_gradio.py#L1059-L1115, in which the text embed weight is extended, and also https://github.com/SWivid/F5-TTS/blob/3fcdbc70b4a9d4299e1ecd0b5a1c35209f23fd69/src/f5_tts/train/finetune_gradio.py#L1112
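In essence, the linked code appends rows to the text-embedding table so its first dimension matches the extended vocab. A minimal sketch of the idea (the top-level checkpoint key is an assumption about the checkpoint layout; the inner key comes from the error message; prefer the real implementation in finetune_gradio.py):

```python
import torch

def expand_text_embed(ckpt_path: str, out_path: str, num_new_tokens: int) -> None:
    """Append rows to the text-embedding table for newly added vocab entries."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt["ema_model_state_dict"]  # assumed top-level key
    key = "ema_model.transformer.text_embed.text_embed.weight"  # from the error

    old = state[key]  # shape: [old_vocab_size, dim]
    extra = torch.zeros(num_new_tokens, old.shape[1], dtype=old.dtype)
    extra.normal_(mean=0.0, std=old.std().item())  # init new rows like existing ones
    state[key] = torch.cat([old, extra], dim=0)

    torch.save(ckpt, out_path)

# New tokens must be appended to vocab.txt so existing indices stay valid.
expand_text_embed("model_last.pt", "model_last_extended.pt", num_new_tokens=10)
```

If the checkpoint also carries a non-EMA model_state_dict, its corresponding text_embed key would need the same treatment.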
Thank you!
I am in the process of testing again by modifying finetune_gradio.py.
Thank you so much. I'll share the test results.
Thank you again!
Will close this issue; feel free to reopen if you have further questions.
Environment Details
RunPod, 1× H100 SXM, 24 vCPU, 251 GB RAM; Python and CUDA per the F5-TTS default setup.
Steps to Reproduce
My method:
1. Start from my pretrained model: model_last.pt
2. Add the new dataset
3. Run finetune_gradio.py and create a new project
4. Transcribe Data
5. Vocab Check
6. Prepare Data
7. Train Data, with "Path to the Pretrained Checkpoint" set to model_last.pt
✔️ Expected Behavior
Continue training the fine-tuned model_last.pt (based on the F5-TTS base) with an additional dataset.
❌ Actual Behavior
I made my own fine-tuned model from the F5-TTS base model.
I want to train further on top of my own fine-tuned model, but the following error occurs:
RuntimeError: Error(s) in loading state_dict for EMA: size mismatch for ema_model.transformer.text_embed.text_embed.weight: copying a param with shape torch.Size
Maybe it's a problem with the vocab.txt I'm using.
Which vocab.txt should I use?
Or is it something else entirely?
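A quick way to confirm the vocab.txt hypothesis is to compare the checkpoint's embedding rows against the vocab size. A sketch (the top-level checkpoint key is an assumption; the inner key comes from the error message; F5-TTS may reserve one extra filler row, so the table can hold vocab size + 1 rows):

```python
import torch

# Load the checkpoint on CPU and pull out the EMA state dict if present.
ckpt = torch.load("model_last.pt", map_location="cpu")
state = ckpt.get("ema_model_state_dict", ckpt)  # assumed layout

emb = state["ema_model.transformer.text_embed.text_embed.weight"]
with open("vocab.txt", encoding="utf-8") as f:
    vocab_lines = sum(1 for _ in f)

# Expect emb.shape[0] == vocab_lines (or vocab_lines + 1 with a filler row);
# any other difference is the size mismatch the error reports.
print(f"embedding rows: {emb.shape[0]}  vocab.txt lines: {vocab_lines}")
```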
I need a guide for further training my own fine-tuned model, not the F5-TTS base.
Of course, my initial model is one fine-tuned from the F5-TTS base.
Thank you.