My initial model is one fine-tuned from the F5-TTS base.
Use the same vocab.txt as the fine-tuned model, or extend the embed weights leveraging finetune_gradio; see the training README.
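For the second option, new symbols have to be appended to the end of the existing vocab.txt (not inserted or re-sorted), so that existing token indices keep pointing at their original embedding rows. A minimal sketch of such a merge, with placeholder file names:

```python
# Append only unseen symbols from the new dataset's vocab to the old vocab.
# File names here are placeholders, not actual F5-TTS paths.
with open("vocab_old.txt", encoding="utf-8") as f:
    merged = f.read().splitlines()
with open("vocab_new.txt", encoding="utf-8") as f:
    candidates = f.read().splitlines()

seen = set(merged)
for tok in candidates:
    if tok not in seen:  # keep original order, skip duplicates
        merged.append(tok)
        seen.add(tok)

with open("vocab_extended.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(merged) + "\n")
```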
Thank you for your reply.
Is the process shown in the image above correct?
I'm following those steps, but I still get the same error as below.
RuntimeError: Error(s) in loading state_dict for EMA: size mismatch for ema_model.transformer.text_embed.text_embed.weight: copying a param with shape torch.Size
What am I missing?
I'd really appreciate any guidance.
Thank you again!
I initially fine-tuned the F5-TTS base model with 50 hours of data in my language. I plan to add another 1000 hours of training data. Due to GPU limitations, I will train in increments of 50 hours. Ultimately, my goal is to create a fully fine-tuned model_last.pt after completing the entire 1000 hours of training.
The pretrained model needs to be extended along with the vocab; see https://github.com/SWivid/F5-TTS/blob/3fcdbc70b4a9d4299e1ecd0b5a1c35209f23fd69/src/f5_tts/train/finetune_gradio.py#L1059-L1115, in which the text embed weight is extended, and also https://github.com/SWivid/F5-TTS/blob/3fcdbc70b4a9d4299e1ecd0b5a1c35209f23fd69/src/f5_tts/train/finetune_gradio.py#L1112
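In essence, the linked code appends rows to the text-embedding table so its first dimension matches the extended vocab. A minimal sketch of the idea (the top-level checkpoint key is an assumption about the checkpoint layout; the inner key comes from the error message; prefer the real implementation in finetune_gradio.py):

```python
import torch

def expand_text_embed(ckpt_path: str, out_path: str, num_new_tokens: int) -> None:
    """Append rows to the text-embedding table for newly added vocab entries."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt["ema_model_state_dict"]  # assumed top-level key
    key = "ema_model.transformer.text_embed.text_embed.weight"  # from the error

    old = state[key]  # shape: [old_vocab_size, dim]
    extra = torch.zeros(num_new_tokens, old.shape[1], dtype=old.dtype)
    extra.normal_(mean=0.0, std=old.std().item())  # init new rows like existing ones
    state[key] = torch.cat([old, extra], dim=0)

    torch.save(ckpt, out_path)

# New tokens must be appended to vocab.txt so existing indices stay valid.
expand_text_embed("model_last.pt", "model_last_extended.pt", num_new_tokens=10)
```

If the checkpoint also carries a non-EMA model_state_dict, its corresponding text_embed key would need the same treatment.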
Thank you!
I am in the process of testing again by modifying finetune_gradio.py.
Thank you so much. I'll share the test results.
Thank you again!
Will close this issue; feel free to reopen if you have further questions.
Environment Details
RunPod, 1× H100 SXM, 24 vCPU, 251 GB RAM; Python and CUDA per the F5-TTS default setup.
Steps to Reproduce
My method:
1. Start from my pretrained model: model_last.pt
2. Add the new dataset
3. Run finetune_gradio.py and create a new project
4. Transcribe Data
5. Vocab Check
6. Prepare Data
7. Train Data, with "Path to the Pretrained Checkpoint" set to model_last.pt
✔️ Expected Behavior
Continue training the fine-tuned model_last.pt (based on the F5-TTS base) with an additional dataset.
❌ Actual Behavior
I made my own fine-tuned model from the F5-TTS base model.
I want to train further on top of my own fine-tuned model, but the following error occurs:
RuntimeError: Error(s) in loading state_dict for EMA: size mismatch for ema_model.transformer.text_embed.text_embed.weight: copying a param with shape torch.Size
Maybe it's a problem with the vocab.txt I'm using.
Which vocab.txt should I use?
Or is it something else entirely?
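A quick way to confirm the vocab.txt hypothesis is to compare the checkpoint's embedding rows against the vocab size. A sketch (the top-level checkpoint key is an assumption; the inner key comes from the error message; F5-TTS may reserve one extra filler row, so the table can hold vocab size + 1 rows):

```python
import torch

# Load the checkpoint on CPU and pull out the EMA state dict if present.
ckpt = torch.load("model_last.pt", map_location="cpu")
state = ckpt.get("ema_model_state_dict", ckpt)  # assumed layout

emb = state["ema_model.transformer.text_embed.text_embed.weight"]
with open("vocab.txt", encoding="utf-8") as f:
    vocab_lines = sum(1 for _ in f)

# Expect emb.shape[0] == vocab_lines (or vocab_lines + 1 with a filler row);
# any other difference is the size mismatch the error reports.
print(f"embedding rows: {emb.shape[0]}  vocab.txt lines: {vocab_lines}")
```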
I need a guide for further training my own fine-tuned model, not the F5-TTS base.
Of course, my initial model is one fine-tuned from the F5-TTS base.
Thank you.