SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License
6.13k stars, 681 forks

Finetuning goes NaN after 1/2 epochs #341

Open rikabi89 opened 2 hours ago

rikabi89 commented 2 hours ago

[Attached screenshot]

Any idea why this is happening? In this case it is a 1.5-hour dataset.

rikabi89 commented 2 hours ago

This appears to be related to FP16? I switched it to "none" and everything was fine.

SWivid commented 13 minutes ago

@rikabi89 Sure, this kind of problem is related to gradient explosion. Use bf16 or none in this case.
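
For readers wondering why the precision setting matters here, the following is a minimal standard-library sketch of the numeric range difference. It is not F5-TTS code and not a diagnosis of this particular run. The helper names `to_fp16` and `to_bf16` and the value `1.0e5` are made up for illustration, and `to_bf16` approximates bfloat16 by truncating a float32, which real hardware would round instead. The point is only that FP16 cannot represent values above 65504, so a large intermediate value becomes `inf`, and a later `inf - inf` yields NaN, while bf16 keeps the float32 exponent range.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; hardware would give inf.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of a float32.

    Assumption: truncation instead of round-to-nearest, for simplicity.
    """
    try:
        bits = struct.pack(">f", x)
    except OverflowError:
        return math.copysign(math.inf, x)
    return struct.unpack(">f", bits[:2] + b"\x00\x00")[0]


big = 1.0e5  # hypothetical large activation or gradient value

fp16_val = to_fp16(big)  # exceeds FP16_MAX, so it overflows to inf
bf16_val = to_bf16(big)  # stays finite, with reduced precision

fp16_diff = fp16_val - fp16_val  # inf - inf gives nan
bf16_diff = bf16_val - bf16_val  # finite - finite gives 0.0

print("fp16:", fp16_val, "->", fp16_diff)
print("bf16:", bf16_val, "->", bf16_diff)
```

The trade-off is that bf16 has fewer mantissa bits than FP16, so values are coarser, but its exponent range matches float32, which is why it tends to be more tolerant of large gradients.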