SQY-qh opened this issue 1 month ago
How is the quality of your model's generation (FAD, etc.)? IIRC `loss_wrt_gt` is not expected to decrease (that's not the goal of consistency distillation), and it stays at ~2 in my run. As long as `loss_wrt_teacher` decreases, you're good.
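For context, the distinction between the two losses can be sketched like this. This is a toy NumPy sketch under assumptions: the function bodies, shapes, and naming are illustrative only, not the repo's actual code. The point is that `loss_wrt_teacher` is the distillation objective (the student matches the teacher's target), while `loss_wrt_gt` is only a diagnostic against the ground truth and is not directly optimized:

```python
import numpy as np

rng = np.random.default_rng(0)

def student(x_t, t):
    # toy "student" consistency function: maps a noisy sample toward clean data
    return x_t * (1.0 - t)

def teacher_target(x_t, t):
    # toy target from the (frozen) teacher, e.g. one ODE-solver step;
    # here just the student's mapping plus a small offset for illustration
    return x_t * (1.0 - t) + 0.01

x0 = rng.normal(size=(4, 8))           # ground-truth clean sample
t = 0.5                                # some intermediate timestep
x_t = x0 + rng.normal(size=x0.shape)   # noised sample at timestep t

pred = student(x_t, t)

# distillation objective: this is what training minimizes and should decrease
loss_wrt_teacher = np.mean((pred - teacher_target(x_t, t)) ** 2)

# diagnostic only: distance to ground truth; often stays roughly flat,
# since the student is never trained to match x0 directly
loss_wrt_gt = np.mean((pred - x0) ** 2)
```

Under this toy setup `loss_wrt_teacher` is tiny while `loss_wrt_gt` stays large, which mirrors the behavior described above: a flat `loss_wrt_gt` is expected, but a *rising* one alongside noisy samples usually points to a training problem elsewhere.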
I use the same config as the author, but `loss_wrt_gt` keeps rising during training, and by around epoch 10 the generated results sound like random noise. I use the Tango pretrained model as the stage-1 model, since I skip stage-1 training and only train stage 2. Could that be the cause of my problem?