I notice that you set guidance_scale=1.0 during training. That makes sense when fine-tuning a distilled model with the FM loss.
However, the true CFG trick (true_gs=3.5) is introduced at the inference stage, and you use guidance_scale=4.0 for both the positive and the negative noise prediction. Is the true CFG trick necessary, and why did you choose 4.0 as the guidance scale?
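
To make the question concrete, here is how I understand the inference step, as a minimal Python sketch. The function names and the toy model are my own placeholders, not code from the repository, and I am assuming the two predictions are combined with the standard classifier-free guidance formula.

```python
# Hypothetical sketch of the inference-time combination being asked about.
# `true_cfg_combine` and `toy_model` are illustrative names, not the repo's API.

def true_cfg_combine(pos_pred, neg_pred, true_gs=3.5):
    """Standard CFG: extrapolate from the negative toward the positive prediction."""
    return [n + true_gs * (p - n) for p, n in zip(pos_pred, neg_pred)]


def toy_model(latent, prompt_bias, guidance_scale):
    # Toy stand-in so the sketch runs; a real model would be a neural network
    # that receives guidance_scale as one of its conditioning inputs.
    return [guidance_scale * x + prompt_bias for x in latent]


latent = [0.0, 1.0, 2.0]

# Both passes use the same guidance_scale=4.0, as in the inference script.
pos = toy_model(latent, prompt_bias=1.0, guidance_scale=4.0)  # positive prompt
neg = toy_model(latent, prompt_bias=0.0, guidance_scale=4.0)  # negative prompt

# The true CFG step is then applied on top with true_gs=3.5.
combined = true_cfg_combine(pos, neg, true_gs=3.5)
```

With true_gs=1.0 this reduces to the positive prediction alone, which is the case I would expect to match the guidance_scale=1.0 training setup most closely.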