facebookresearch / audio2photoreal

Code and dataset for photorealistic Codec Avatars driven from audio

About classifier-free guidance train policy #43

Closed Zessay closed 7 months ago

Zessay commented 7 months ago

Thanks for your excellent work!

The paper says you adopt a classifier-free guidance policy to train the diffusion module, as shown in the following picture.

image

However, in your code, the cond_mode parameter is set when the FiLMTransformer model is initialized and does not change in the TrainLoop. Moreover, the forward function of FiLMTransformer only uses the cond_mode of the model instance and does not use the condition signal in y.

image image

So I wonder whether classifier-free guidance is actually used in the training process. Looking forward to your reply!
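
For context, classifier-free guidance training normally requires the condition to be dropped at random per sample during training, so that one network learns both the conditional and the unconditional prediction. Below is a minimal sketch of that step in plain Python. It is not code from this repository: `drop_condition`, `null_cond`, and the list-based "embeddings" are hypothetical names chosen for illustration.

```python
import random
from typing import List, Sequence


def drop_condition(
    cond_batch: Sequence[Sequence[float]],
    null_cond: Sequence[float],
    cond_drop_prob: float,
    rng: random.Random,
) -> List[List[float]]:
    """Replace each sample's condition with the null condition with
    probability cond_drop_prob (hypothetical helper, illustration only)."""
    out = []
    for cond in cond_batch:
        # rng.random() is in [0, 1), so prob 0.0 never drops and 1.0 always drops.
        if rng.random() < cond_drop_prob:
            out.append(list(null_cond))
        else:
            out.append(list(cond))
    return out


# Toy batch of three "audio condition" vectors and a null vector.
batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
null = [0.0, 0.0]
kept = drop_condition(batch, null, 0.0, random.Random(0))
dropped = drop_condition(batch, null, 1.0, random.Random(0))
```

If the drop decision is fixed once at model construction instead of being drawn per training step, the network never sees a mix of conditional and unconditional samples, which is the concern raised above.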

Zessay commented 7 months ago

After studying the code in more depth, I think the null_cond_embed in FiLMTransformer is used for classifier-free guidance and cond_drop_prob is used as the control signal, while ClassifierFreeSampleModel doesn't work. Is that right?

image image
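
For reference, a sampling-time wrapper for classifier-free guidance usually combines a conditional and an unconditional prediction as `uncond + scale * (cond - uncond)`. The sketch below shows that standard formula in plain Python. It is an assumption about the usual technique, not the repository's ClassifierFreeSampleModel, and `guided_prediction` is a hypothetical name.

```python
from typing import List, Sequence


def guided_prediction(
    cond_out: Sequence[float],
    uncond_out: Sequence[float],
    scale: float,
) -> List[float]:
    """Standard classifier-free guidance combination (illustration only).

    scale = 0 gives the unconditional output, scale = 1 the conditional
    output, and scale > 1 extrapolates away from the unconditional one.
    """
    if len(cond_out) != len(uncond_out):
        raise ValueError("conditional and unconditional outputs must match in length")
    return [u + scale * (c - u) for c, u in zip(cond_out, uncond_out)]


# Toy model outputs for one denoising step.
cond = [1.0, 2.0]
uncond = [0.0, 1.0]
guided = guided_prediction(cond, uncond, 2.0)
```

This combination is only meaningful if the model was trained with condition dropout, since otherwise the unconditional branch has never been learned.
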
evonneng commented 7 months ago

Hi! You are absolutely correct that this version of the CFG model is broken. Thanks so much for pointing this out! I'll open a PR right now to fix this issue, and will run a few tests to make sure the fix matches what the paper states.

evonneng commented 7 months ago

Closing now, after the patch fix. Thanks for pointing this out! Please feel free to reopen if needed.