ankanbhunia / PIDM

Person Image Synthesis via Denoising Diffusion Model (CVPR 2023)
https://ankanbhunia.github.io/PIDM
MIT License

Possible contradiction in disentangled CFG between the paper and the training code #44

Open takesukeDS opened 1 year ago

takesukeDS commented 1 year ago

Hello authors, your work is impressive. Thanks for sharing the code base.

I want to ask for clarification about your disentangled CFG. The paper states that both the pose condition and the style condition are omitted with probability 0.1. However, the code (train.py) seems to omit only the style condition. The invocation of the UNet in GaussianDiffusion.training_losses(),

model_output = model(x = torch.cat([x_t, target_pose],1), t = self._scale_timesteps(t), x_cond = img, prob = prob)

passes both target_pose (concatenated into the input x) and img (the style condition), along with prob. Although x_cond is masked with the given probability in the UNet's forward function, BeatGANsAutoencModel.forward(), the argument x is used without any modification, so the pose channels are never dropped.
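
To make my reading concrete, here is a minimal sketch of the masking behaviour as I understand it. The helper itself is hypothetical and does not exist in the repository (only img, prob, and x_cond come from the call above), and the actual operation inside BeatGANsAutoencModel.forward() may differ:

import torch

def mask_style_only(img, prob=0.1):
    # Hypothetical sketch, not code from the repository: the style
    # condition `img` (passed as x_cond) is zeroed out per sample with
    # probability `prob`, while the pose channels concatenated into `x`
    # are never touched.
    keep = (torch.rand(img.shape[0], device=img.device) >= prob).float()
    return img * keep.view(-1, 1, 1, 1)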

Could you clarify how you train your model for disentangled CFG?
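
For comparison, this hypothetical sketch shows what the paper's description suggests to me: pose and style each dropped independently with probability 0.1, so the model also sees pose-only, style-only, and fully unconditional samples, which is what a disentangled guidance rule would need at sampling time. Only the variable names (x_t, target_pose, img) come from the call above; the helper itself is not in the repository:

import torch

def drop_both_conditions(x_t, target_pose, img, p=0.1):
    # Hypothetical sketch, not code from the repository: pose and style
    # are masked independently per sample with probability p, so a batch
    # contains (pose+style), (pose only), (style only), and fully
    # unconditional examples.
    b = x_t.shape[0]
    keep_pose = (torch.rand(b, device=x_t.device) >= p).float().view(-1, 1, 1, 1)
    keep_style = (torch.rand(b, device=img.device) >= p).float().view(-1, 1, 1, 1)
    x = torch.cat([x_t, target_pose * keep_pose], dim=1)
    x_cond = img * keep_style
    return x, x_cond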

Excuse me if I overlooked something. Best regards.

YanzuoLu commented 1 year ago

Same question here.