Yujun-Shi / DragDiffusion

[CVPR2024, Highlight] Official code for DragDiffusion
https://yujun-shi.github.io/projects/dragdiffusion.html
Apache License 2.0
1.13k stars, 82 forks

"args.unet_feature_idx = [3]" in ui_utils.py means that DIFT is extracted from the fourth upsampling block of UNet? But your paper says the penultimate one #32

Closed by ChiChivas 1 year ago

ChiChivas commented 1 year ago

Section 2.3 of your paper says: "The feature maps of the penultimate UNet block given z_t as input (denoted as F (z_t)) is used to conduct motion supervision." However, the code sets "args.unet_feature_idx = [3]" in ui_utils.py. Since indexing is zero-based ("layer_idx=[0]" means the first block), index 3 looks like the fourth block rather than the penultimate one. Which one is right?

ChiChivas commented 1 year ago

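I figured it out. In drag_pipeline.py, the call "unet_output, all_intermediate_features = self.unet(z, t, encoder_hidden_states=encoder_hidden_states, return_intermediates=True)" returns all_intermediate_features as a list of 5 tensors, so index 3 is the second-to-last entry, which matches the paper's "penultimate" description. Thanks.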
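For anyone else confused by the indexing, here is a minimal sketch of the arithmetic. It assumes only what is stated above: the list has 5 entries and the index is 3. The string placeholders are hypothetical stand-ins for the real feature tensors and are not taken from the DragDiffusion code.

```python
# Hypothetical stand-ins for the 5 tensors in all_intermediate_features.
# The real entries are feature maps; strings keep the sketch dependency-free.
all_intermediate_features = [f"feature_{i}" for i in range(5)]

# Value set in ui_utils.py, as quoted in this issue.
unet_feature_idx = [3]

# Select features by zero-based index, the way a list of indices would be used.
selected = [all_intermediate_features[i] for i in unet_feature_idx]

# The penultimate entry of any list is at index -2.
penultimate = all_intermediate_features[-2]

# In a list of length 5, zero-based index 3 is the second-to-last entry.
assert selected == [penultimate]
print(selected)
```

So index 3 is both "the fourth entry" and "the penultimate entry"; the two descriptions only disagree if the list is assumed to have a different length.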