Closed ChiChivas closed 1 year ago
I know. Check "unet_output, all_intermediate_features = self.unet(z, t, encoder_hidden_states=encoder_hidden_states, return_intermediates=True)" in drag_pipeline.py, all_intermediate_features is a list of 5 tensors. Thanks.
In your paper section 2.3, says "The feature maps of the penultimate UNet block given z_t as input (denoted as F (z_t)) is used to conduct motion supervision." Your code actually is "args.unet_feature_idx = [3]" in ui_utils.py. It is well known that "layer_idx=[0]" means the first block. So which one is right?