Shuweis opened 3 weeks ago
Within the unet model, `conv_in` and `conv_in2` are both defined, but one of them is not involved in the computation during training, resulting in its gradient being None. This is illustrated in the following code from `models.unet_3d_blocks_mask`:
```python
# 2. pre-process
if self.motion_mask and mask is not None:
    mask = repeat(mask, 'b 1 1 h w -> (t b) 1 f h w', t=sample.shape[0] // mask.shape[0], f=sample.shape[2])
    sample = torch.cat([mask, sample], dim=1)
    sample = sample.permute(0, 2, 1, 3, 4).reshape((sample.shape[0] * num_frames, -1) + sample.shape[3:])
    sample = self.conv_in2(sample)
else:
    sample = sample.permute(0, 2, 1, 3, 4).reshape((sample.shape[0] * num_frames, -1) + sample.shape[3:])
    sample = self.conv_in(sample)
```
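This is expected autograd behavior: a parameter that never appears in the forward computation is never added to the autograd graph, so its `.grad` stays None after `backward()`. A minimal, self-contained sketch (toy modules for illustration, not the actual animate-anything UNet) that reproduces the phenomenon:

```python
import torch
import torch.nn as nn

class BranchyNet(nn.Module):
    # Toy stand-in for the UNet: two input convs, only one used per forward.
    def __init__(self):
        super().__init__()
        self.conv_in = nn.Conv2d(4, 8, 3, padding=1)
        self.conv_in2 = nn.Conv2d(5, 8, 3, padding=1)  # expects the extra mask channel

    def forward(self, sample, mask=None):
        if mask is not None:
            return self.conv_in2(torch.cat([mask, sample], dim=1))
        return self.conv_in(sample)

net = BranchyNet()
out = net(torch.randn(2, 4, 16, 16))     # mask=None -> only conv_in runs
out.sum().backward()

print(net.conv_in.weight.grad is None)   # False: it was in the graph
print(net.conv_in2.weight.grad is None)  # True: never used, grad stays None
```

Depending on which branch runs, exactly one of the two convs receives a gradient on each step.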
Thanks for your reply! I found another question related to this. If I print `unet.conv_in2.weight.grad` directly after `accelerator.backward(loss)`, it returns None when I use `accelerate launch` for distributed training. If I don't use it and just run `python train.py -- ...`, it does not return None. Does `accelerate launch` affect the gradient of this layer?
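One thing that may be worth checking: under `accelerate launch` the model is wrapped in `DistributedDataParallel`, and DDP is stricter than single-process training about parameters that do not participate in every forward pass. A sketch of enabling `find_unused_parameters` via the standard Accelerate kwargs handler (the commented-out training-script names are placeholders):

```python
from accelerate import Accelerator, DistributedDataParallelKwargs

# Ask DDP to detect parameters that are skipped in a given forward pass
# (e.g. conv_in2 when the mask branch is not taken) instead of assuming
# every registered parameter contributes a gradient each step.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

# ... then prepare the model/optimizer/dataloader as usual, e.g.:
# unet, optimizer, train_dataloader = accelerator.prepare(unet, optimizer, train_dataloader)
```

Note that even with this flag, a layer's grad will still be None on steps where its branch does not run; the flag only keeps DDP's gradient reduction consistent when some parameters go unused.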
When I fine-tune animate-anything, I found that the gradient of an unfrozen layer of the UNet (e.g. `conv_in`) is None, even though printing `requires_grad` for `conv_in` gives True. This means the fine-tuning of animate-anything is not working. What causes this phenomenon? Maybe there is a problem in the training process.
Hope for your reply!
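A quick, generic way to see exactly which trainable parameters received no gradient after a backward pass (a diagnostic sketch, not code from this repo):

```python
import torch.nn as nn

def report_missing_grads(model: nn.Module) -> None:
    # List trainable parameters whose .grad is still None, i.e. parameters
    # that were not part of the autograd graph for this step.
    missing = [name for name, p in model.named_parameters()
               if p.requires_grad and p.grad is None]
    print(f"{len(missing)} trainable parameters without gradients:")
    for name in missing:
        print(" ", name)

# Call right after loss.backward() / accelerator.backward(loss), e.g.:
# report_missing_grads(unet)
```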