Hello, thanks for sharing your excellent work on detection distillation.
There is a piece of code that confuses me a little.
There is an unsqueeze operation at line 75 of distillation/losses/pgd_cls.py:
Mask_fg = Mask_fg.unsqueeze(dim=1)
Considering that the shape of Mask_fg is [N, 1, H, W], according to line 61 of distillation/distillers/distill_pgd.py:
cls_value_dict[(H, W)] = cv.reshape(N, 1, H, W).to(dtype=x_student[0].dtype)
the shape of Mask_fg becomes [N, 1, 1, H, W] after unsqueezing.
Then fg_fea_t = torch.mul(fea_t, torch.sqrt(Mask_fg)) is performed between two tensors of different ranks, [N, C, H, W] and [N, 1, 1, H, W]. Since broadcasting aligns trailing dimensions, the 4-D feature map is treated as [1, N, C, H, W], and the product comes out as an fg_fea_t tensor of shape [N, N, C, H, W].
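Here is a minimal sketch that reproduces the broadcasting behavior (the tensor sizes are arbitrary placeholders, not the values used in the repo):

```python
import torch

N, C, H, W = 2, 8, 4, 4
fea_t = torch.rand(N, C, H, W)      # teacher feature map, [N, C, H, W]
Mask_fg = torch.rand(N, 1, H, W)    # foreground mask as built in distill_pgd.py

# The extra unsqueeze from pgd_cls.py line 75
mask_5d = Mask_fg.unsqueeze(dim=1)  # [N, 1, H, W] -> [N, 1, 1, H, W]

# Broadcasting aligns trailing dims: [1, N, C, H, W] * [N, 1, 1, H, W]
fg_fea_t = torch.mul(fea_t, torch.sqrt(mask_5d))
print(fg_fea_t.shape)               # torch.Size([2, 2, 8, 4, 4]), i.e. [N, N, C, H, W]

# Without the unsqueeze, the mask broadcasts over channels as one might expect
fg_fea_t_4d = torch.mul(fea_t, torch.sqrt(Mask_fg))
print(fg_fea_t_4d.shape)            # torch.Size([2, 8, 4, 4]), i.e. [N, C, H, W]
```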
I wonder whether this is the expected behavior of the method, or whether I went wrong somewhere?