123456789asdfjkl opened this issue 1 year ago
https://github.com/NVlabs/FAN/blob/ee1b7df1016205ed26e28c0683b14238f3530a84/models/fan.py#L311-L315

Hi! Thank you for your great work! According to CaiT, I think the code should be in the following form:

cls_token = x[:, 0:1] + self.drop_path(self.gamma2 * self.mlp(x[:, 0:1]))
x = torch.cat([cls_token, x[:, 1:]], dim=1)

Same question. The forward function of LayerScaleBlockClassAttn as implemented in timm is:

x = x + self.drop_path(self.gamma1 * self.attn(self.norm1(x)))
x = x + self.drop_path(self.gamma2 * self.mlp(self.norm2(x)))
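For anyone skimming the thread, here is a minimal runnable sketch of the two behaviors being compared. This is a paraphrase for illustration only, not the actual FAN or timm code: the class names, the `nn.MultiheadAttention` stand-in for class attention, `SimpleMLP`, and `init_values` are all placeholders, and `drop_path` is omitted for brevity.

```python
import torch
import torch.nn as nn

class SimpleMLP(nn.Module):
    """Placeholder feed-forward block (fc -> GELU -> fc)."""
    def __init__(self, dim, hidden_mult=4):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim * hidden_mult)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(dim * hidden_mult, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

class ClassAttnBlockAllTokens(nn.Module):
    """Released-code style: both residual updates touch every token."""
    def __init__(self, dim, num_heads=8, init_values=1e-5):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = SimpleMLP(dim)
        self.gamma1 = nn.Parameter(init_values * torch.ones(dim))  # LayerScale
        self.gamma2 = nn.Parameter(init_values * torch.ones(dim))

    def forward(self, x):  # x: (B, 1 + N, dim), class token first
        y = self.norm1(x)
        attn_out, _ = self.attn(y, y, y)               # stand-in for class attention
        x = x + self.gamma1 * attn_out                 # attention residual on ALL tokens
        x = x + self.gamma2 * self.mlp(self.norm2(x))  # MLP residual on ALL tokens
        return x

class ClassAttnBlockClsOnly(ClassAttnBlockAllTokens):
    """CaiT-style variant proposed above: the MLP residual updates only
    the class token; patch tokens skip the MLP entirely."""
    def forward(self, x):
        y = self.norm1(x)
        attn_out, _ = self.attn(y, y, y)
        x = x + self.gamma1 * attn_out
        # Only token 0 (the class token) gets the MLP residual.
        cls_token = x[:, 0:1] + self.gamma2 * self.mlp(self.norm2(x)[:, 0:1])
        x = torch.cat([cls_token, x[:, 1:]], dim=1)
        return x
```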
Hi, thanks for pointing this out. There are indeed some differences between our implementation and CaiT's. However, all the experiments in the paper use the method in the released code…
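For anyone who wants to compare the two behaviors locally, a quick smoke test using the hypothetical classes from the sketch above:

```python
x = torch.randn(2, 197, 64)  # batch 2, class token + 196 patch tokens, dim 64
for blk in (ClassAttnBlockAllTokens(64, num_heads=4),
            ClassAttnBlockClsOnly(64, num_heads=4)):
    assert blk(x).shape == x.shape  # both variants preserve (B, 1 + N, dim)
```

Either way the output shape is identical; the only difference is whether the patch tokens receive the MLP update, which is consistent with the author's point that the paper's numbers correspond to the released-code behavior.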