NVlabs / FAN

Official PyTorch implementation of Fully Attentional Networks
https://arxiv.org/abs/2204.12451

Some problems with the code #19

Open 123456789asdfjkl opened 1 year ago

123456789asdfjkl commented 1 year ago

https://github.com/NVlabs/FAN/blob/ee1b7df1016205ed26e28c0683b14238f3530a84/models/fan.py#L311-L315

Hi! Thank you for your great work! According to CaiT, I think the code should take the following form:
`cls_token = x[:, 0:1] + self.drop_path(self.gamma2 * self.mlp(x[:, 0:1]))`
`x = torch.cat([cls_token, x[:, 1:]], dim=1)`
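For readers following along, here is a minimal, self-contained sketch of the two update rules being discussed: an MLP residual applied to every token versus the CaiT-style residual applied only to the CLS token (the suggestion above). This is not the fan.py `ClassAttentionBlock` itself; the class name, the bare `nn.Sequential` MLP, and the `nn.Identity` drop path are simplified stand-ins for illustration.

```python
# Sketch only: contrasts an MLP residual over all tokens with the
# CaiT-style residual over the CLS token alone, as suggested above.
# This is NOT the fan.py ClassAttentionBlock; submodules are simplified.
import torch
import torch.nn as nn


class MlpResidualSketch(nn.Module):
    def __init__(self, dim: int, mlp_ratio: float = 4.0, init_values: float = 1e-4):
        super().__init__()
        hidden = int(dim * mlp_ratio)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.gamma2 = nn.Parameter(init_values * torch.ones(dim))  # layer scale
        self.drop_path = nn.Identity()  # stochastic depth omitted for clarity

    def mlp_all_tokens(self, x: torch.Tensor) -> torch.Tensor:
        # Residual MLP applied to every token (CLS + patches).
        return x + self.drop_path(self.gamma2 * self.mlp(self.norm2(x)))

    def mlp_cls_only(self, x: torch.Tensor) -> torch.Tensor:
        # CaiT-style: only the CLS token gets the MLP residual;
        # patch tokens pass through unchanged (the suggestion above).
        cls_token = x[:, 0:1] + self.drop_path(
            self.gamma2 * self.mlp(self.norm2(x[:, 0:1])))
        return torch.cat([cls_token, x[:, 1:]], dim=1)


if __name__ == "__main__":
    blk = MlpResidualSketch(dim=64)
    tokens = torch.randn(2, 1 + 196, 64)  # (batch, CLS + patch tokens, dim)
    print(blk.mlp_all_tokens(tokens).shape)  # torch.Size([2, 197, 64])
    print(blk.mlp_cls_only(tokens).shape)    # torch.Size([2, 197, 64])
```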

Youskrpig commented 1 year ago

Same question. The forward function of LayerScaleBlockClassAttn as implemented in timm is:
`x = x + self.drop_path(self.gamma1 * self.attn(self.norm1(x)))`
`x = x + self.drop_path(self.gamma2 * self.mlp(self.norm2(x)))`
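For comparison, a CaiT-style class-attention block, written below as a rough self-contained sketch in the spirit of timm's LayerScaleBlockClassAttn (paraphrased, not copied from timm), lets only the CLS token query the full sequence and applies both residual updates to the CLS token alone. The class names `ClassAttn` and `ClassAttentionBlockSketch` and the hyperparameters are illustrative assumptions, not code from this repo.

```python
# Rough sketch of a CaiT-style class-attention block: attention reads the
# whole sequence [CLS; patches], but both residual updates touch only the
# CLS token. Illustrative only; not the FAN or timm source.
import torch
import torch.nn as nn


class ClassAttn(nn.Module):
    """Attention where only the CLS token forms a query over the full sequence."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        H = self.num_heads
        q = self.q(x[:, 0:1]).reshape(B, 1, H, C // H).permute(0, 2, 1, 3)  # CLS query only
        k = self.k(x).reshape(B, N, H, C // H).permute(0, 2, 1, 3)
        v = self.v(x).reshape(B, N, H, C // H).permute(0, 2, 1, 3)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        cls = (attn @ v).transpose(1, 2).reshape(B, 1, C)
        return self.proj(cls)  # (B, 1, C): an update for the CLS token only


class ClassAttentionBlockSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, mlp_ratio: float = 4.0,
                 init_values: float = 1e-4):
        super().__init__()
        hidden = int(dim * mlp_ratio)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = ClassAttn(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.gamma1 = nn.Parameter(init_values * torch.ones(dim))  # layer scale (attn)
        self.gamma2 = nn.Parameter(init_values * torch.ones(dim))  # layer scale (mlp)
        self.drop_path = nn.Identity()  # stochastic depth omitted for clarity

    def forward(self, x: torch.Tensor, x_cls: torch.Tensor) -> torch.Tensor:
        # x: patch tokens (B, N, C); x_cls: CLS token (B, 1, C)
        u = torch.cat([x_cls, x], dim=1)
        x_cls = x_cls + self.drop_path(self.gamma1 * self.attn(self.norm1(u)))
        x_cls = x_cls + self.drop_path(self.gamma2 * self.mlp(self.norm2(x_cls)))
        return x_cls


if __name__ == "__main__":
    blk = ClassAttentionBlockSketch(dim=64)
    patches = torch.randn(2, 196, 64)
    cls = torch.randn(2, 1, 64)
    print(blk(patches, cls).shape)  # torch.Size([2, 1, 64])
```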

zhoudaquan commented 1 year ago

> https://github.com/NVlabs/FAN/blob/ee1b7df1016205ed26e28c0683b14238f3530a84/models/fan.py#L311-L315
>
> Hi! Thank you for your great work! According to CaiT, I think the code should take the following form:
> `cls_token = x[:, 0:1] + self.drop_path(self.gamma2 * self.mlp(x[:, 0:1]))`
> `x = torch.cat([cls_token, x[:, 1:]], dim=1)`

Hi, thanks for pointing this out. There are indeed some differences between our implementation and CaiT's, but the experiments in the paper all use the method in the released code…

zhoudaquan commented 1 year ago

> Same question. The forward function of LayerScaleBlockClassAttn as implemented in timm is:
> `x = x + self.drop_path(self.gamma1 * self.attn(self.norm1(x)))`
> `x = x + self.drop_path(self.gamma2 * self.mlp(self.norm2(x)))`

Hi, thanks for pointing this out. There are indeed some differences between our implementation and CaiT's, but the experiments in the paper all use the method in the released code…