NVlabs / FAN

Official PyTorch implementation of Fully Attentional Networks
https://arxiv.org/abs/2204.12451
Other

gamma1 in fan.py #9

Closed: xjtuwh closed this issue 2 years ago.

xjtuwh commented 2 years ago

In fan.py, line 536 is `x = x + self.drop_path(self.gamma1 * x_new)` and line 539 is `x = x + self.drop_path(self.gamma2 * x_new)`. Could you explain self.gamma1 and self.gamma2? They are not introduced in the arXiv paper (2204.12451v2). In addition, could you share the version of the paper published at ICML?
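
Both quoted lines follow the same residual pattern: the branch output x_new is scaled by gamma, passed through drop_path, and added back to x. The thread does not say what self.drop_path is. Assuming it is the usual timm-style stochastic depth layer, here is a minimal sketch of that behavior. The function name, signature and the random stand-in tensors are illustrative assumptions, not FAN's code.

```python
import torch


def drop_path(x: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
    """Stochastic depth (illustrative): drop the whole residual branch for random samples."""
    if drop_prob == 0.0 or not training:
        return x  # identity at inference time or when disabled
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over the token and channel dims.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = torch.empty(shape, dtype=x.dtype, device=x.device).bernoulli_(keep_prob)
    # Rescale surviving samples so the expected output matches inference mode.
    return x * mask / keep_prob


# Usage mirroring line 536; x_new and gamma1 are hypothetical random stand-ins.
x = torch.randn(4, 16, 8)        # (batch, tokens, channels)
x_new = torch.randn(4, 16, 8)    # stand-in for the attention branch output
gamma1 = torch.ones(8)           # per-channel scale, broadcast over the last dim
x = x + drop_path(gamma1 * x_new, drop_prob=0.1, training=True)
```

Because the mask is drawn per sample, a dropped sample skips the entire scaled branch, so gamma only affects the samples whose branch is kept.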

zhoudaquan commented 2 years ago

Hi,

The gamma parameters are used to stabilize training of large models, following the same practice as CaiT (https://arxiv.org/abs/2103.17239), where the technique is called LayerScale.
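
In CaiT, LayerScale multiplies the output of each residual branch by a learnable per-channel vector that starts at a small value, so every block begins close to the identity map. Below is a minimal sketch of that idea. The class name LayerScaleResidual, the MLP stand-in branch and init_value=1e-5 are hypothetical choices for illustration. The thread does not state FAN's actual initialization.

```python
import torch
import torch.nn as nn


class LayerScaleResidual(nn.Module):
    """Hypothetical block computing x + gamma * branch(norm(x)) with a learnable per-channel gamma."""

    def __init__(self, dim: int, branch: nn.Module, init_value: float = 1e-5):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.branch = branch
        # Diagonal scaling stored as a vector; a small init keeps the block near identity.
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # gamma broadcasts over the last (channel) dimension.
        return x + self.gamma * self.branch(self.norm(x))


torch.manual_seed(0)
dim = 8
mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
block = LayerScaleResidual(dim, mlp)
x = torch.randn(2, 16, dim)  # (batch, tokens, channels)
y = block(x)
print((y - x).abs().max().item())  # small at init, scaled by init_value
```

Since gamma is trained, each block can gradually grow its branch's contribution. This is the stabilizing effect the reply refers to for deeper and larger models.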

We are preparing the ICML camera-ready version, which should be ready in 1-2 days. We will also add these details to the next arXiv version.

xjtuwh commented 2 years ago

Thank you very much.