xjtuwh closed 2 years ago
In the code, fan.py:
line 536: x = x + self.drop_path(self.gamma1 * x_new)
line 539: x = x + self.drop_path(self.gamma2 * x_new)
Could you explain self.gamma1 and self.gamma2, which are not introduced in the paper on arXiv (2204.12451v2)? In addition, could you share the published ICML paper?
Hi,
The gamma parameters are used to stabilize training for large models, following the same practice as in CaiT (https://arxiv.org/abs/2103.17239).
We are preparing the ICML camera-ready version, which should be ready in 1-2 days, and we will also add these details to the next arXiv version.
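For readers unfamiliar with the CaiT practice mentioned above, here is a minimal sketch of a residual block with learnable per-channel scales on each branch (LayerScale in the CaiT paper). It is not the FAN source: the class name LayerScaleBlock, the linear stand-ins for the attention and MLP branches, and the init_value of 1e-4 are all assumptions made for illustration, and stochastic depth is replaced by an identity.

```python
import torch
import torch.nn as nn


class LayerScaleBlock(nn.Module):
    """Hypothetical block illustrating CaiT-style scaling; not the FAN code."""

    def __init__(self, dim: int, init_value: float = 1e-4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Linear layers stand in for the attention and MLP branches.
        self.branch1 = nn.Linear(dim, dim)
        self.branch2 = nn.Linear(dim, dim)
        # Learnable per-channel scales. The small initial value (assumed here)
        # makes each residual branch start out contributing almost nothing.
        self.gamma1 = nn.Parameter(init_value * torch.ones(dim))
        self.gamma2 = nn.Parameter(init_value * torch.ones(dim))
        # Stochastic depth is omitted in this sketch.
        self.drop_path = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_new = self.branch1(self.norm1(x))
        x = x + self.drop_path(self.gamma1 * x_new)
        x_new = self.branch2(self.norm2(x))
        x = x + self.drop_path(self.gamma2 * x_new)
        return x


if __name__ == "__main__":
    block = LayerScaleBlock(dim=8)
    tokens = torch.randn(2, 4, 8)  # (batch, tokens, channels)
    out = block(tokens)
    print(out.shape, (out - tokens).abs().max().item())
```

Because the scales start small, the block is close to an identity mapping at initialization, and the network learns during training how much each branch should contribute.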
Thank you very much.