IDKiro / DehazeFormer

[IEEE TIP] Vision Transformers for Single Image Dehazing
MIT License

about the _init_weights_ #5

Closed hezw2016 closed 2 years ago

hezw2016 commented 2 years ago

Hi, thank you for sharing this amazing work and code with us. However, I am confused by the _init_weights function in the dehazeformer.py file, specifically the line "gain = (8 * self.network_depth) ** (-1/4)". Why does the initial weight scale depend on network_depth and the constant 8?

Thank you again.

Best, Zewei

IDKiro commented 2 years ago

Please refer to a recent paper: DeepNet: Scaling Transformers to 1,000 Layers

In fact, most of the models were initialized using a standard deviation of 0.02, since they were done in January. Judging from the experimental results, there is not much difference between the two initializations, but I think it makes more sense to initialize this way.
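
For readers wondering where the constant comes from: DeepNet scales the Xavier initialization of certain sublayer weights by a gain β, and for an encoder-only model with N layers it uses β = (8N)^(-1/4), which matches the quoted line with N = network_depth. The following is only a rough sketch of how that gain shrinks the initial standard deviation as depth grows. The helper names, the Xavier-style formula for the standard deviation, and the layer sizes are assumptions made for illustration, not code copied from dehazeformer.py.

```python
import math


def deepnet_encoder_gain(network_depth: int) -> float:
    """Gain beta = (8N)^(-1/4), the encoder-only setting described in DeepNet."""
    return (8 * network_depth) ** (-1 / 4)


def scaled_xavier_std(fan_in: int, fan_out: int, gain: float) -> float:
    """Xavier-normal standard deviation scaled by a gain (assumed formula)."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


if __name__ == "__main__":
    # Hypothetical layer size; deeper networks receive a smaller initial std.
    for depth in (2, 8, 32):
        gain = deepnet_encoder_gain(depth)
        std = scaled_xavier_std(fan_in=64, fan_out=64, gain=gain)
        print(f"depth={depth:2d}  gain={gain:.4f}  std={std:.4f}")
```

For a 64-to-64 linear layer the resulting standard deviation falls from about 0.06 at depth 2 to about 0.03 at depth 32, the same order of magnitude as the fixed 0.02. This is one plausible reason the two initializations give similar results.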

hezw2016 commented 2 years ago

Thank you very much for pointing me to this interesting paper.

IDKiro commented 2 years ago

Happy to help you