VITA-Group / Adversarial-Contrastive-Learning

[NeurIPS 2020] “Robust Pre-Training by Adversarial Contrastive Learning”, Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang

Inconsistent definition of proj_head module? #1

Closed: zjysteven closed this issue 3 years ago

zjysteven commented 3 years ago

Hi,

Thanks for releasing the code! I have a minor question on which some clarification would be appreciated. The code for proj_head seems to have an inconsistency: if twoLayerProj is True, then self.fc3 and self.bn3 are never initialized, yet the forward pass still calls them. Am I missing something? Thanks!

https://github.com/VITA-Group/Adversarial-Contrastive-Learning/blob/a7e6d48686826b856a9f16dcc5cb8b2c70bd084a/models/resnet_multi_bn.py#L192-L194

https://github.com/VITA-Group/Adversarial-Contrastive-Learning/blob/a7e6d48686826b856a9f16dcc5cb8b2c70bd084a/models/resnet_multi_bn.py#L210-L214
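
To make the inconsistency concrete, here is a minimal sketch of the pattern as I read it (the class name, layer widths, and activation placement are simplified placeholders, not the exact code from resnet_multi_bn.py):

```python
import torch
import torch.nn as nn

class ProjHeadSketch(nn.Module):
    def __init__(self, ch, twoLayerProj=False):
        super().__init__()
        self.twoLayerProj = twoLayerProj
        self.fc1 = nn.Linear(ch, ch)
        self.bn1 = nn.BatchNorm1d(ch)
        self.fc2 = nn.Linear(ch, ch)
        self.bn2 = nn.BatchNorm1d(ch)
        if not twoLayerProj:
            # fc3/bn3 are only created on this branch ...
            self.fc3 = nn.Linear(ch, ch)
            self.bn3 = nn.BatchNorm1d(ch)

    def forward(self, x):
        x = torch.relu(self.bn1(self.fc1(x)))
        x = self.bn2(self.fc2(x))
        if self.twoLayerProj:
            # ... yet this branch still references them
            x = self.bn3(self.fc3(x))
        return x

# raises AttributeError: 'ProjHeadSketch' object has no attribute 'fc3'
ProjHeadSketch(8, twoLayerProj=True)(torch.randn(4, 8))
```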

geekJZY commented 3 years ago

Thank you very much for pointing this out. This is a bug I introduced during code cleaning.

zjysteven commented 3 years ago

OK, thanks for the clarification!

zjysteven commented 3 years ago

@geekJZY Sorry to bother you again, but would you mind revealing a little bit about how the learning rate was chosen (the default value for the LARS optimizer is 5.0 in the code)? Are you following something like the linear scaling rule (e.g., base learning rate × batch size / 256)? I'm asking because most contrastive learning papers benchmark on ImageNet and don't say much about the learning rate setup for CIFAR-10. Any suggestions would be appreciated!
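
For concreteness, by linear scaling I mean the rule popularized by Goyal et al. and SimCLR, sketched below (the base learning rate and batch size here are made-up numbers for illustration, not values taken from this repo):

```python
def scaled_lr(base_lr: float, batch_size: int) -> float:
    # Linear scaling rule: grow the learning rate proportionally
    # with batch size, relative to a reference batch of 256.
    return base_lr * batch_size / 256

# Hypothetical example: a base LR of 2.5 at batch size 512
# would yield 5.0, the default LARS LR in this repo.
print(scaled_lr(2.5, 512))  # 5.0
```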

geekJZY commented 3 years ago

For the CIFAR-10 learning rate, we grid-searched it under the standard training setting and chose the value that yielded the best linear evaluation accuracy.
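
In sketch form, the selection procedure is roughly the following (the candidate grid and the pretrain/linear_eval helpers are illustrative stand-ins, not the actual scripts or values we used):

```python
import random

def pretrain(lr: float):
    # Hypothetical stand-in for a contrastive pretraining run on CIFAR-10.
    return {"lr": lr}

def linear_eval(encoder) -> float:
    # Hypothetical stand-in for linear evaluation on frozen features;
    # returns a dummy accuracy here just so the sketch runs.
    return random.random()

candidate_lrs = [0.5, 1.0, 2.0, 5.0, 10.0]  # made-up grid

best_lr, best_acc = None, -1.0
for lr in candidate_lrs:
    acc = linear_eval(pretrain(lr))
    if acc > best_acc:
        best_lr, best_acc = lr, acc

print(f"selected lr={best_lr} (linear eval accuracy={best_acc:.2%})")
```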