Hi, thanks for pointing this out! We adopted the architecture file from here, and the author of that file confirmed that these changes do not lead to different results. I have updated the file so that it is now the standard WRN-28-2.
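For reference, the standard WRN-28-2 handles a shape mismatch on the shortcut with a strided 1x1 convolution. Below is a minimal tf.keras sketch of a pre-activation WRN block; the repo's wrn.py is TF1-style code, so the helper name and layer arrangement here are illustrative, not a copy of the file.

```python
import tensorflow as tf

def standard_wrn_block(x, out_filters, stride):
    """Pre-activation residual block; mismatched shortcuts use a 1x1 conv."""
    in_filters = x.shape[-1]
    h = tf.keras.layers.BatchNormalization()(x)
    h = tf.keras.layers.Activation("relu")(h)
    if stride != 1 or in_filters != out_filters:
        # Standard WRN: project the (pre-activated) shortcut with a 1x1 conv.
        shortcut = tf.keras.layers.Conv2D(
            out_filters, 1, strides=stride, padding="same", use_bias=False)(h)
    else:
        shortcut = x
    h = tf.keras.layers.Conv2D(
        out_filters, 3, strides=stride, padding="same", use_bias=False)(h)
    h = tf.keras.layers.BatchNormalization()(h)
    h = tf.keras.layers.Activation("relu")(h)
    h = tf.keras.layers.Conv2D(
        out_filters, 3, strides=1, padding="same", use_bias=False)(h)
    return tf.keras.layers.Add()([h, shortcut])
```

In WRN-28-2 this block is repeated 4 times in each of the 3 groups (24 convs); the initial conv plus the three 1x1 projections account for the remaining 4 of the 28 counted layers.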
I experimented with both the original architecture and the updated, standard architecture on CIFAR-10 and SVHN and do not see a difference. Here are the detailed comparisons (average of 5 runs for the original file, average of 10 runs for the updated file):
CIFAR-10 accuracy (%):

| # labeled examples | Original | Updated |
| --- | --- | --- |
| 4000 | 95.50 ± 0.15 | 95.68 ± 0.08 |
| 2000 | 95.21 ± 0.11 | 95.27 ± 0.14 |
| 1000 | 95.11 ± 0.16 | 95.25 ± 0.10 |
| 500 | 94.93 ± 0.10 | 95.20 ± 0.09 |
| 250 | 94.52 ± 0.22 | 94.57 ± 0.96 |

SVHN accuracy (%):

| # labeled examples | Original | Updated |
| --- | --- | --- |
| 4000 | 97.76 ± 0.05 | 97.72 ± 0.10 |
| 2000 | 97.92 ± 0.04 | 97.80 ± 0.06 |
| 1000 | 97.71 ± 0.07 | 97.77 ± 0.07 |
| 500 | 97.73 ± 0.08 | 97.73 ± 0.09 |
| 250 | 97.36 ± 0.13 | 97.28 ± 0.40 |
Note that these results are obtained with EMA (an exponential moving average of the model weights) enabled. Please do a git pull to update your file.
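To be concrete about what EMA means here: evaluation uses an exponential moving average of the model weights rather than the raw weights. A minimal sketch of the update; the decay value and helper below are illustrative, not the repo's actual code:

```python
import tensorflow as tf

def update_ema(ema_vars, model_vars, decay=0.999):
    """After each optimizer step, move the shadow weights toward the live ones."""
    for shadow, live in zip(ema_vars, model_vars):
        shadow.assign(decay * shadow + (1.0 - decay) * live)

# Evaluation then runs the network with ema_vars in place of model_vars.
```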
I am closing this issue. Feel free to reopen if you have further questions.
Is the model for CIFAR-10 and SVHN not the standard WideResNet?

(1) When the filter sizes don't match, this code uses average pooling and zero padding on the shortcut, while WideResNet uses a 1x1 conv layer. Because of this, there are only 25 conv layers (the initial conv plus 24 block convs), not the 28 you would get if the three 1x1 projection convs were present. https://github.com/google-research/uda/blob/602483dbca113567b32e7395e5c0eadd3cf7e776/image/randaugment/wrn.py#L74

(2) There is one extra skip connection that spans multiple blocks. https://github.com/google-research/uda/blob/602483dbca113567b32e7395e5c0eadd3cf7e776/image/randaugment/wrn.py#L155
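To make point (1) concrete, here is roughly what the linked shortcut computes, sketched in TF2; the function name and the even channel-padding split are my own reading of the file, not its literal code:

```python
import tensorflow as tf

def avgpool_zeropad_shortcut(x, out_filters, stride):
    """Parameter-free shortcut: avg-pool to downsample, zero-pad extra channels."""
    shortcut = x
    if stride != 1:
        shortcut = tf.nn.avg_pool2d(shortcut, ksize=stride, strides=stride,
                                    padding="VALID")
    extra = out_filters - shortcut.shape[-1]
    if extra > 0:
        # NHWC layout assumed: append zeros along the channel axis only.
        shortcut = tf.pad(
            shortcut, [[0, 0], [0, 0], [0, 0], [extra // 2, extra - extra // 2]])
    return shortcut
```

Because this shortcut has no weights, the three 1x1 projection convs of a true WRN-28 are gone, which is where the 25-vs-28 layer count comes from.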