google-research / mixmatch


Why not use dropout in the Wide ResNet, as done in the Wide ResNet paper? #43

Closed: ghost closed this issue 2 years ago

ghost commented 2 years ago

Hi

I'm just wondering why you don't use dropout in your Wide ResNet backbone. Didn't the original Wide ResNet paper show that dropout could be useful?

Thanks

carlini commented 2 years ago

Two reasons. First, we wanted MixMatch to be as simple as possible, and dropout is extra complexity. Second, we tried it briefly and it didn't seem to help very much (the original paper found something like a +0.1% gain, and we saw something around that).
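For reference, the dropout described in the Wide ResNet paper sits between the two convolutions of each residual block. Here is a rough sketch of where it would go, written with tf.keras rather than this repo's TF1 model code; the layer arrangement and drop rate are illustrative, not taken from the repo:

```python
import tensorflow as tf

def wide_basic_block(x, filters, stride=1, drop_rate=0.3):
    """Pre-activation Wide ResNet block with dropout between the two
    convolutions, as in the Wide ResNet paper.
    Illustrative sketch only; not this repo's model code."""
    out = tf.keras.layers.BatchNormalization()(x)
    out = tf.keras.layers.ReLU()(out)
    out = tf.keras.layers.Conv2D(filters, 3, strides=stride,
                                 padding='same', use_bias=False)(out)

    # This is the dropout the paper adds; MixMatch's backbone omits it.
    out = tf.keras.layers.Dropout(drop_rate)(out)

    out = tf.keras.layers.BatchNormalization()(out)
    out = tf.keras.layers.ReLU()(out)
    out = tf.keras.layers.Conv2D(filters, 3, strides=1,
                                 padding='same', use_bias=False)(out)

    # Projection shortcut when the spatial size or channel count changes (simplified).
    shortcut = x
    if stride != 1 or x.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, 1, strides=stride,
                                          padding='same', use_bias=False)(x)
    return out + shortcut
```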

ghost commented 2 years ago

Thank you, that makes sense!

I have one more question: what does the `tune` function in your code do? https://github.com/google-research/mixmatch/blob/1011a1d51eaa9ca6f5dba02096a848d1fe3fc38e/libml/train.py#L175

I have a hard time relating it to the rest of the code.

carlini commented 2 years ago

You can basically ignore it. It's used to adjust the batch norm statistics at eval time, but in practice it is usually a no-op.
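The general idea, sketched in tf.keras rather than the repo's TF1 training loop (the model and dataset names here are placeholders; the actual `tune` is in the linked train.py):

```python
import tensorflow as tf

def tune_batchnorm_stats(model: tf.keras.Model,
                         dataset: tf.data.Dataset,
                         steps: int = 100):
    """Re-estimate batch norm moving statistics before evaluation by
    running forward passes in training mode, with no gradient updates.
    Sketch of the general idea only; the repo's actual `tune` is the
    TF1 code in libml/train.py linked above."""
    for images, _ in dataset.take(steps):
        # training=True lets BatchNormalization layers update their
        # moving mean/variance; no optimizer step is taken, so the
        # trainable weights themselves are unchanged.
        model(images, training=True)
```

As noted, this is often close to a no-op in practice, presumably because the moving averages accumulated during training are already good estimates of the population statistics.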

ghost commented 2 years ago

Thanks!