ShichenLiu / CondenseNet

CondenseNet: Light weighted CNN for mobile devices

dropout before convolution layer #30

Closed lizhenstat closed 4 years ago

lizhenstat commented 4 years ago

Hi, I noticed that dropout is placed before the convolution layer. In the original DenseNet Torch implementation, the order within each block is BN --> ReLU --> conv --> dropout. Is there a particular reason for doing so?

    def forward(self, x):
        self._check_drop()
        x = self.norm(x)
        x = self.relu(x)
        if self.dropout_rate > 0:
            x = self.drop(x)  # dropout is applied here, i.e. before the convolution
        ### Masked output
        weight = self.conv.weight * self.mask
        return F.conv2d(x, weight, None, self.conv.stride,
                        self.conv.padding, self.conv.dilation, 1)
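
For comparison, a minimal sketch of the original DenseNet-Torch ordering (dropout after the convolution); the layer below is illustrative only and not code from this repo:

    import torch.nn as nn

    class DenseLayerTorchOrder(nn.Module):
        """Illustrative layer with the BN --> ReLU --> conv --> dropout ordering."""
        def __init__(self, in_channels, out_channels, dropout_rate=0.1):
            super().__init__()
            self.norm = nn.BatchNorm2d(in_channels)
            self.relu = nn.ReLU(inplace=True)
            self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                                  padding=1, bias=False)
            self.drop = nn.Dropout(dropout_rate)

        def forward(self, x):
            x = self.relu(self.norm(x))
            x = self.conv(x)       # convolution first
            if self.drop.p > 0:
                x = self.drop(x)   # dropout applied after the convolution
            return x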
ShichenLiu commented 4 years ago

Hi,

There is no particular reason; it is just an implementation difference. Thanks

lizhenstat commented 4 years ago

I found a related question on Stack Overflow, https://stackoverflow.com/questions/39691902/ordering-of-batch-normalization-and-dropout, which suggests placing the dropout layer right after ReLU.

I then tested CondenseNet-182 on CIFAR-100 with different configurations:

  1. condensenet-182-dropout_after_conv: 19.2% (dropout rate = 0.1)
  2. condensenet-182-dropout_before_conv: 18.79%
  3. condensenet-182-dropout_before_conv: 18.7% (dropout rate = 0.2)
  4. condensenet-182-dropout_before_conv: 22.49% (dropout rate = 0.3)

For reference, the error rate reported in the paper with dropout before the convolution layer is 18.47%. In https://github.com/ShichenLiu/CondenseNet/issues/28#issuecomment-531881562 the author mentioned that we can try different dropout rates, so I will conduct further experiments (a sketch of how the placement can be toggled is below).
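
A rough sketch of how the two placements could be toggled for such a comparison (the `dropout_after_conv` flag and the `MaskedConvBlock` class are hypothetical, not options provided by this repo):

    import torch.nn as nn
    import torch.nn.functional as F

    class MaskedConvBlock(nn.Module):
        """Hypothetical block that toggles dropout placement around the masked conv."""
        def __init__(self, conv, mask, dropout_rate=0.1, dropout_after_conv=False):
            super().__init__()
            self.norm = nn.BatchNorm2d(conv.in_channels)
            self.relu = nn.ReLU(inplace=True)
            self.drop = nn.Dropout(dropout_rate)
            self.conv = conv
            self.register_buffer('mask', mask)
            self.dropout_after_conv = dropout_after_conv

        def forward(self, x):
            x = self.relu(self.norm(x))
            if not self.dropout_after_conv:
                x = self.drop(x)                   # dropout before the masked convolution
            weight = self.conv.weight * self.mask  # masked weights, as in the snippet above
            x = F.conv2d(x, weight, None, self.conv.stride,
                         self.conv.padding, self.conv.dilation, 1)
            if self.dropout_after_conv:
                x = self.drop(x)                   # dropout after the masked convolution
            return x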