azamatkhid / mnasnet-pytorch

Pytorch implementation of MnasNet-A1 & MnasNet-B1

Code Review #1

Closed flrngel closed 4 years ago

flrngel commented 4 years ago

1) Order of conv, ReLU, batch norm

https://github.com/azamatkhid/mnasnet-pytorch/blob/f855edc8d114636f88ef586fa257a7f5d1f52b9b/model.py#L30-L32

Conv -> ReLU -> Batch Normalization

but the reference uses ReLU after batch norm.
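For illustration, a minimal sketch of the two orderings in plain PyTorch (channel sizes are placeholders, not taken from the repo):

```python
import torch.nn as nn

# Conv -> ReLU -> BN ordering
conv_relu_bn = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.BatchNorm2d(64),
)

# Conv -> BN -> ReLU ordering (the one used by the torchvision reference)
conv_bn_relu = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```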

2) I think the difference in the number of parameters is because of this:

Your code

https://github.com/azamatkhid/mnasnet-pytorch/blob/f855edc8d114636f88ef586fa257a7f5d1f52b9b/model.py#L118-L122

TorchVision code

https://github.com/pytorch/vision/blob/f6a3e0c3ca41125461ea2484936367d40d0ad34f/torchvision/models/mnasnet.py#L80

3) The expansion factors are also different

https://github.com/pytorch/vision/blob/f6a3e0c3ca41125461ea2484936367d40d0ad34f/torchvision/models/mnasnet.py#L116-L122

But you are right if we check the paper.

4) The Squeeze-and-Excitation module in your code doesn't include the residual part.

Your code: https://github.com/azamatkhid/mnasnet-pytorch/blob/f855edc8d114636f88ef586fa257a7f5d1f52b9b/model.py#L43-L48


Questions

Is there a particular reason for choosing adaptive average pooling?

Magauiya commented 4 years ago

Thank you for sharing your code! It would be nice to:

1. Squeeze the following lines:

https://github.com/azamatkhid/mnasnet-pytorch/blob/f855edc8d114636f88ef586fa257a7f5d1f52b9b/model.py#L103-L120

by adding a function (see the sketch after this list): https://github.com/Magauiya/NTIRE20-denoising/blob/13b25fc7901c21545407e9b3065f6e44ab44213b/models/EBRN.py#L71-L76

2. Remove the block if it is not used: https://github.com/azamatkhid/mnasnet-pytorch/blob/f855edc8d114636f88ef586fa257a7f5d1f52b9b/model.py#L5-L22

3. Write comments in the code.

4. Add a brief explanation on how to use your code and the network that you have implemented.

5. Add results on open datasets (e.g. CIFAR100, STL-10, etc.)
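For point 1, a hypothetical helper in the spirit of the linked function (the name `conv_bn_act` and its arguments are illustrative, not taken from either repo):

```python
import torch.nn as nn

def conv_bn_act(in_ch, out_ch, kernel_size=3, stride=1, groups=1, act=True):
    """Conv2d -> BatchNorm2d (-> ReLU) stack, to avoid repeating the same three lines."""
    layers = [
        nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride,
                  padding=kernel_size // 2, groups=groups, bias=False),
        nn.BatchNorm2d(out_ch),
    ]
    if act:
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

# e.g. the stem of the network becomes a one-liner:
stem = conv_bn_act(3, 32, kernel_size=3, stride=2)
```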

azamatkhid commented 4 years ago

@flrngel Thanks for the comments. My replies follow:

1) As you already mentioned, I followed the order given in the paper: convolution -> BN -> ReLU.

2) Regarding the comparison with the official MnasNet implementation in torchvision, as you can see it is MnasNet-B1:

https://github.com/pytorch/vision/blob/f6a3e0c3ca41125461ea2484936367d40d0ad34f/torchvision/models/mnasnet.py#L84-L86

I already compared my MnasNet-B1 implementation with the official one; there is no difference in architecture structure or number of parameters.

The issue I mentioned is about MnasNet-A1: the paper reports 3.9M parameters, but my implementation results in about 3.8M.
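For reference, parameter counts can be compared directly (a minimal sketch; the import of this repo's model class is assumed and left commented out, since its exact name is not shown here):

```python
from torchvision import models

def count_params(model):
    """Total number of trainable and non-trainable parameters."""
    return sum(p.numel() for p in model.parameters())

print(count_params(models.mnasnet1_0()))  # torchvision's MnasNet-B1 (depth multiplier 1.0)
# from model import MnasNet               # this repo's MnasNet-A1 (class name assumed)
# print(count_params(MnasNet()))          # ~3.8M here vs. 3.9M reported in the paper
```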

3) For MnasNet-A1 the expansion coefficients are taken from the paper; for MnasNet-B1 they are the same as in the official torchvision implementation.

4) Squeeze-and-excitation:

[figure: MnasNet paper, Figure 7]

As you can see from the figure above (Figure 7 (b)), there is no skip connection inside the SE block itself. You referred to the original paper that introduces the SE block:

[figure: original Squeeze-and-Excitation paper]

In the case of the inverted MobileNet bottleneck, the SE usage is closer to the Inception-style module than to the residual one, and there is actually a residual connection, as shown in Figure 7 (b).
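A rough sketch of that arrangement (illustrative only, not copied from either implementation): the SE module itself only rescales channels, and the identity shortcut wraps the whole inverted bottleneck.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze (global pool) -> excite (two 1x1 convs) -> scale; no skip connection inside."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1)

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)                   # squeeze
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))   # excite
        return x * s                                           # scale only

# Inside the inverted bottleneck (expand -> depthwise -> SE -> project),
# the residual connection wraps the whole block:
#   out = project(se(depthwise(expand(x))))
#   if stride == 1 and in_channels == out_channels:
#       out = out + x
```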

5) There is no clear reason for the adaptive average pooling choice actually; I will think about it.

azamatkhid commented 4 years ago

@Magauiya Thanks for the comments. Here are my replies:

  1. I will take your advice into account to make the code more compact (to do).

  2. Done

  3. Good point, I will add comments in the code to make it more readable (to do).

  4. I will add a README file with a full explanation and visualizations (to do).

  5. These are the next steps: to train, validate, and test on the datasets you mentioned in the comment.