Thank you for sharing your code! It would be nice to:
1. Squeeze the following lines by adding a function (a sketch of such a helper follows this list): https://github.com/Magauiya/NTIRE20-denoising/blob/13b25fc7901c21545407e9b3065f6e44ab44213b/models/EBRN.py#L71-L76
2. Remove the block if it is not used: https://github.com/azamatkhid/mnasnet-pytorch/blob/f855edc8d114636f88ef586fa257a7f5d1f52b9b/model.py#L5-L22
3. Write comments in the code.
4. Add a brief explanation on how to use your code and the network that you have implemented.
5. Add results on open datasets (e.g., CIFAR-100, STL-10).
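For point 1, a minimal sketch of what such a helper could look like in PyTorch; the name `conv_bn_relu` and its arguments are assumptions for illustration, not taken from the repository:

```python
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, kernel_size, stride=1):
    """Hypothetical helper that bundles the repeated conv / BN / ReLU lines."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                  padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```

The repeated blocks in the linked file could then be built as, e.g., `conv_bn_relu(32, 64, 3)` instead of spelling out the three layers each time.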
@flrngel Thanks for the comments. My replies are as follows:
1) As you already mentioned, I followed the order given in the paper: convolution-BN-ReLU.
2) Regarding the comparison with the official MnasNet implementation in torchvision: as you can see, it is MnasNet-B1.
I already compared my MnasNet-B1 architecture with the official one; there is no difference in the architecture structure or in the number of parameters.
The issue I mentioned is about MnasNet-A1: the paper reports 3.9M parameters, but my implementation has about 3.8M.
3) For MnasNet-A1 the expansion coefficients are taken from the paper; for MnasNet-B1 they are the same as in the official torchvision implementation.
4) Squeeze-and-excitation:
As you can see from Figure 7(b), there is no skip connection inside the SE block itself. You referred to the original paper introducing the SE block:
That paper is more about the Inception module than the residual module; in the case of the inverted MobileNet bottleneck there actually is a residual connection, as shown in Figure 7(b). (A sketch of an SE block in this style follows this reply.)
5) There is no particular reason for the adaptive average pooling choice, actually; I will think about it.
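To illustrate replies 4) and 5), here is a minimal sketch of an SE block in this style: adaptive average pooling for the squeeze step, channel-wise gating for the excitation step, and no skip connection inside the block itself. The reduction factor and layer choices are assumptions for illustration, not the repository's actual values.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global pooling -> bottleneck MLP -> sigmoid gate."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: N x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(self.pool(x))  # excitation: per-channel weights in (0, 1)
        return x * w               # rescale only; any residual add belongs to
                                   # the surrounding inverted-bottleneck block
```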
@Magauiya Thanks for the comments. These are my replies:
I will take your advice into account to make the code more compact (to do).
Done
Good point; I will add comments in the code to make it more readable (to do).
I will add a README file with a full explanation and visualizations (to do).
These are the next steps: train, validate, and test on the datasets you mentioned in your comment.
1) Order of conv, ReLU, and batch norm (see the ordering sketch after this list):
https://github.com/azamatkhid/mnasnet-pytorch/blob/f855edc8d114636f88ef586fa257a7f5d1f52b9b/model.py#L30-L32
Conv -> ReLU -> BatchNorm,
but the reference uses ReLU after batch norm.
2) I think the difference in the number of parameters is because of this (a parameter-count sketch follows this list):
Your code
https://github.com/azamatkhid/mnasnet-pytorch/blob/f855edc8d114636f88ef586fa257a7f5d1f52b9b/model.py#L118-L122
TorchVision code
https://github.com/pytorch/vision/blob/f6a3e0c3ca41125461ea2484936367d40d0ad34f/torchvision/models/mnasnet.py#L80
3) The expansion factors are also different:
https://github.com/pytorch/vision/blob/f6a3e0c3ca41125461ea2484936367d40d0ad34f/torchvision/models/mnasnet.py#L116-L122
But you are right if we check against the paper.
4) The Squeeze-and-Excitation module in your code doesn't include the residual part.
Your code: https://github.com/azamatkhid/mnasnet-pytorch/blob/f855edc8d114636f88ef586fa257a7f5d1f52b9b/model.py#L43-L48
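For point 1), the two orderings under discussion side by side; this is only an illustrative sketch, not code from either repository:

```python
import torch.nn as nn

# Ordering the review points at in model.py: Conv -> ReLU -> BatchNorm
conv_relu_bn = nn.Sequential(
    nn.Conv2d(32, 32, 3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.BatchNorm2d(32),
)

# Ordering in the torchvision reference: Conv -> BatchNorm -> ReLU
conv_bn_relu = nn.Sequential(
    nn.Conv2d(32, 32, 3, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
)
```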
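For points 2) and 3), one quick way to compare the parameter counts directly. The torchvision call below is real; the commented-out import of the local model is an assumption, since I have not checked the class name or constructor in model.py:

```python
import torchvision.models as tvm

def count_params(model):
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Reference MnasNet-B1 (depth multiplier 1.0) from torchvision
print(count_params(tvm.mnasnet1_0()))

# The local implementation could be checked the same way, e.g.
# (class name and arguments are assumed, not verified against model.py):
# from model import MnasNet
# print(count_params(MnasNet()))
```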
MISC
F(a=True),
not F(a = True) (no spaces around = in keyword arguments)
Questions