Query regarding number of filters for pointwise convolutions.

piyush-das commented 3 years ago

Hi,

According to the paper, when calculating α(l), Ml is the number of filters in layer l. However, in the implementation for pointwise convolutions, it appears that weight.shape[1] is used, which is Cin and not the number of filters; the number of filters should be weight.shape[0]. Is this by design?

Thanks

EkdeepSLubana commented 3 years ago

IIRC, this was because of how dimensions are represented in pointwise filters. It may have been a design decision, since it was difficult to make MobileNets work properly. Technically, the benefits of orthogonality (as shown by work on mean field theory) arise when the network is overparameterized, a condition MobileNets don't satisfy. You can find more details in the appendix of the paper.

piyush-das commented 3 years ago

The appendix states the following:

MobileNet-V1 has depth-separable filters that use M depthwise filters of dimensions 3×3×1 to process an input with M channels (see Figure 5). Each filter processes its corresponding channel, resulting in an output with M channels as well. This output is processed by N pointwise filters of dimensions 1×1×N filters

However, if the preceding depthwise convolution has M output channels, each pointwise filter should have dimensions 1×1×M, and there would be N such pointwise filters. I think the current implementation assumes the kernel dimension is 1×1×N, as stated in the appendix, and hence that weight.shape[1] == N. As per my understanding, however, the kernel dimension is 1×1×M, so weight.shape[1] == M and it is weight.shape[0] that equals N. Is my understanding correct, or am I missing something?
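For reference, here is a minimal plain-Python sketch of the shape argument above. It assumes the pointwise weights follow PyTorch's Conv2d layout of (out_channels, in_channels, kH, kW); the sizes M = 3 and N = 5 and the helpers `shape` and `pointwise_conv_pixel` are hypothetical and are not taken from the OrthoReg code.

```python
# Plain-Python sketch of a pointwise (1x1) convolution at a single pixel.
# Assumption: weights use the layout (out_channels, in_channels, kH, kW),
# which is how PyTorch's Conv2d stores them. All names here are illustrative.

M, N = 3, 5  # hypothetical sizes: M input channels, N pointwise filters

# weight[n][m][0][0] is the coefficient filter n applies to input channel m,
# so each of the N filters has dimensions 1x1xM.
weight = [[[[0.1 * (n + 1) * (m + 1)]] for m in range(M)] for n in range(N)]


def shape(nested):
    """Return the shape of a regularly nested list, similar to tensor.shape."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def pointwise_conv_pixel(weight, pixel):
    """Apply every 1x1 filter to one pixel (a list of per-channel values)."""
    return [sum(w_m[0][0] * x for w_m, x in zip(filt, pixel)) for filt in weight]


w_shape = shape(weight)
out = pointwise_conv_pixel(weight, [1.0, 2.0, 3.0])

print(w_shape)   # (5, 3, 1, 1): shape[0] is N (filters), shape[1] is M (input channels)
print(len(out))  # 5: one output channel per filter
```

Under that layout, the number of filters is read from index 0 of the weight shape, and index 1 gives the number of input channels.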

EkdeepSLubana commented 3 years ago

I see the point you are making and believe you are correct. In case you try it out and find it works better, please let me know and I will update the code.

EkdeepSLubana / OrthoReg
