Closed ysj1173886760 closed 2 years ago
We're trying to match the original VGG network, not transposing filters on purpose, so if that's indeed the case, it's a bug and should be fixed.
Though I think the code in vgg.py is correct as-is? We load from a pre-trained MatConvNet model, which stores weights in [W, H, Cin, Cout] order, according to the vl_nnconv docs:
F is an array of dimension FW x FH x FC x K where (FH,FW) are the filter height and width and K the number of filters in the bank. FC is the number of feature channels in each filter
On the other hand, TensorFlow expects weights in [H, W, Cin, Cout] order, according to tf.nn.conv2d docs:
filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]
So to convert from MatConvNet format to TensorFlow format, we transpose the 0th and 1st axes with `kernels = np.transpose(kernels, (1, 0, 2, 3))`.
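As a minimal sketch of that axis swap (the array contents here are made up for illustration), converting a kernel bank from MatConvNet's [W, H, Cin, Cout] layout to TensorFlow's [H, W, Cin, Cout] layout is a single transpose:

```python
import numpy as np

# Hypothetical kernel bank in MatConvNet order: [W, H, Cin, Cout]
# e.g. width 3, height 5, 2 input channels, 4 filters
kernels = np.arange(3 * 5 * 2 * 4, dtype=np.float32).reshape(3, 5, 2, 4)

# Swap the first two axes to get TensorFlow order: [H, W, Cin, Cout]
kernels_tf = np.transpose(kernels, (1, 0, 2, 3))

print(kernels.shape)     # (3, 5, 2, 4)
print(kernels_tf.shape)  # (5, 3, 2, 4)

# The same scalar weight is addressed with H and W exchanged:
assert kernels_tf[4, 2, 1, 3] == kernels[2, 4, 1, 3]
```

Note that for square kernels (3x3 in VGG) the shapes are identical either way, so a wrong transpose would not raise an error; it only silently flips each filter about its diagonal, which is why the two loading conventions can both "run" yet give different accuracy.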
Thanks for your clarification. You really made my day. I've used the other format to read the parameters and tested on some pictures. Here is the result.

You're about to convince me now. I will do more investigation and will give feedback here.
Closing due to inactivity, feel free to open if there's anything new.
For loading the pretrained VGG model, I think the weight matrix for conv layers is [H, W, Cin, Cout]. The code says the pretrained model stores the matrix as [W, H, Cin, Cout], so it transposes it. I've used the same pretrained model to build the original VGG net and evaluated it, and it turns out that loading the model as [H, W, Cin, Cout] works better than the [W, H, Cin, Cout] version. So I want to ask why we are transposing [H, W], and what the motivation for doing so is.