Closed ysj1173886760 closed 2 years ago
We're trying to match the original VGG network, not transposing filters on purpose, so if that's indeed the case, it's a bug and should be fixed.
Though I think the code in vgg.py is correct as-is? We load from a pre-trained MatConvNet model, which stores weights in [W, H, Cin, Cout] order, according to the vl_nnconv docs:
F is an array of dimension FW x FH x FC x K where (FH,FW) are the filter height and width and K the number of filters in the bank. FC is the number of feature channels in each filter
On the other hand, TensorFlow expects weights in [H, W, Cin, Cout] order, according to tf.nn.conv2d docs:
filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]
So to convert from MatConvNet format to TensorFlow format, we transpose the 0th and 1st axes with `kernels = np.transpose(kernels, (1, 0, 2, 3))`.
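As a minimal sketch of that axis swap (the array contents here are made up for illustration), converting a kernel bank from MatConvNet's [W, H, Cin, Cout] layout to TensorFlow's [H, W, Cin, Cout] layout is a single transpose:

```python
import numpy as np

# Hypothetical kernel bank in MatConvNet order: [W, H, Cin, Cout]
# e.g. width 3, height 5, 2 input channels, 4 filters
kernels = np.arange(3 * 5 * 2 * 4, dtype=np.float32).reshape(3, 5, 2, 4)

# Swap the first two axes to get TensorFlow order: [H, W, Cin, Cout]
kernels_tf = np.transpose(kernels, (1, 0, 2, 3))

print(kernels.shape)     # (3, 5, 2, 4)
print(kernels_tf.shape)  # (5, 3, 2, 4)

# The same scalar weight is addressed with H and W exchanged:
assert kernels_tf[4, 2, 1, 3] == kernels[2, 4, 1, 3]
```

Note that for square kernels (3x3 in VGG) the shapes are identical either way, so a wrong transpose would not raise an error; it only silently flips each filter about its diagonal, which is why the two loading conventions can both "run" yet give different accuracy.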
Thanks for your clarification. You really made my day. I've used the other format to read the parameters and tested on some pictures. Here is the result.

You're about to convince me now. I will do more investigation and will give feedback here.
Closing due to inactivity, feel free to open if there's anything new.
For loading the pretrained VGG model, I think the weight matrix for conv layers is [H, W, Cin, Cout]. The code says the pretrained model stores the matrix as [W, H, Cin, Cout], so it transposes it. I've used the same pretrained model to build the original VGG net and evaluated it, and it turns out that loading the model as [H, W, Cin, Cout] works better than the [W, H, Cin, Cout] version. So I want to ask why we are transposing [H, W], and what the motivation for doing so is.