LogisticFreedom opened this issue 7 years ago
Nice question @LogisticFreedom. I was also confused by this idea initially. There are two types of Bilinear CNNs.
This particular implementation is the symmetric type, with VGG16-VGG16. Since the weight initializations for both VGG16 networks are the same, the weight updates will also be the same, so both networks have identical weights after every iteration. Instead of declaring two networks we can therefore use a single one and cut the memory usage in half.
The line `self.phi_I = tf.einsum('ijkm,ijkn->imn', self.conv5_3, self.conv5_3)` computes the outer product of the `conv5_3` output with itself, which is equivalent to having two identical networks.
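A quick way to see what that einsum does is to reproduce it with NumPy, whose `einsum` shares TensorFlow's signature. The shapes below are toy values, not the real `conv5_3` dimensions:

```python
import numpy as np

# Toy stand-in for the conv5_3 output: (batch, height, width, channels).
b, h, w, c = 2, 3, 3, 5
rng = np.random.default_rng(0)
feat = rng.random((b, h, w, c)).astype(np.float32)

# Symmetric bilinear pooling: outer product of the channel vector with
# itself at every spatial location, summed over all locations.
phi = np.einsum('ijkm,ijkn->imn', feat, feat)   # shape (b, c, c)

# Equivalent explicit form: flatten the spatial grid and compute X^T X.
flat = feat.reshape(b, h * w, c)
phi_explicit = np.matmul(flat.transpose(0, 2, 1), flat)
assert np.allclose(phi, phi_explicit, atol=1e-4)
```

The flattened `X^T X` form makes it clear why a single network suffices: the pooled descriptor only depends on one set of feature maps multiplied against itself.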
When you implement an asymmetric Bilinear CNN, for example VGG16-Mnet, you need two separate network definitions because the weight initializations differ between the networks: one definition for VGG16 and one for the Mnet. Then take the outer product of the outputs of their final convolutional layers using `tf.einsum`.
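A minimal NumPy sketch of the asymmetric case (the channel counts here are hypothetical, chosen only to show that the two streams may differ):

```python
import numpy as np

# Two streams with matching spatial grids but different channel counts.
b, h, w = 2, 4, 4
c1, c2 = 6, 3  # hypothetical: e.g. stream A's and stream B's last conv channels
rng = np.random.default_rng(1)
feat_a = rng.random((b, h, w, c1)).astype(np.float32)
feat_b = rng.random((b, h, w, c2)).astype(np.float32)

# Asymmetric bilinear pooling: the spatial dimensions must agree, but the
# channel axes need not, producing a (batch, c1, c2) descriptor.
phi = np.einsum('ijkm,ijkn->imn', feat_a, feat_b)
assert phi.shape == (b, c1, c2)
```

If the two backbones produce different spatial resolutions, one stream's feature map has to be resized or pooled so the grids line up before the einsum.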
Hope that answers your question. Please let me know if you need more information.
@abhaydoke09 Thank you very much for your answer! I have another problem: how can I use ResNet to build a bilinear CNN model? I use ResNet101, and its output is 2048x7x7. I have tried it in Keras, but it doesn't work.
What's the output size of the last convolutional layer?
If two identical CNNs have the same weight initialization and the same weight updates, so that both networks have the same weights after every iteration, then what is the benefit of using a B-CNN instead of a normal CNN architecture? What am I missing?
When we take the outer product of the last layers of these identical networks, we get the matrix of pairwise feature interactions at every location. The combined descriptor therefore captures how features co-occur across locations, which a plain CNN with global pooling does not. Take a look at slide 6 in http://people.cs.umass.edu/~smaji/presentations/BilinearModelsICCV2015oral.pdf
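To make "pairwise features" concrete: entry (m, n) of the pooled matrix is the sum, over all spatial locations, of feature m times feature n at that location. A sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(2)
feat = rng.random((1, 4, 4, 8)).astype(np.float32)  # (batch, h, w, channels)
phi = np.einsum('ijkm,ijkn->imn', feat, feat)

# Entry (m, n) accumulates feature m * feature n over every location,
# so the descriptor records how pairs of feature channels co-activate.
m, n = 1, 3
manual = (feat[0, :, :, m] * feat[0, :, :, n]).sum()
assert np.isclose(phi[0, m, n], manual, atol=1e-3)
```

So even with identical weights, the bilinear layer adds second-order (multiplicative) feature interactions that a single linear classifier on pooled first-order features cannot express.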
According to slide 6, it makes sense that one feature extractor (in this case a CNN) captures one kind of feature (e.g. part) and the second feature extractor (another CNN) captures a different kind (e.g. color). But I think this can only be the case with asymmetric CNNs. In the symmetric case, the same features are extracted at each location. What I understand is that during the outer product one matrix is transposed, due to which we get cross terms between features. Is that right?
Interesting discussion. In the paper "Improved Bilinear Pooling with CNNs", the authors note that symmetric B-CNNs are identical to Second-Order Pooling (O2P).
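That equivalence is easy to check numerically: the symmetric bilinear descriptor is exactly the (uncentred) second-order statistic X^T X of the local feature vectors, which is always a symmetric positive semi-definite matrix. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
feat = rng.random((1, 5, 5, 6)).astype(np.float32)
phi = np.einsum('ijkm,ijkn->imn', feat, feat)[0]  # (c, c)

# Second-order pooling of the same local features: stack the h*w local
# feature vectors into X and compute X^T X.
flat = feat.reshape(-1, 6)  # (h*w, c)
o2p = flat.T @ flat
assert np.allclose(phi, o2p, atol=1e-3)

# X^T X is symmetric PSD (eigenvalues >= 0 up to float error).
assert np.all(np.linalg.eigvalsh(o2p) > -1e-3)
```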
Hello @abhaydoke09, wonderful work. I have the same question as @ahmadmobeen. Can you please explain how the same network, e.g. VGG16, extracts different features at the same location? Thanks :-)
Hello, I still have a question. After running the second part of the whole model, training finishes, but it seems the final model is never saved in the code. Why is the trained model not saved? Can you give me some details?
Thanks for your code on bilinear CNNs, but I have a question. I have read the paper, and I think there are two CNNs in this model, but I can't find them in your code; I only find one VGG-16 network. Did I miss something? Could you explain where the second network is in your code? Thank you very much!