Closed Bobholamovic closed 5 years ago
We normalize the association matrix using softmax here: https://github.com/NVlabs/ssn_superpixels/blob/082918950d768f29466148666b7f434c78663dc6/create_net.py#L95
We initialize the group convolution filter weights in this function: https://github.com/NVlabs/ssn_superpixels/blob/082918950d768f29466148666b7f434c78663dc6/utils.py#L31
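To illustrate the trick being discussed (this is a simplified sketch with my own names and shapes, not the repo's actual code): a 1x1 convolution whose weights are fixed one-hot vectors simply copies selected input channels to the output, which is how a group convolution with frozen one-hot kernels can "gather" neighboring superpixel features without an explicit concat op.

```python
import numpy as np

def onehot_conv1x1(x, weight):
    """A 1x1 convolution as a channel mixing matrix.

    x:      (C_in, H, W) feature map
    weight: (C_out, C_in) rows; if each row is one-hot, the conv
            just selects/copies input channels.
    """
    return np.einsum('oc,chw->ohw', weight, x)

C_in, H, W = 3, 2, 2
x = np.arange(C_in * H * W, dtype=float).reshape(C_in, H, W)

# One-hot weights: output channel i copies input channel perm[i].
# (The repo's layer is a *group* convolution with similar fixed
# one-hot filters per group; this collapses it to one group.)
perm = [2, 0, 1]
weight = np.eye(C_in)[perm]

y = onehot_conv1x1(x, weight)
```

Because the weights are fixed (and presumably excluded from learning), the layer behaves like a pure data-movement op even though it is expressed as a convolution.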
It makes sense. Thanks for your reply
Sorry for reopening this issue. I've recently been working on a PyTorch implementation of your paper, and the work is roughly done. However, I found a few details that I'm not so sure about. The most confusing one is the L.Softmax in create_net.py that we discussed before. Your earlier reply explained it as normalizing the association matrix, but after a read-through, I believe the normalize function in Line 17 and the CUDA lib of L.SpixelFeature2 also perform the normalization task. As far as I can understand from the paper, we should first obtain the original Q without any normalization for further tasks (e.g., mapping between pixels and spixels), and perhaps we need an exponential rather than a softmax.
Sorry for the lengthy description. My question is: why is a softmax used instead of an exponential transformation, i.e., why is an additional normalization done here? Is it due to numeric stability issues? Please correct me if I am wrong.
I am not sure if I understood your question completely. The 'normalize' function is only used once at the end, whereas 'softmax' is used at each iteration. There are actually two normalizations in each iteration: one is normalization across superpixels (using Softmax), and the other is normalization across pixels (in SpixelFeature2). Sorry that this is not very clear from the paper, where we did not explicitly mention the normalization over superpixels in some places. We talk about using 'row-normalized' and 'column-normalized' Q in the sub-section "Mapping between pixel and superpixel representations".
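The two normalizations described above can be sketched as follows (shapes and variable names are my own for illustration, not the repo's): a softmax over the superpixel axis gives each pixel a probability distribution over candidate superpixels, and a subsequent normalization over the pixel axis (what SpixelFeature2 does internally) turns Q into weights that average pixel features into superpixel features.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_spixels, n_feat = 6, 3, 4
neg_dist = -rng.random((n_pixels, n_spixels))  # negative pixel-superpixel distances

# 1) Normalization across superpixels: softmax along axis=1,
#    so each row (pixel) sums to 1.
e = np.exp(neg_dist - neg_dist.max(axis=1, keepdims=True))
Q_row = e / e.sum(axis=1, keepdims=True)

# 2) Normalization across pixels: divide by column sums,
#    so each column (superpixel) sums to 1.
Q_col = Q_row / Q_row.sum(axis=0, keepdims=True)

# Column-normalized Q averages pixel features into superpixel features.
pixel_feats = rng.random((n_pixels, n_feat))
spixel_feats = Q_col.T @ pixel_feats
```

This matches the paper's "row-normalized" / "column-normalized" Q terminology: one form maps superpixel features back to pixels, the other maps pixel features to superpixels.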
It would be great if you can share your pytorch implementation to the community once it is ready. Thanks.
Many thanks. That is exactly what I was asking. Then I have one last question: why is there a normalization across superpixels prior to the one across pixels at every iteration? Without the former, I think, the mathematical meaning of Q would probably be clearer (in that case, Q would be an 'absolute' quantity rather than a 'relative' one, which would make the subsequent normalization across pixels more convenient).
I've just noticed that @CYang0515 has also implemented this work in PyTorch, what a coincidence. Anyway, I'll make my implementation public as soon as I get permission from my supervisor. Thanks again for the patient answers, and thank you for your attention.
That makes sense. The network may also work without normalization across superpixels. Since there is an exponentiation involved, things might be more stable with softmax (normalization). I haven't tried without normalization. Let me know if you happen to try without normalization as well.
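The stability point above can be made concrete with a toy example (this is my own illustration, not a claim about the network): a raw exponential of large logits overflows to infinity, while the standard max-subtraction softmax stays finite and sums to one.

```python
import numpy as np

logits = np.array([1000.0, 999.0, 998.0])

# Raw exponential: overflows to inf for large inputs.
with np.errstate(over='ignore'):
    raw = np.exp(logits)

# Softmax with max subtraction: mathematically equivalent ratios,
# but numerically stable.
shifted = logits - logits.max()
softmax = np.exp(shifted) / np.exp(shifted).sum()
```

In the superpixel setting the inputs are negative distances, so underflow to all-zero rows is the more likely failure mode of a raw exponential; the softmax's built-in normalization avoids both extremes.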
You are right. I've tried training the network both with and without the softmax for a few epochs, and the one with softmax showed better stability when the input range varied. I still need further experiments to benchmark the two properly. My question is settled. Thank you!
Thanks for your code. Since I am unfamiliar with Caffe and CUDA C/C++, I have two questions that perplex me when reading the code:
1. I've not found any row-normalization operation adopted where it ought to be (I suppose it to be inside decode_features in create_net.py). Does this implementation make a simplification, or is the normalization done elsewhere?
2. In decode_features in create_net.py, I notice that the neighboring superpixel features concat_spixel_feat are concatenated in a rather neat and tricky way, namely by passing them through a group convolution. My doubt is that, in my understanding, the convolution kernels would have to be fixed to specific values in order to achieve this, with the kernels of each group following a particular pattern. But nowhere in the repo can I find the part that sets the initial values of this convolution layer. I wonder where you put it, or whether my guess is simply incorrect?