NVlabs / ssn_superpixels

Superpixel Sampling Networks (ECCV2018)
https://varunjampani.github.io/ssn/

Some questions on source code #26

Closed Bobholamovic closed 5 years ago

Bobholamovic commented 5 years ago

Thanks for your code. Since I'm unfamiliar with Caffe and CUDA C/C++, I have two questions that puzzled me while reading the code:

  1. On page 8 of the paper, it is said that:

using row-normalized association matrix

Yet I haven't found any row-normalization operation where it ought to be (I suppose it to be inside decode_features in create_net.py). Does this implementation make a simplification, or is the normalization done elsewhere?

  2. Again in decode_features in create_net.py, I notice that the neighboring superpixel features concat_spixel_feat are concatenated in a rather neat and tricky way, namely by passing them through a group convolution. My doubt is that, in my understanding, the convolution kernels should be fixed to specific values in order to achieve this. If my guess is right, the kernels of each group should look like this:
```
channel 1:     channel 2:          channel 9:
[1, 0, 0]      [0, 1, 0]    ...    [0, 0, 0]
[0, 0, 0]      [0, 0, 0]           [0, 0, 0]
[0, 0, 0]      [0, 0, 0]           [0, 0, 1]
```

However, nowhere in the repo can I find where the initial values of this convolution layer are set. Could you point me to it, or is my guess simply incorrect?

varunjampani commented 5 years ago

We normalize the association matrix using softmax here: https://github.com/NVlabs/ssn_superpixels/blob/082918950d768f29466148666b7f434c78663dc6/create_net.py#L95

We initialize the group convolution filter weights in this function: https://github.com/NVlabs/ssn_superpixels/blob/082918950d768f29466148666b7f434c78663dc6/utils.py#L31
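For reference, here is a rough sketch of the one-hot kernel layout guessed above. This is illustrative only, not the actual utils.py code; the function name and shape conventions are assumptions:

```python
import numpy as np

def one_hot_group_kernels(num_channels, k=3):
    """Sketch: one group per input feature channel, and within each group
    k*k fixed one-hot kernels that each copy one spatial neighbor."""
    # Usual (out_channels, in_channels_per_group, kH, kW) convention:
    # (num_channels * k * k, 1, k, k).
    kernels = np.zeros((num_channels * k * k, 1, k, k), dtype=np.float32)
    for c in range(num_channels):
        for i in range(k * k):
            # Each output kernel selects exactly one of the k*k neighbor positions.
            kernels[c * k * k + i, 0, i // k, i % k] = 1.0
    return kernels
```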

Bobholamovic commented 5 years ago

That makes sense. Thanks for your reply!

Bobholamovic commented 5 years ago

Sorry for reopening this issue. I've recently been working on a PyTorch implementation of your paper, and the work is roughly done. However, I found a few details that I'm not so sure about. The most confusing one is the L.Softmax in create_net.py that we discussed before. You explained it above as normalizing the association matrix, but after a read-through, I believe the normalize function at Line 17 and the CUDA library behind L.SpixelFeature2 also perform that normalization. As far as I can understand from the paper, we should first obtain the original Q without any normalization for further tasks (e.g., the mapping between pixels and superpixels), so perhaps we need an exponential rather than a softmax.

Sorry for the lengthy description. My question is: why is a softmax used instead of an exponential transformation, i.e., why is an additional normalization done here? Is it due to numerical stability issues? Please correct me if I am wrong.

varunjampani commented 5 years ago

I am not sure I understood your question completely. The 'normalize' function is only used once, at the end, whereas 'softmax' is used at each iteration. There are actually two normalizations in each iteration: one across superpixels (using Softmax) and one across pixels (inside SpixelFeature2). Sorry that this is not very clear from the paper, where we did not explicitly mention the normalization over superpixels in some places. We talk about using the 'row-normalized' and 'column-normalized' Q in the sub-section "Mapping between pixel and superpixel representations".
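In rough PyTorch-style pseudo-code, the two per-iteration normalizations look something like the sketch below; variable names and shapes are illustrative, not the actual layer implementations:

```python
import torch
import torch.nn.functional as F

def normalized_associations(neg_dist):
    # neg_dist: negated pixel-to-candidate-superpixel distances,
    # shape (num_pixels, num_spixels) -- a hypothetical layout.
    # Normalization across superpixels: each pixel's soft assignments
    # sum to 1 (the L.Softmax in create_net.py, i.e. row-normalized Q).
    q_row = F.softmax(neg_dist, dim=1)
    # Normalization across pixels: each superpixel's weights sum to 1
    # (what SpixelFeature2 effectively does when averaging pixel features
    # into superpixel features, i.e. column-normalized Q).
    q_col = q_row / (q_row.sum(dim=0, keepdim=True) + 1e-8)
    return q_row, q_col

# Superpixel features are then a column-normalized weighted average,
# e.g. spixel_feat = q_col.t() @ pixel_feat
```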

It would be great if you could share your PyTorch implementation with the community once it is ready. Thanks.

Bobholamovic commented 5 years ago

Many thanks. That is exactly what I was asking. Then I have one last question: why is there a normalization across superpixels prior to the one across pixels at every iteration? Without the former, I think, the mathematical meaning of Q would probably be clearer (in that case, Q would be an 'absolute' quantity rather than a 'relative' one, which would also make the subsequent normalization across pixels more convenient).

I've just noticed that @CYang0515 has also implemented this work in PyTorch, what a coincidence. Anyway, I'll make my implementation public as soon as I get permission from my supervisor. Thanks again for the patient answers and for your attention.

varunjampani commented 5 years ago

That makes sense. The network may also work without normalization across superpixels. Since there is an exponentiation involved, things might be more stable with softmax (normalization). I haven't tried it without normalization. Let me know if you happen to try it without normalization as well.
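For concreteness, a raw exponential of large (negated) distances can overflow in float32, whereas softmax subtracts the max internally and stays finite. A tiny illustration with hypothetical values:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([100.0, 200.0, 300.0])  # hypothetical negated-distance logits
print(torch.exp(x))          # inf values: a raw exp overflows float32
print(F.softmax(x, dim=0))   # finite, sums to 1
```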

Bobholamovic commented 5 years ago

You are right. I've tried training the network both with and without the softmax for a few epochs, and the one with softmax was more stable when the input range varied. Still, I need further experiments to benchmark the two properly. My question is settled. Thank you!