NVlabs / ssn_superpixels

Superpixel Sampling Networks (ECCV2018)
https://varunjampani.github.io/ssn/

About the weights of the loss function #9

Closed · ghost closed 5 years ago

ghost commented 5 years ago

Hi,

Thanks for the great work and for sharing the code. I am trying to run your code, but I am not familiar with Caffe and am confused about the loss weights.

In the code, there are three losses: pos_loss (loss1), col_loss (loss2), and lossWithoutSoftmax (loss3), of which I believe only pos_loss and lossWithoutSoftmax are actually used. However, when I run it, I usually get output like what I posted at the end.

  1. What is the loss value next to the iteration number (e.g. Iteration 6, loss = 1.16342)? I thought it was the total loss, but I found it can sometimes be less than the weighted loss1 alone, as Iteration 6 shows.

  2. I am trying to port your code to PyTorch, but I am not clear on how the weight 1e-5 is applied. Based on the printed output, it seems to be something like

loss = 1e-5 * torch.norm(pos_pix_feat - pos_recon_feat, dim=1).sum() + elem_wise_cross_entropy(rec_label, ori_label).mean()

where pos_loss uses the sum of the element-wise l2 norm, while the label loss uses the mean of the element-wise cross entropy. Is this right? I tried using sum() or mean() for both sub-losses, but none of the combinations comes even close to the loss values I get from your code.

Thanks in advance.

=======================
I0301 11:46:37.911602 9853 solver.cpp:228] Iteration 6, loss = 1.16342
I0301 11:46:37.911633 9853 solver.cpp:244] Train net output #0: loss1 = 119468 (* 1e-05 = 1.19468 loss)
I0301 11:46:37.911638 9853 solver.cpp:244] Train net output #1: loss2 = 112387
I0301 11:46:37.911644 9853 solver.cpp:244] Train net output #2: loss3 = 0.15799 (* 1 = 0.15799 loss)

I0301 11:46:38.340327 9853 solver.cpp:228] Iteration 7, loss = 1.14899
I0301 11:46:38.340358 9853 solver.cpp:244] Train net output #0: loss1 = 90096.4 (* 1e-05 = 0.900964 loss)
I0301 11:46:38.340363 9853 solver.cpp:244] Train net output #1: loss2 = 24685.4
I0301 11:46:38.340366 9853 solver.cpp:244] Train net output #2: loss3 = 0.147002 (* 1 = 0.147002 loss)

I0301 11:46:38.935927 9853 solver.cpp:228] Iteration 8, loss = 1.15263
I0301 11:46:38.935950 9853 solver.cpp:244] Train net output #0: loss1 = 109387 (* 1e-05 = 1.09387 loss)
I0301 11:46:38.935955 9853 solver.cpp:244] Train net output #1: loss2 = 112974
I0301 11:46:38.935958 9853 solver.cpp:244] Train net output #2: loss3 = 0.0878664 (* 1 = 0.0878664 loss)
I0301 11:46:38.935961 9853 sgd_solver.cpp:106] Iteration 8, lr = 0.0001

varunjampani commented 5 years ago

Hi,

Your interpretation of the loss is correct. We use the Euclidean loss on position features and 'LossWithoutSoftmax' on segment labels; see lines 293 and 312 in https://github.com/NVlabs/ssn_superpixels/blob/master/create_net.py.

'LossWithoutSoftmax' is a custom layer I implemented; it is a modified version of the 'SoftmaxWithLoss' layer. The SoftmaxWithLoss layer (http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1SoftmaxWithLossLayer.html) applies a 'Softmax' followed by the 'Multinomial logistic loss'. In 'LossWithoutSoftmax', we apply the multinomial logistic loss directly, without the softmax. Please check whether the cross-entropy loss you are using includes a softmax or not.
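Roughly, in PyTorch terms, the difference looks like this (a quick sketch, not code from the repo; the function names, the one-hot label layout with classes in dim 1, and the eps are placeholders, and note that torch.nn.CrossEntropyLoss applies log-softmax internally, so it matches SoftmaxWithLoss rather than LossWithoutSoftmax):

```python
import torch

def softmax_with_loss(logits, onehot):
    # Caffe's SoftmaxWithLoss: softmax first, then multinomial logistic loss.
    log_probs = torch.log_softmax(logits, dim=1)
    return -(onehot * log_probs).sum(dim=1).mean()

def loss_without_softmax(probs, onehot, eps=1e-8):
    # The custom layer: multinomial logistic loss applied directly to
    # already-normalized probabilities, with no extra softmax in front.
    return -(onehot * torch.log(probs + eps)).sum(dim=1).mean()
```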

A student I am advising is also planning to do a PyTorch implementation. I think it would be good to coordinate these porting efforts. Would you be interested in coordinating? Could you email me if you are?

Thanks, Varun

ghost commented 5 years ago

Thanks Varun,

I realized that the implementation of the l2 norm in Caffe is actually different from PyTorch's, so just to confirm the loss function: given an input of size b * c * h * w, the loss is

L = 10^{-5} \cdot \frac{1}{2N} \sum_{i}^{N} \| x1_i - x2_i \|^2 - \frac{1}{N} \sum_{i}^{N} q_i \log(p_i)

where N = b * h * w, x1 is pos_pix_feat, x2 is pos_recon_feat, q is ori_label, and p is rec_label. Is this correct?
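For reference, here is a minimal PyTorch sketch of the full loss as I understand it (the function name, the eps, and the assumption that labels are one-hot b * c * h * w tensors are mine, not from the repo):

```python
import torch

def ssn_loss(pos_pix_feat, pos_recon_feat, rec_label, ori_label,
             pos_weight=1e-5, eps=1e-8):
    # All inputs assumed to be b x c x h x w; N counts every pixel once.
    b, _, h, w = pos_pix_feat.shape
    n = b * h * w
    # Caffe-style Euclidean loss: sum of squared differences over 2N.
    pos_loss = (pos_pix_feat - pos_recon_feat).pow(2).sum() / (2 * n)
    # Multinomial logistic loss on probabilities (no softmax), per-pixel mean.
    label_loss = -(ori_label * torch.log(rec_label + eps)).sum() / n
    return pos_weight * pos_loss + label_loss
```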

It would be very nice to discuss this further. An email has been sent to varunjampani@gmail.com.

varunjampani commented 5 years ago

The equations seem correct. I do not clearly remember whether Caffe uses 1/(2N) or 1/N for the Euclidean loss.

ghost commented 5 years ago

I checked the Caffe documentation, and I believe it uses 1/(2N): http://caffe.berkeleyvision.org/tutorial/layers/euclideanloss.html
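This is also where Caffe differs from torch.nn.functional.mse_loss, which averages over every element and has no 1/2 factor. A small sketch under the same N = b * h * w convention as above (variable names are mine):

```python
import torch
import torch.nn.functional as F

x1 = torch.randn(2, 3, 4, 4)
x2 = torch.randn(2, 3, 4, 4)
b, c, h, w = x1.shape

# Caffe's EuclideanLoss: sum of squared differences divided by 2N.
caffe_style = (x1 - x2).pow(2).sum() / (2 * b * h * w)

# PyTorch's mse_loss averages over all b*c*h*w elements with no 1/2,
# so under this convention the two differ by a factor of c / 2.
assert torch.allclose(caffe_style, F.mse_loss(x1, x2) * c / 2)
```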

Thanks a lot for your kind reply!