PiotrDabkowski / pytorch-saliency

Real-time image saliency 🌠 (NIPS 2017)

Inconsistency of New Saliency Metric Calculation #4


yulongwang12 commented 6 years ago

Hi Piotr,

Thanks for your great work and code release. I am currently working on saliency and noticed that you propose a new saliency metric (Sec. 3.2), s(a, p) = log(a) - log(p). However, when I evaluated it on ImageNet with a Caffe-pretrained GoogLeNet, I got the following results:

|              | Saliency Metric (paper reported) | Saliency Metric (mine) |
|--------------|----------------------------------|------------------------|
| ground truth | 0.284                            | 0.3044                 |
| max box      | 1.366                            | 1.3443                 |
| central box  | 0.645                            | 0.7238                 |
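For reference, this is roughly how I compute the metric for each box (a minimal sketch; the function name, the probability convention, and the optional area floor are my own choices, not taken from your code):

```python
import math

def saliency_metric(box_area_fraction, class_prob, min_area=None):
    """s(a, p) = log(a) - log(p), Sec. 3.2.

    box_area_fraction: area of the box divided by the full image area.
    class_prob: classifier probability of the ground-truth class on the
        cropped-and-resized box.
    min_area: optional floor on the area term (set it if your
        implementation clips very small boxes).
    """
    a = box_area_fraction if min_area is None else max(box_area_fraction, min_area)
    return math.log(a) - math.log(class_prob)

# max box:     a = 1.0, so s is simply -log(p)
# central box: a = 0.5, so s = -log(p) + log(0.5) ~= -log(p) - 0.693
```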

The max box result suggests my calculation conforms to yours, and the ground truth result is close (the difference is because I didn't evaluate all the ground truth boxes). But the difference on the central box is quite large, and I cannot figure out what's going wrong. Could you please release the saliency metric evaluation code? Thank you very much; I'm looking forward to your reply.

Best regards

Yulong

PiotrDabkowski commented 6 years ago

Hmm, that's strange, but it could be explained by differences in the classifier (are you sure you are using GoogLeNet?) and in the resizing strategy. The classifier has a very similar loss (the -log(p) term) on both the max box and the central box. Since the central box covers half the image area, the area term is log(1) = 0 for the max box and log(0.5) ≈ -0.69 for the central box, so we would expect the saliency metric for the central box to be about 0.7 smaller. In my case the classifier actually performed slightly better on the central box than on the max box; in your case it appears the opposite was true. Anyway, the results look reasonable, but make sure you are using the same net.
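A quick back-of-the-envelope check, plugging the numbers from your table into s(a, p) = log(a) - log(p) to recover the classifier loss -log(p) for each box (just arithmetic, assuming only a = 1 for the max box and a = 0.5 for the central box):

```python
import math

log_half = math.log(0.5)  # area term of the central box, ~ -0.693

# -log(p) = s - log(a); log(a) = 0 for the max box
for who, s_max, s_central in [("paper", 1.366, 0.645), ("yours", 1.3443, 0.7238)]:
    print(who, "max box -log(p) =", round(s_max, 3),
          "central box -log(p) =", round(s_central - log_half, 3))

# paper: 1.366 vs 1.338 -> classifier slightly better on the central box
# yours: 1.344 vs 1.417 -> classifier slightly worse on the central box
```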

The main implementation used for the paper was originally written in TensorFlow and was a mess, so I reimplemented the main part in PyTorch. I could release the evaluation code as well if that helps.

yulongwang12 commented 6 years ago

I'm currently using a Caffe-pretrained GoogLeNet, which is also called Inception-v1. Do you also use the Inception-v1 model? I found the official TensorFlow implementation here; is that right?

I also want to confirm the preprocessing procedure for the central box: first crop the centered region of size (H/sqrt(2), W/sqrt(2)), then resize the cropped image to 224 x 224 and apply the same normalization as before. Did I miss anything? Thanks for your reply.
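Concretely, this is what my preprocessing looks like (a minimal sketch using torchvision helpers; the interpolation defaults are my choice, and the classifier-specific mean/std normalization is applied afterwards):

```python
from PIL import Image
import torchvision.transforms.functional as TF

def central_box_input(img: Image.Image, out_size=224):
    """Centered crop covering half the image area, then resize to the
    network input size (normalization is applied afterwards as usual)."""
    w, h = img.size
    crop_h = round(h / 2 ** 0.5)  # sides scaled by 1/sqrt(2) -> area halved
    crop_w = round(w / 2 ** 0.5)
    cropped = TF.center_crop(img, [crop_h, crop_w])
    return TF.resize(cropped, [out_size, out_size])
```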