HRanWang / Spatial-Re-Scaling


Questions about the ablation experiment of Spatial-Attention? #1

Open · Hellomodo opened this issue 5 years ago

Hellomodo commented 5 years ago

Hi @HRanWang, thanks for your wonderful work! I tried a simple ablation experiment to validate the performance of Spatial-Attention, but the result confuses me a lot. For the sake of simplicity, I changed the code below to get a simple baseline:

```python
# torch.autograd.backward([loss0, loss1, loss2, loss3, loss4, loss5, loss6, loss_layer1, loss_layer2, loss_layer3],
#                         [torch.ones(1).cuda(), torch.ones(1).cuda(), torch.ones(1).cuda(), torch.ones(1).cuda(), torch.ones(1).cuda(),
#                          torch.ones(1).cuda(), torch.ones(1).cuda(), torch.ones(1).cuda(), torch.ones(1).cuda(), torch.ones(1).cuda()])
```

changed to

```python
loss6.backward()
```
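(Side note on why I treat this as a fair baseline: backpropagating the full list with unit gradients is equivalent to summing all the losses and calling backward once, so the change above really does keep only loss6. A minimal check of that equivalence, not code from the repo:)

```python
import torch

# Toy check that backward over a list of losses with unit gradients
# accumulates the same gradients as summing the losses first.
w = torch.randn(4, requires_grad=True)

def make_losses(w):
    # Keep each loss as a shape-(1,) tensor so it matches torch.ones(1).
    return (w ** 2).sum().reshape(1), (w + 1.0).sum().reshape(1)

loss0, loss6 = make_losses(w)
torch.autograd.backward([loss0, loss6], [torch.ones(1), torch.ones(1)])
grads_multi = w.grad.clone()

w.grad = None
loss0, loss6 = make_losses(w)
(loss0 + loss6).sum().backward()        # same as backpropagating the whole list
assert torch.allclose(grads_multi, w.grad)
```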

And to validate the performance of Spatial-Attention, I simply added `x_attn0 = self.SA4(x)` and `x = x * x_attn0` after line 174 in reSAnet.py, as sketched below.
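For reference, the re-weighting itself is just a broadcast element-wise multiply; the shapes below are stand-ins I am assuming, not taken from reSAnet.py:

```python
import torch

# Tiny shape demo of the two inserted lines, assuming SA4 maps
# (N, C, H, W) -> (N, 1, H, W) so the map broadcasts over channels.
x = torch.randn(2, 2048, 8, 4)    # stand-in for the stage-4 feature map
x_attn0 = torch.rand(2, 1, 8, 4)  # stand-in for self.SA4(x)
x = x * x_attn0                   # element-wise re-weighting of every location
print(x.shape)                    # torch.Size([2, 2048, 8, 4])
```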

For the baseline, I get:

| CMC Scores | allshots | cuhk03 | market1501 |
|---|---|---|---|
| top-1 | 55.3% | 76.7% | 89.6% |
| top-5 | 70.7% | 92.5% | 95.9% |
| top-10 | 77.1% | 95.5% | 97.5% |

When I add the Spatial-Attention to this baseline, I get:

| CMC Scores | allshots | cuhk03 | market1501 |
|---|---|---|---|
| top-1 | 51.8% | 73.4% | 88.6% |
| top-5 | 67.2% | 90.4% | 95.4% |
| top-10 | 73.7% | 94.0% | 97.1% |

I ran the experiment with Python 3.6 and PyTorch 0.4.1.

It seems that Spatial-Attention doesn't improve the performance. Correct me if I'm missing something.

YUE-FAN commented 5 years ago

Hi,

Your env setting is fine. Actually, I think the problem is where you add `x_attn0 = self.SA4(x)` and `x = x * x_attn0`. @HRanWang provided an example in /ablation/Backbone_with(out)_SA1/reid/models/resnet_fusion.py.

If SA4 is inserted at line 174, it will affect the losses from the part-level features. Since those features are partitioned into a few chunks, the intention of SA is defeated. This is also why we didn't use SA after stage 4 of the backbone in the first place. The correct way to "ablate" SA4 is to get rid of the part-level losses before applying SA to the GAP :) A rough sketch of what I mean is below.
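To make that concrete, here is roughly the kind of ablation I have in mind. The class and argument names are mine for illustration only, not the actual code in resnet_fusion.py:

```python
import torch.nn as nn
import torch.nn.functional as F

class GlobalSAHead(nn.Module):
    """Illustrative ablation head: one global branch, SA before GAP, no parts."""

    def __init__(self, backbone, sa_module, in_channels, num_classes):
        super(GlobalSAHead, self).__init__()
        self.backbone = backbone      # e.g. ResNet truncated after stage 4
        self.SA4 = sa_module          # spatial attention: (N, C, H, W) -> (N, 1, H, W)
        self.classifier = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        x = self.backbone(x)              # stage-4 feature map
        x = x * self.SA4(x)               # re-weight the whole map with SA
        x = F.adaptive_avg_pool2d(x, 1)   # GAP on the re-weighted features
        x = x.view(x.size(0), -1)
        return self.classifier(x)         # single global loss; no part-level losses
```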

PS: SA is actually designed as an improved version of GAP. If you use it on the GAP, the performance should improve consistently. But if you generalize it to other layers, it sometimes does hurt the model. (We are now working on this generalization; hopefully it will be released next month :P)
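By "an improved version of GAP" I mean something along these lines; the module below is only a toy sketch of the idea, not the exact SA layer in this repo:

```python
import torch.nn as nn
import torch.nn.functional as F

class SoftAttentionPool(nn.Module):
    """Toy attention-weighted pooling: GAP with learned per-location weights."""

    def __init__(self, in_channels):
        super(SoftAttentionPool, self).__init__()
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)  # per-location score

    def forward(self, x):                       # x: (N, C, H, W)
        n, c, h, w = x.size()
        attn = self.score(x).view(n, 1, h * w)  # (N, 1, H*W)
        attn = F.softmax(attn, dim=-1)          # weights sum to 1 over locations
        x = x.view(n, c, h * w)
        return (x * attn).sum(dim=-1)           # (N, C) weighted pooling
```

Plain GAP is the special case where every location gets the same weight 1/(H*W); the attention map lets the network learn where to put that weight instead.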

Hellomodo commented 5 years ago

Hi @YUE-FAN. Thanks for your quick and detailed reply!

  1. Note that in the two experiments of this issue, only loss6 is backpropagated, so the part losses and the deeply supervised branches indeed don't contribute in this setting.
  2. In the resnet_fusion.py of Backbone_with(out)_SA1, to demonstrate the effectiveness of the spatial attention layer, should I simply add `x_attn4 = self.soft4(x)` and `x = x * x_attn4` after line 202? https://github.com/HRanWang/Spatial-Attention/blob/72dd4f353a7f8f047904dc278ded51d723a87be6/ablation/Backbone_with(out)_SA1/reid/models/resnet_fusion.py#L202 Since only the backbone loss is used, why is the partition operation still applied?

Thanks again for the ablation code.

Hellomodo commented 5 years ago

I just tested the code in Backbone_with(out)_SA1 with Python 2.7 and PyTorch 0.4.0.

For the original code in resnet_fusion.py, which is regarded as Backbone_without_SA1:

Mean AP: 70.5%

| CMC Scores | allshots | cuhk03 | market1501 |
|---|---|---|---|
| top-1 | 49.2% | 72.7% | 88.2% |
| top-5 | 65.3% | 90.6% | 95.5% |
| top-10 | 72.2% | 94.3% | 97.0% |

Then I simply uncomment the following lines to add SA1: https://github.com/HRanWang/Spatial-Attention/blob/72dd4f353a7f8f047904dc278ded51d723a87be6/ablation/Backbone_with(out)_SA1/reid/models/resnet_fusion.py#L203-L204

Mean AP: 69.5%

| CMC Scores | allshots | cuhk03 | market1501 |
|---|---|---|---|
| top-1 | 48.3% | 71.5% | 86.8% |
| top-5 | 64.3% | 90.1% | 95.5% |
| top-10 | 71.3% | 93.9% | 97.0% |