We propose a modification to global average pooling, called spatial Re-Scaling, which shows a consistent improvement on generic classification tasks. Currently the experiments are only conducted on person re-identification (Person-ReID), which we formulate as a fine-grained classification problem. Our code is mainly based on PCB.
If you find this code useful for your research, please consider citing the following papers:
@ARTICLE{9229188,
author={H. {Wang} and L. {Jiao} and S. {Yang} and L. {Li} and Z. {Wang}},
journal={IEEE Transactions on Neural Networks and Learning Systems},
title={Simple and Effective: Spatial Rescaling for Person Reidentification},
year={2020},
volume={},
number={},
pages={1-12},
doi={10.1109/TNNLS.2020.3027589}
}
@article{wang2018parameter,
title={Parameter-free spatial attention network for person re-identification},
author={Wang, Haoran and Fan, Yue and Wang, Zexin and Jiao, Licheng and Schiele, Bernt},
journal={arXiv preprint arXiv:1811.12150},
year={2018}
}
The proposed architecture formulates the task as a classification problem. It consists of four components. The yellow region represents the backbone feature extractor. The red region represents the deeply supervised branches (DS). The blue region represents the six part classifiers (P). The two green regions represent two sets of spatial attention layers (SA); SA1 is not used for the main results and only appears in the ablation study. The total loss is the sum of all deep supervision losses, the six part losses, and the loss from the backbone. Note that the spatial attention is only added before GAP, as sketched below.
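To illustrate where SA sits in the pipeline, below is a minimal sketch of a parameter-free spatial re-scaling applied right before global average pooling. It is not the repository's exact code: we assume the attention map is a softmax over spatial positions of the channel-summed feature map, and all names are illustrative; the precise formulation is given in the cited papers.

```python
# Minimal sketch (not the repository's exact code) of spatial re-scaling before GAP.
# Assumption: the spatial weights are a softmax over H*W of the channel-summed feature map.
import torch
import torch.nn.functional as F

def spatial_rescale_gap(feat):
    """feat: (N, C, H, W) backbone feature map -> (N, C) pooled descriptor."""
    n, c, h, w = feat.size()
    weights = F.softmax(feat.sum(dim=1).view(n, -1), dim=1)  # (N, H*W), parameter-free
    weights = weights.view(n, 1, h, w)
    # Re-scale the feature map spatially; the (h * w) factor keeps the
    # magnitude comparable to plain GAP (an illustrative choice).
    rescaled = feat * weights * (h * w)
    return F.adaptive_avg_pool2d(rescaled, 1).view(n, c)     # GAP after re-scaling

# Plain GAP, for comparison:
# pooled = F.adaptive_avg_pool2d(feat, 1).view(n, c)
```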
Prerequisites: Python 2.7 and PyTorch 0.4.0 (we run the code under version 0.4.0; versions <= 0.4.0 may also work).
Market-1501 (password: 1ri5)
If you are going to train on Market-1501, run training:
python2 main.py -d market -b 48 -j 4 --epochs 100 --log logs/market/ --combine-trainval --step-size 40 --data-dir Market-1501
Alternatively, you can download a trained weight file from BaiduYun (password: wwjv) and put it into the model folder (i.e., 'model/checkpoint.pth.tar'), then run testing:
python2 main.py -d market -b 48 -j 4 --log logs/market/ --combine-trainval --step-size 40 --data-dir Market-1501 --resume ./model/checkpoint.pth.tar --evaluate
We achieved state-of-the-art results on four benchmarks, as shown in Table 1 (as of 11 Nov. 2018).
Here we show 6 examples comparing the class activation maps (CAM) of plain GAP and GAP with SA. From left to right: the original image, the CAM from plain GAP, and the CAM from GAP with SA.
We see that the highlighted area from plain GAP is always concentrated on a few parts of the object, which makes the model vulnerable when those parts are missing due to occlusion or viewpoint changes. With the help of spatial attention, the model's focus is distributed over the whole image, providing the classifier with more details of the object and increasing its robustness.
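For reference, the CAMs above can be produced in the standard way (Zhou et al., CVPR 2016): weight the final feature map by the classifier weights of the class of interest. The snippet below is a hedged sketch with illustrative names, not code from this repository.

```python
# Sketch of computing a class activation map (CAM) from the final conv feature map
# and the linear classifier weights; upsample the returned map to image size for display.
import torch

def class_activation_map(feat, fc_weight, class_idx):
    """feat: (C, H, W) final feature map of one image,
    fc_weight: (num_classes, C) weights of the linear classifier.
    Returns an (H, W) heatmap normalised to [0, 1]."""
    c, h, w = feat.size()
    cam = torch.matmul(fc_weight[class_idx], feat.view(c, -1)).view(h, w)
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-12)
```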
In order to demonstrate the effectiveness of the spatial attention layer, we are working on more examples for the ablation study. Each example inside the ablation folder is independent of the other snippets.
Besides the ones in the paper, we uploaded another example for the ablation of SA on the backbone model on Market-1501. Random erasing is disabled for simplicity, and the number of training epochs is set to 60.
python2 main.py -d market -b 48 -j 4 --epochs 60 --log logs/market/ --feature 256 --height 384 --width 128 --combine-trainval --step-size 40 --data-dir Market-1501