CUT

Segmentation assisted U-shaped multi-scale transformer for crowd counting

Environment

timm==0.5.4
python<3.10
pytorch>=1.4
opencv-python
scipy==1.6.2
h5py
pillow
tqdm
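
Since the timm pin is strict, it can help to verify the environment before training. The snippet below is a minimal, hypothetical check (not part of this repo):

```python
# Hypothetical environment check; not part of this repo.
import sys

import timm
import torch

# timm must be exactly 0.5.4 -- newer versions break the backbone code.
assert timm.__version__ == "0.5.4", f"need timm==0.5.4, found {timm.__version__}"

# Python must be older than 3.10.
assert sys.version_info < (3, 10), "need python < 3.10"

# PyTorch must be at least 1.4 (ignore any local build suffix like '+cu113').
major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (1, 4), f"need pytorch >= 1.4, found {torch.__version__}"
```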

Models

The SHA model can be trained on Google Colab. In addition to the hyper-parameters stated in the paper, we train for a total of 500 epochs and start evaluation after the 120th epoch. If you want a similar result, try seed=15. Don't forget to install timm==0.5.4; a higher version will cause an error.
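
If you are reproducing the seed=15 run in your own training loop, a typical seeding helper looks like the sketch below (the function name is illustrative, not this repo's API):

```python
# Hypothetical seeding helper for reproducible runs; not this repo's API.
import random

import numpy as np
import torch


def set_seed(seed: int = 15) -> None:
    """Seed Python, NumPy and PyTorch RNGs so a run can be repeated."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade some speed for reproducibility.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```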

For the other datasets, the model needs to be trained on a GPU with at least 24GB of memory, and more epochs are needed for it to converge. We provide pre-trained models for them here:
SHA | QNRF | JHU++

The pretrained backbone model is provided here: PcPvT
You can also download it from the official Twins release.
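
As a rough illustration, loading such a backbone checkpoint usually looks like the sketch below; the timm variant name and file path are assumptions, not this repo's actual code:

```python
# Hypothetical loading sketch; the variant name and checkpoint path are
# placeholders, not this repo's actual files.
import timm
import torch

backbone = timm.create_model("twins_pcpvt_small", pretrained=False)
checkpoint = torch.load("pcpvt_small.pth", map_location="cpu")
# Released checkpoints are sometimes wrapped, e.g. under a "state_dict" key.
state_dict = checkpoint.get("state_dict", checkpoint)
backbone.load_state_dict(state_dict, strict=False)
```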

Results

[Results figure; see the paper for the quantitative comparisons.]

If you find our work useful, please cite our paper:

@inproceedings{cut,
    title     = {Segmentation Assisted U-shaped Multi-scale Transformer for Crowd Counting},
    author    = {Yifei Qian and Liangfei Zhang and Xiaopeng Hong and Carl Donovan and Ognjen Arandjelovic},
    booktitle = {2022 British Machine Vision Conference},
    year      = {2022},
}

Acknowledgement

This project builds upon the following works:

@InProceedings{Rong_2021_WACV,
    author    = {Rong, Liangzi and Li, Chunping},
    title     = {Coarse- and Fine-Grained Attention Network With Background-Aware Loss for Crowd Density Map Estimation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2021},
    pages     = {3675-3684}
}
@inproceedings{chu2021Twins,
    title={Twins: Revisiting the Design of Spatial Attention in Vision Transformers},
    author={Xiangxiang Chu and Zhi Tian and Yuqing Wang and Bo Zhang and Haibing Ren and Xiaolin Wei and Huaxia Xia and Chunhua Shen},
    booktitle={NeurIPS 2021},
    year={2021}
}