dk-liang / CLTR

[ECCV 2022] An End-to-End Transformer Model for Crowd Localization
MIT License

CLTR (Crowd Localization TRansformer)

[Project page] [paper]

The official implementation of "An End-to-End Transformer Model for Crowd Localization" (accepted by ECCV 2022).

Environment

python ==3.6
pytorch ==1.8.0
opencv-python
scipy
h5py
pillow
imageio
nni
mmcv
tensorboard
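
Before training, it can help to confirm that the packages above can be located in the active environment. The following is a minimal sketch and not part of the repository. The mapping from package names to import names (for example `opencv-python` to `cv2`, `pillow` to `PIL`, `pytorch` to `torch`) is assumed from each package's usual convention, and the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names for the packages listed above (assumed from each package's usual convention).
REQUIRED_MODULES = ["torch", "cv2", "scipy", "h5py", "PIL", "imageio", "nni", "mmcv", "tensorboard"]


def missing_modules(names):
    """Return the top-level module names that cannot be located in this environment."""
    # find_spec looks a module up without importing it and returns None when it is absent.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be found; it does not verify the pinned python and pytorch versions.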

Datasets

Prepare data

Generate point map

cd CLTR/data
For JHU-Crowd++ dataset: python prepare_jhu.py --data_path /xxx/xxx/jhu_crowd_v2.0
For NWPU-Crowd dataset: python prepare_nwpu.py --data_path /xxx/xxx/NWPU_CLTR

Generate image list

cd CLTR
python make_npydata.py --jhu_path /xxx/xxx/jhu_crowd_v2.0 --nwpu_path /xxx/xxx/NWPU_CLTR
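
The exact output of `make_npydata.py` is not described here. As an illustration only, an image list can be thought of as a sorted list of image file paths under a dataset directory. The sketch below assumes that reading; the function name `list_images` and the accepted file extensions are hypothetical and are not taken from the repository.

```python
import os


def list_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Collect image paths under `root` (recursively), sorted for reproducibility."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # str.endswith accepts a tuple, so one call covers every extension.
            if filename.lower().endswith(extensions):
                paths.append(os.path.join(dirpath, filename))
    return sorted(paths)
```

Sorting keeps the list stable across runs, which makes training and evaluation splits easier to reproduce.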

Training

Example (some hyper-parameters may be different from the original paper):
cd CLTR
sh experiments/jhu.sh
or
sh experiments/nwpu.sh

Comparison on NWPU-Crowd (val set):

Method                      MAE     MSE
Original paper              61.9    246.3
This repo (training log)    51.3    116.7
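
The MAE and MSE reported above are counting errors. The sketch below assumes the convention common in crowd-counting work: MAE is the mean absolute difference between predicted and ground-truth counts per image, and "MSE" denotes the square root of the mean squared difference. The function name and the sample counts are illustrative and are not taken from the repository or its logs.

```python
import math


def counting_errors(predicted, ground_truth):
    """Return (MAE, MSE) over per-image counts; MSE here is the root mean squared error."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - g for p, g in zip(predicted, ground_truth)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse


# Made-up counts for three images, for illustration only.
mae, mse = counting_errors([10, 20, 33], [12, 20, 30])
print(f"MAE={mae:.2f}, MSE={mse:.2f}")
```

Because the squared term weights large per-image errors more heavily, MSE is more sensitive than MAE to a few badly miscounted images.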

Testing

Example:
python test.py --dataset jhu --pre model.pth --gpu_id 2,3
or
python test.py --dataset nwpu --pre model.pth --gpu_id 0,1
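
The `--gpu_id` value is a comma-separated list of device indices. A common way for a script to honor such a flag is to export it as `CUDA_VISIBLE_DEVICES` before CUDA is initialized. That behaviour is assumed here and is not confirmed from `test.py`. The helper name `select_gpus` is hypothetical.

```python
import os


def select_gpus(gpu_id):
    """Validate a comma-separated GPU list such as "2,3" and expose only those devices."""
    indices = [int(part) for part in gpu_id.split(",") if part.strip()]
    if not indices or any(i < 0 for i in indices):
        raise ValueError(f"invalid gpu id list: {gpu_id!r}")
    # Must be set before the deep-learning framework initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)
    return indices
```

With `CUDA_VISIBLE_DEVICES` set to `2,3`, the process sees those two cards renumbered as devices 0 and 1.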

Video Demo

Example:
python video_demo.py --video_path ./video_demo/demo.mp4 --num_queries 700 --pre video_model.pth
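
In a DETR-style design, the number of queries bounds how many instances a single forward pass can output, so `--num_queries` should be chosen with the expected crowd size in mind. The sketch below assumes each query yields an (x, y) point and a confidence score, and keeps the points whose score exceeds a threshold. The threshold of 0.5, the function name, and the sample values are illustrative and are not taken from `video_demo.py`.

```python
def select_points(points, scores, threshold=0.5):
    """Keep predicted (x, y) points whose confidence exceeds `threshold`."""
    if len(points) != len(scores):
        raise ValueError("points and scores must have the same length")
    return [pt for pt, score in zip(points, scores) if score > threshold]


# Illustrative values only: three queries, two confident enough to count as people.
kept = select_points([(12.0, 30.5), (40.0, 8.0), (75.5, 60.0)], [0.91, 0.12, 0.67])
print(len(kept), kept)
```

Under this reading, the predicted crowd count is simply the number of points that survive the threshold.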

Visit bilibili or YouTube to watch the video demo.

Acknowledgement

Thanks to the authors of the following great works:

@inproceedings{carion2020end,
  title={End-to-end object detection with transformers},
  author={Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey},
  booktitle={European conference on computer vision},
  pages={213--229},
  year={2020},
  organization={Springer}
}
@inproceedings{meng2021conditional,
  title={Conditional detr for fast training convergence},
  author={Meng, Depu and Chen, Xiaokang and Fan, Zejia and Zeng, Gang and Li, Houqiang and Yuan, Yuhui and Sun, Lei and Wang, Jingdong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={3651--3660},
  year={2021}
}

Reference

If you find this project useful, please cite:

@inproceedings{liang2022end,
  title={An end-to-end transformer model for crowd localization},
  author={Liang, Dingkang and Xu, Wei and Bai, Xiang},
  booktitle={European Conference on Computer Vision},
  year={2022}
}