[Project page] [paper]
An official implementation of "An End-to-End Transformer Model for Crowd Localization" (accepted by ECCV 2022).
python == 3.6
pytorch == 1.8.0
opencv-python
scipy
h5py
pillow
imageio
nni
mmcv
tensorboard
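Before preparing the data, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and the import names (`torch` for pytorch, `cv2` for opencv-python, `PIL` for pillow) are assumed to be the usual ones for these packages.

```python
import importlib.util

# Import names for the packages listed above (assumed standard mappings:
# pytorch -> torch, opencv-python -> cv2, pillow -> PIL).
REQUIRED = ["torch", "cv2", "scipy", "h5py", "PIL", "imageio", "nni", "mmcv", "tensorboard"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks for each top-level module and does not compare versions, so the python and pytorch pins still need to be verified separately.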
cd CLTR/data
For JHU-Crowd++ dataset: python prepare_jhu.py --data_path /xxx/xxx/jhu_crowd_v2.0
For NWPU-Crowd dataset: python prepare_nwpu.py --data_path /xxx/xxx/NWPU_CLTR
cd CLTR
python make_npydata.py --jhu_path /xxx/xxx/jhu_crowd_v2.0 --nwpu_path /xxx/xxx/NWPU_CLTR
Example (some hyper-parameters may be different from the original paper):
cd CLTR
sh experiments/jhu.sh
or
sh experiments/nwpu.sh
Change nproc_per_node and gpu_id in jhu.sh/nwpu.sh if you do not have enough GPUs. Training files are saved in CLTR/save_file/log_file.
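The two settings are normally kept in step, with one process per listed GPU. Below is a minimal sketch of that relationship. It assumes `gpu_id` is a comma-separated list, as in the test commands in this README, and that one process is launched per GPU (the usual convention for PyTorch distributed launches, not confirmed from jhu.sh/nwpu.sh). The helper `nproc_for` is hypothetical.

```python
def nproc_for(gpu_id):
    """Count the GPU ids in a comma-separated string such as "0,1"."""
    ids = [part.strip() for part in gpu_id.split(",") if part.strip()]
    if not ids:
        raise ValueError("gpu_id must list at least one GPU")
    return len(ids)


if __name__ == "__main__":
    # Show the process count that matches a few example gpu_id values.
    for gpu_id in ("0", "0,1", "0,1,2,3"):
        print(f"gpu_id={gpu_id} -> nproc_per_node={nproc_for(gpu_id)}")
```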
Here we give the comparison.

| NWPU-Crowd (val set) | MAE | MSE |
|---|---|---|
| Original paper | 61.9 | 246.3 |
| This repo (training log) | 51.3 | 116.7 |
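For reference, the two columns can be computed from per-image counts as sketched below. This assumes the common crowd counting convention in which "MSE" is reported as the square root of the mean squared count error. The function name and the sample numbers are illustrative and are not taken from this repository's evaluation code.

```python
import math


def count_errors(pred_counts, gt_counts):
    """Return (MAE, MSE) over paired per-image predicted and ground-truth counts."""
    if len(pred_counts) != len(gt_counts) or not gt_counts:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - g for p, g in zip(pred_counts, gt_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    # Assumed convention: "MSE" is the square root of the mean squared error.
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse


if __name__ == "__main__":
    mae, mse = count_errors([10, 20], [12, 16])  # made-up counts
    print(f"MAE={mae:.2f} MSE={mse:.2f}")
```

Because the second metric squares each error before averaging, a few images with large miscounts raise it much more than they raise MAE.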
Example:
python test.py --dataset jhu --pre model.pth --gpu_id 2,3
or
python test.py --dataset nwpu --pre model.pth --gpu_id 0,1
Example:
python video_demo.py --video_path ./video_demo/demo.mp4 --num_queries 700 --pre video_model.pth
"video_model.pth" (trained on the NWPU-Crowd training set) can be downloaded from Baidu disk (password: rw6b) or Google Drive. The generated video is named "out_video.avi".
Visit bilibili or YouTube to watch the video demo.
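The `--num_queries` flag reflects the DETR-style design acknowledged below: the model emits a fixed number of query predictions, each a point with a confidence score, and the retained points give both the locations and the count. Here is a minimal sketch of that last step, assuming a plain confidence threshold. The function name, the threshold value 0.5, and the sample predictions are illustrative and are not taken from video_demo.py.

```python
def keep_confident_points(points, scores, threshold=0.5):
    """Keep the (x, y) points whose confidence score exceeds `threshold`."""
    if len(points) != len(scores):
        raise ValueError("points and scores must have the same length")
    return [pt for pt, s in zip(points, scores) if s > threshold]


if __name__ == "__main__":
    # Three hypothetical query outputs; a real run would have num_queries of them.
    points = [(12.0, 30.5), (100.2, 64.0), (7.5, 8.5)]
    scores = [0.91, 0.12, 0.66]
    kept = keep_confident_points(points, scores)
    print("count:", len(kept))
```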
Thanks to the following great works:
@inproceedings{carion2020end,
title={End-to-end object detection with transformers},
author={Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey},
booktitle={European conference on computer vision},
pages={213--229},
year={2020},
organization={Springer}
}
@inproceedings{meng2021conditional,
title={Conditional detr for fast training convergence},
author={Meng, Depu and Chen, Xiaokang and Fan, Zejia and Zeng, Gang and Li, Houqiang and Yuan, Yuhui and Sun, Lei and Wang, Jingdong},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={3651--3660},
year={2021}
}
If you find this project useful, please cite:
@article{liang2022end,
title={An end-to-end transformer model for crowd localization},
author={Liang, Dingkang and Xu, Wei and Bai, Xiang},
journal={European Conference on Computer Vision},
year={2022}
}