This repository includes the official implementation of the paper:
Point-Query Quadtree for Crowd Counting, Localization, and More
International Conference on Computer Vision (ICCV), 2023
Chengxin Liu¹, Hao Lu¹, Zhiguo Cao¹, Tongliang Liu²
¹Huazhong University of Science and Technology, China
²The University of Sydney, Australia
We formulate crowd counting as a decomposable point querying process, where each sparse input point can split into four new points when necessary. This formulation exhibits many appealing properties:
Intuitive: The input and output are both interpretable and steerable
Generic: PET is applicable to a number of crowd-related tasks, by simply adjusting the input format
Effective: PET reports state-of-the-art crowd counting and localization results
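To make the splitting idea concrete, here is a minimal sketch of how one query point could be replaced by four children, quadtree-style. It is an illustration only, not the code in this repository: the function names (`make_grid`, `split_point`, `refine_queries`), the quarter-spacing offsets, and the `needs_split` predicate are all assumptions. In PET the decision to split is learned; here it is a plain callback.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def make_grid(width: int, height: int, spacing: int) -> List[Point]:
    """Place one sparse query point at the centre of each spacing x spacing cell."""
    half = spacing / 2.0
    return [
        (x + half, y + half)
        for y in range(0, height, spacing)
        for x in range(0, width, spacing)
    ]


def split_point(point: Point, spacing: float) -> List[Point]:
    """Replace one query point with four children, one per quadrant of its cell."""
    x, y = point
    offset = spacing / 4.0  # hypothetical choice: children sit at quadrant centres
    return [
        (x - offset, y - offset),
        (x + offset, y - offset),
        (x - offset, y + offset),
        (x + offset, y + offset),
    ]


def refine_queries(
    points: List[Point],
    spacing: float,
    needs_split: Callable[[Point], bool],
) -> List[Point]:
    """Keep a point as-is, or swap it for its four children when needs_split says so."""
    refined: List[Point] = []
    for point in points:
        if needs_split(point):
            refined.extend(split_point(point, spacing))
        else:
            refined.append(point)
    return refined


# Toy usage: a 16x16 image with spacing 8 gives four sparse points;
# pretend only the top-left cell is dense enough to need a split.
sparse = make_grid(16, 16, 8)
dense = refine_queries(sparse, 8.0, lambda p: p == (4.0, 4.0))
print(len(sparse), len(dense))
```

In this toy run the three unsplit points are kept and the top-left point is replaced by its four children, so only the region flagged as dense gets more queries.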
torch
torchvision
numpy
opencv-python
scipy
matplotlib
pip install -r requirements.txt
Download crowd-counting datasets, e.g., ShanghaiTech.
We expect the directory structure to be as follows:
PET
├── data
│ ├── ShanghaiTech
├── datasets
├── models
├── ...
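As a quick sanity check before training, the layout above can be verified with a few lines of Python. This is a hedged helper, not part of the repository: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and only the directories shown in the tree are checked, since the contents of `data/ShanghaiTech` are not specified here.

```python
from pathlib import Path
from typing import List

# Directories named in the tree above, relative to the repository root.
EXPECTED_DIRS = ["data/ShanghaiTech", "datasets", "models"]


def missing_dirs(repo_root: str) -> List[str]:
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Directory layout looks as expected.")
```

Run it from the repository root; an empty result means every directory listed in the tree is present.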
Download the ImageNet-pretrained vgg16_bn weights and put them in the pretrained folder. Alternatively, set the path to your pre-trained model in models/backbones/vgg.py
To train PET on ShanghaiTech PartA, run
sh train.sh
To evaluate, set --resume to your local model path, then run
sh eval.sh
Environment:
python==3.8
pytorch==1.12.1
torchvision==0.13.1
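If results differ from the table below, it may help to compare the local installation against the versions listed above. The following sketch uses only the standard library; `EXPECTED`, `installed_version`, and `compare_versions` are hypothetical names, and the pip package for PyTorch is assumed to be `torch`, as in the requirements list.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions listed above; "pytorch" is installed under the pip name "torch".
EXPECTED = {"torch": "1.12.1", "torchvision": "0.13.1"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected: Dict[str, str]) -> Dict[str, str]:
    """Report, per package, whether the installed version matches the expected one."""
    report: Dict[str, str] = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found is None:
            report[package] = "not installed"
        elif found.split("+")[0] == wanted:  # ignore local tags such as +cu113
            report[package] = "ok"
        else:
            report[package] = f"found {found}, expected {wanted}"
    return report


if __name__ == "__main__":
    for package, status in compare_versions(EXPECTED).items():
        print(f"{package}: {status}")
```

A mismatch does not necessarily mean the code will fail; it only flags a difference from the environment the released models were produced in.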
Models:
Dataset | Model Link | Training Log | MAE |
---|---|---|---|
ShanghaiTech PartA | SHA_model.pth | SHA_log.txt | 49.08 |
ShanghaiTech PartB | SHB_model.pth | SHB_log.txt | 6.18 |
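MAE in the table is the mean absolute error between predicted and ground-truth crowd counts, averaged over test images. The sketch below shows the computation on made-up numbers; the function name and the counts are illustrative only and are not taken from the repository or from any dataset.

```python
from typing import Sequence


def mean_absolute_error(pred_counts: Sequence[float], gt_counts: Sequence[float]) -> float:
    """Average of |predicted count - ground-truth count| over all images."""
    if len(pred_counts) != len(gt_counts):
        raise ValueError("pred_counts and gt_counts must have the same length")
    if len(pred_counts) == 0:
        raise ValueError("at least one image is required")
    total = sum(abs(p - g) for p, g in zip(pred_counts, gt_counts))
    return total / len(pred_counts)


# Illustrative counts for three images (not real results).
predicted = [100, 250, 30]
ground_truth = [110, 240, 33]
print(mean_absolute_error(predicted, ground_truth))  # (10 + 10 + 3) / 3
```

Lower is better: an MAE of 49.08 means the predicted count is off by about 49 people per image on average.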
If you find this work helpful for your research, please consider citing:
@InProceedings{liu2023pet,
title={Point-Query Quadtree for Crowd Counting, Localization, and More},
author={Liu, Chengxin and Lu, Hao and Cao, Zhiguo and Liu, Tongliang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2023}
}
This code is for academic purposes only. Contact: Chengxin Liu (cx_liu@hust.edu.cn)
We thank the authors of DETR and P2PNet for open-sourcing their work.