cxliu0 / PET

[ICCV 2023] Point-Query Quadtree for Crowd Counting, Localization, and More
MIT License

Other datasets #10

Open little-seasalt opened 9 months ago

little-seasalt commented 9 months ago

Hello, I would like to ask, if I want to retrain the models on the UCF and JHU datasets, what changes need to be made to the existing code?

cxliu0 commented 9 months ago

In general, you need to customize the dataloader and preprocess data for each dataset.

little-seasalt commented 9 months ago

In general, you need to customize the dataloader and preprocess the data for each dataset.

  • Customize a dataloader for each dataset (refer to SHA.py) and add it to datasets/__init__.py (see the sketch below).
  • Preprocess the dataset, e.g., resize the images and the ground-truth points. This saves data-loading time.
  • Regarding data augmentation, you can try training the model without scale augmentation, or tune the scale augmentation parameters.
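
For concreteness, here is a minimal, hypothetical sketch of what such a custom dataset class might look like. The class name QNRF, the preprocessed .jpg/.npy file layout, and the target dictionary are assumptions, not this repository's actual interface; the real constructor arguments, transforms, and return format should mirror SHA.py:

```python
# Hypothetical sketch of a custom dataset wrapper, modeled loosely on the
# pattern of SHA.py in this repository. File layout and target format are
# assumptions; copy whatever SHA.py actually does.
import glob
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class QNRF(Dataset):
    def __init__(self, data_root, split='train', transform=None):
        # Assumes images and point annotations were preprocessed and saved as
        # <name>.jpg / <name>.npy pairs (see the preprocessing discussion below).
        self.img_paths = sorted(glob.glob(os.path.join(data_root, split, '*.jpg')))
        self.transform = transform

    def __len__(self):
        return len(self.img_paths)

    def __getitem__(self, idx):
        img_path = self.img_paths[idx]
        img = Image.open(img_path).convert('RGB')
        # Ground-truth head locations as an (N, 2) array of (x, y) coordinates.
        points = np.load(img_path.replace('.jpg', '.npy')).astype(np.float32)
        if self.transform is not None:
            img = self.transform(img)
        # Guessed target format; return whatever SHA.py actually returns.
        target = {'points': torch.from_numpy(points),
                  'labels': torch.ones(len(points), dtype=torch.int64)}
        return img, target
```

Registering the new class then amounts to importing it in datasets/__init__.py and mapping a dataset name to it, following how SHA is handled there.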

Thank you for your answer.

little-seasalt commented 8 months ago

Hello, may I ask which graphics card you used when training on these datasets (UCF-QNRF, JHU-Crowd++, and NWPU-Crowd)? I frequently ran out of GPU memory during training, especially during evaluation. Is there any way to solve this problem?

cxliu0 commented 8 months ago

Typically, an NVIDIA RTX 3090 is sufficient to train the model. Regarding CUDA out-of-memory errors, you may try reducing the batch size and using parallel training.
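
For out-of-memory errors during evaluation in particular, one generic thing to check (not specific to PET's code) is that the evaluation loop runs in eval mode under torch.no_grad() with batch size 1, so no autograd graph is kept for full-resolution images. A minimal sketch, with a hypothetical `count_fn` standing in for this repo's actual post-processing:

```python
import torch


@torch.no_grad()  # skip building the autograd graph during evaluation; this alone often fixes eval-time OOM
def evaluate(model, data_loader, device, count_fn):
    # `count_fn` is a hypothetical callable mapping the model's raw outputs to a
    # scalar predicted count; plug in whatever post-processing this repo uses.
    # Assumes batch size 1 and a collate_fn that keeps the raw (N, 2) point array.
    model.eval()
    abs_err, sq_err = 0.0, 0.0
    for img, target in data_loader:
        img = img.to(device)
        pred_count = count_fn(model(img))
        gt_count = len(target['points'])
        abs_err += abs(pred_count - gt_count)
        sq_err += (pred_count - gt_count) ** 2
    n = max(len(data_loader), 1)
    return abs_err / n, (sq_err / n) ** 0.5  # MAE and (root) MSE
```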

little-seasalt commented 8 months ago

Typically, an NVIDIA RTX 3090 is sufficient to train the model. Regarding CUDA out-of-memory errors, you may try reducing the batch size and using parallel training.

While training on the UCF-QNRF dataset, I found that one epoch takes about 6 minutes and each evaluation takes about 2 minutes. Is this time consumption normal? May I ask how much time your training took?

cxliu0 commented 8 months ago

We suggest preprocessing the UCF-QNRF dataset before training, because loading the original images during training is time-consuming. After preprocessing, one epoch takes less than 40 seconds when training on two NVIDIA RTX 3090 GPUs.

little-seasalt commented 8 months ago

We suggest preprocessing the UCF-QNRF dataset before training, because loading the original images during training is time-consuming. After preprocessing, one epoch takes less than 40 seconds when training on two NVIDIA RTX 3090 GPUs.

I have processed the UCF-QNRF dataset as described in the paper, i.e., limiting the longer side to 1536 pixels and resizing both the images and the ground-truth points accordingly. The rest of the dataloader is written with reference to SHA.py. Are there any other data preprocessing operations that I have missed?

cxliu0 commented 8 months ago

You should resize the images and ground-truth points, save the preprocessed data to disk, and then train the model on the preprocessed data. Resizing images on the fly is time-consuming.
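
As a rough illustration of this offline preprocessing, here is a minimal sketch that caps the longer image side at 1536 px and scales the point annotations by the same factor. The directory layout, the `_ann.mat` suffix, and the `annPoints` key are assumed to follow the official UCF-QNRF release; adjust paths, keys, and the output format to whatever your dataloader expects:

```python
# Hedged sketch of offline preprocessing for UCF-QNRF: cap the longer image
# side at 1536 px, scale the ground-truth points by the same factor, and save
# the results so the dataloader never resizes at training time.
import glob
import os

import numpy as np
from PIL import Image
from scipy.io import loadmat

MAX_SIDE = 1536


def preprocess(src_dir, dst_dir):
    os.makedirs(dst_dir, exist_ok=True)
    for img_path in sorted(glob.glob(os.path.join(src_dir, '*.jpg'))):
        img = Image.open(img_path).convert('RGB')
        w, h = img.size
        # Assumes the official annotation files: img_XXXX_ann.mat with an 'annPoints' array.
        points = loadmat(img_path.replace('.jpg', '_ann.mat'))['annPoints'].astype(np.float32)

        scale = min(1.0, MAX_SIDE / max(w, h))  # only shrink, never enlarge
        if scale < 1.0:
            new_w, new_h = int(round(w * scale)), int(round(h * scale))
            img = img.resize((new_w, new_h), Image.BILINEAR)
            points = points * scale  # (x, y) coordinates scale with the image

        name = os.path.splitext(os.path.basename(img_path))[0]
        img.save(os.path.join(dst_dir, name + '.jpg'), quality=95)
        np.save(os.path.join(dst_dir, name + '.npy'), points)


if __name__ == '__main__':
    # Hypothetical paths; point these at the raw UCF-QNRF Train/Test folders.
    preprocess('UCF-QNRF_ECCV18/Train', 'UCF-QNRF_processed/train')
    preprocess('UCF-QNRF_ECCV18/Test', 'UCF-QNRF_processed/test')
```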

Aksheit-Saxena commented 8 months ago

Can anyone share the JHU.py file? I'm getting dimension mismatches when train.sh is run. Any insight is appreciated.

little-seasalt commented 7 months ago

Can anyone share the JHU.py file? I'm getting dimension mismatches when train.sh is run. Any insight is appreciated.

Have you reproduced the paper's metrics on the UCF-QNRF dataset? Would you be willing to share the relevant code?