TencentYoutuResearch / CrowdCounting-P2PNet

The official codes for the ICCV2021 Oral presentation "Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework"

Facing issues in collate_fn_crowd(batch) function of util.misc.py #82

Open abinashlingank opened 5 months ago

abinashlingank commented 5 months ago

I want to train P2PNet on my custom dataset. During training, I hit an error in the collate_fn_crowd(batch) function:

def collate_fn_crowd(batch):
    # re-organize the batch
    batch_new = []
    for b in batch:
        imgs, points = b
        if imgs.ndim == 3:
            imgs = imgs.unsqueeze(0)
        for i in range(len(imgs)):
            # if len(points) > 0:
            #     batch_new.append((imgs[i, :, :, :], points[i]))
            batch_new.append((imgs[i, :, :, :], points[i]))
    batch = batch_new
    batch = list(zip(*batch))
    batch[0] = nested_tensor_from_tensor_list(batch[0])
    return tuple(batch)

This function is in util/misc.py. The error is as follows:

 CUDA_VISIBLE_DEVICES=0 python train.py --data_root $DATA_ROOT     --dataset_file SHHA     --epochs 3500     --lr_drop 3500     --output_dir ./logs     --checkpoints_dir ./weights     --tensorboard_dir ./logs     --lr 0.0001     --lr_backbone 0.00001     --batch_size 8     --eval_freq 1     --gpu_id 0 --frozen_weights /home/abi/p2p-training/CrowdCounting-P2PNet/weights/prev.pth
Frozen training
Namespace(backbone='vgg16_bn', batch_size=8, checkpoints_dir='./weights', clip_max_norm=0.1, data_root='/home/abi/p2p-training/CrowdCounting-P2PNet/p2p', dataset_file='SHHA', eos_coef=0.5, epochs=3500, eval=False, eval_freq=1, frozen_weights='/home/abi/p2p-training/CrowdCounting-P2PNet/weights/prev.pth', gpu_id=0, line=2, lr=0.0001, lr_backbone=1e-05, lr_drop=3500, num_workers=8, output_dir='./logs', point_loss_coef=0.0002, resume='', row=2, seed=42, set_cost_class=1, set_cost_point=0.05, start_epoch=0, tensorboard_dir='./logs', weight_decay=0.0001)
number of params: 21579344
Start training
Traceback (most recent call last):
  File "train.py", line 223, in <module>
    main(args)
  File "train.py", line 162, in main
    args.clip_max_norm)
  File "/home/abi/p2p-training/CrowdCounting-P2PNet/engine.py", line 85, in train_one_epoch
    for samples, targets in data_loader:
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/abi/p2p-training/CrowdCounting-P2PNet/util/misc.py", line 311, in collate_fn_crowd
    batch_new.append((imgs[i, :, :, :], points[i]))
IndexError: list index out of range

To work around this, I added an if condition so that only samples whose points have length > 0 are processed. After that change, training fails with a different error: RuntimeError: CUDA error: no kernel image is available for execution on the device

What is the issue?

nice98k commented 5 months ago

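What you are running into is that the cropped patch contains no ground truth. The author's code does not account for this case. Try asking GPT about the error output; it can be solved that way. I ran into the same problem and fixed it.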
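
The traceback shows points[i] failing with "list index out of range", which means at least one sample reaches collate_fn_crowd with fewer target entries than image patches. A minimal sketch for locating such samples is below. The helper name find_patch_target_mismatches is hypothetical (it is not part of the repository), and it assumes each sample is an (imgs, points) pair shaped the way collate_fn_crowd expects.

```python
def find_patch_target_mismatches(batch):
    """Return (sample_index, n_patches, n_targets) for every sample whose
    number of image patches differs from its number of target entries.

    Hypothetical helper, not part of CrowdCounting-P2PNet. Each sample is
    assumed to be an (imgs, points) pair as consumed by collate_fn_crowd:
    `imgs` is a stack of patches (or a single 3-D image) and `points` is a
    list with one entry per patch.
    """
    mismatches = []
    for sample_index, (imgs, points) in enumerate(batch):
        # A single 3-D image (C, H, W) counts as one patch, mirroring the
        # unsqueeze(0) branch in collate_fn_crowd.
        n_patches = 1 if getattr(imgs, "ndim", 4) == 3 else len(imgs)
        n_targets = len(points)
        if n_patches != n_targets:
            mismatches.append((sample_index, n_patches, n_targets))
    return mismatches
```

Calling this at the top of collate_fn_crowd and printing the result (with num_workers set to 0 so the output is not hidden inside a worker process) should show which samples come back with missing targets, which can then be traced to the cropping step in the dataset class. Note that this only diagnoses the IndexError; the later "no kernel image is available" message is a separate CUDA error and is not addressed by this check.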