I want to train P2PNet on my custom dataset. During training, I run into an error in the collate_fn_crowd(batch) function:
def collate_fn_crowd(batch):
    # re-organize the batch
    batch_new = []
    for b in batch:
        imgs, points = b
        if imgs.ndim == 3:
            imgs = imgs.unsqueeze(0)
        for i in range(len(imgs)):
            # if len(points) > 0:
            #     batch_new.append((imgs[i, :, :, :], points[i]))
            batch_new.append((imgs[i, :, :, :], points[i]))
    batch = batch_new
    batch = list(zip(*batch))
    batch[0] = nested_tensor_from_tensor_list(batch[0])
    return tuple(batch)
This function is in util/misc.py. The error is as follows:
CUDA_VISIBLE_DEVICES=0 python train.py --data_root $DATA_ROOT --dataset_file SHHA --epochs 3500 --lr_drop 3500 --output_dir ./logs --checkpoints_dir ./weights --tensorboard_dir ./logs --lr 0.0001 --lr_backbone 0.00001 --batch_size 8 --eval_freq 1 --gpu_id 0 --frozen_weights /home/abi/p2p-training/CrowdCounting-P2PNet/weights/prev.pth
Frozen training
Namespace(backbone='vgg16_bn', batch_size=8, checkpoints_dir='./weights', clip_max_norm=0.1, data_root='/home/abi/p2p-training/CrowdCounting-P2PNet/p2p', dataset_file='SHHA', eos_coef=0.5, epochs=3500, eval=False, eval_freq=1, frozen_weights='/home/abi/p2p-training/CrowdCounting-P2PNet/weights/prev.pth', gpu_id=0, line=2, lr=0.0001, lr_backbone=1e-05, lr_drop=3500, num_workers=8, output_dir='./logs', point_loss_coef=0.0002, resume='', row=2, seed=42, set_cost_class=1, set_cost_point=0.05, start_epoch=0, tensorboard_dir='./logs', weight_decay=0.0001)
number of params: 21579344
Start training
Traceback (most recent call last):
File "train.py", line 223, in <module>
main(args)
File "train.py", line 162, in main
args.clip_max_norm)
File "/home/abi/p2p-training/CrowdCounting-P2PNet/engine.py", line 85, in train_one_epoch
for samples, targets in data_loader:
File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/abi/p2p-training/CrowdCounting-P2PNet/util/misc.py", line 311, in collate_fn_crowd
batch_new.append((imgs[i, :, :, :], points[i]))
IndexError: list index out of range
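The traceback points at points[i], which suggests that for at least one sample the number of images and the number of point lists disagree. A minimal sketch of a check that could narrow this down is shown here. The helper names count_images and find_mismatched_samples are hypothetical (they are not part of P2PNet), and the sketch assumes each batch item is an (imgs, points) pair as in collate_fn_crowd:

```python
def count_images(imgs):
    """Number of images in a sample.

    Assumption: an object with ndim == 3 is a single (C, H, W) image;
    anything else is treated as a stack whose length is the image count.
    """
    return 1 if getattr(imgs, "ndim", 4) == 3 else len(imgs)


def find_mismatched_samples(batch):
    """Return (index, n_images, n_point_lists) for samples whose counts differ."""
    mismatched = []
    for idx, (imgs, points) in enumerate(batch):
        n_imgs = count_images(imgs)
        n_pts = len(points)
        if n_imgs != n_pts:
            mismatched.append((idx, n_imgs, n_pts))
    return mismatched


if __name__ == "__main__":
    # Plain lists stand in for image tensors in this toy batch.
    toy_batch = [
        (["img0", "img1"], [[(1, 2)], [(3, 4)]]),  # counts match
        (["img0", "img1"], [[(1, 2)]]),            # one point list missing
        (["img0"], []),                            # no point lists at all
    ]
    print(find_mismatched_samples(toy_batch))
```

Calling find_mismatched_samples(batch) at the top of collate_fn_crowd and printing the result would show which samples in the custom dataset trigger the out-of-range index.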
To work around this, I added an if condition so that only samples with len(points) > 0 are appended. After this change, training fails with:
RuntimeError: CUDA error: no kernel image is available for execution on the device
What is the issue?