EternalEvan opened 4 years ago
I'm having the same problem. Did you find anything?
This might be because your evaluation dataset is large. The evaluation appears to run on the CPU (though I'm not certain), which would be one explanation for the slowness.
Try setting the DataLoader's `num_workers=0`:

```python
val_loader = DataLoader(val_dataset, batch_size=config.batch // config.subdivisions,
                        shuffle=True, num_workers=0, pin_memory=True,
                        drop_last=True, collate_fn=val_collate)
```
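To see the workaround in isolation, here is a minimal, self-contained sketch: a toy dataset loaded with `num_workers=0`, so all loading happens in the main process and no worker subprocesses are spawned. The `ToyDataset` here is a placeholder, not the repo's `val_dataset`/`val_collate`.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Placeholder dataset: 8 fake 'images' with integer labels."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # stand-in for an image tensor and its label
        return torch.zeros(3, 32, 32), idx

# num_workers=0 keeps loading in the main process, avoiding
# multiprocessing-related hangs in the DataLoader.
loader = DataLoader(ToyDataset(), batch_size=4, shuffle=False,
                    num_workers=0, pin_memory=False, drop_last=True)
batches = list(loader)
print(len(batches))  # 8 samples / batch_size 4 -> 2 batches
```

With `drop_last=True` and 8 samples, this yields exactly two full batches; the same flags carry over to the real `val_loader` above.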
It seems like a bug in the PyTorch DataLoader.
You need to change the PyTorch version. I changed it to 1.5.0, and train.py then ran successfully on GPU.
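Before downgrading, it's worth confirming which PyTorch build you actually have and whether CUDA is visible to it at all. A quick sanity check (nothing here is specific to this repo):

```python
import torch

# Report the installed PyTorch version and CUDA visibility.
# If is_available() is False, the GPU hang has a different cause
# (driver/toolkit mismatch) than the DataLoader bug discussed here.
print(torch.__version__)
print(torch.cuda.is_available())

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"training would run on: {device}")
```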
@swxu @asebaq I was going crazy debugging. Thanks.
Does anyone else encounter the situation where CPU can run but GPU gets stuck in the first epoch? Results are obtained when training with CPU, but when I train on my own data with GPU, it gets stuck here. Can someone help me?

CPU:

```
2020-10-29 23:42:12,062 train.py[line:611] INFO: Using device cpu
2020-10-29 23:42:13,583 train.py[line:327] INFO: Starting training:
    Epochs:            5
    Batch size:        4
    Subdivisions:      1
    Learning rate:     0.001
    Training size:     21
    Validation size:   4
    Checkpoints:       True
    Device:            cpu
    Images size:       608
    Optimizer:         adam
    Dataset classes:   3
    Train label path:  train.txt
    Pretrained:
```
```
Epoch 1/5:  95%|▉| 20/21 [10:25<00:31, 31.65s/img]
in function convert_to_coco_api...
You could also create your own 'get_image_id' function.
You could also create your own 'get_image_id' function.
You could also create your own 'get_image_id' function.
You could also create your own 'get_image_id' function.
creating index...
index created!
You could also create your own 'get_image_id' function.
You could also create your own 'get_image_id' function.
You could also create your own 'get_image_id' function.
You could also create your own 'get_image_id' function.
Accumulating evaluation results...
DONE (t=0.13s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
```
GPU:

```
2020-10-30 13:54:17,456 train.py[line:611] INFO: Using device cuda
2020-10-30 13:54:20,094 train.py[line:327] INFO: Starting training:
    Epochs:            5
    Batch size:        4
    Subdivisions:      1
    Learning rate:     0.001
    Training size:     21
    Validation size:   4
    Checkpoints:       True
    Device:            cuda
    Images size:       608
    Optimizer:         adam
    Dataset classes:   3
    Train label path:  train.txt
    Pretrained:
```

```
Epoch 1/5:  95%|▉| 20/21 [00:17<00:01, 1.01s/img]
in function convert_to_coco_api...
You could also create your own 'get_image_id' function.
You could also create your own 'get_image_id' function.
You could also create your own 'get_image_id' function.
You could also create your own 'get_image_id' function.
creating index...
index created!
```
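Regarding the repeated "You could also create your own 'get_image_id' function." message in the logs: one way to silence it is to supply your own mapping from image filename to a unique integer id for the COCO evaluator. The sketch below is purely illustrative (it is not the function shipped with the repo); it extracts digits from the filename and falls back to a CRC32 hash when the name contains none.

```python
import os
import zlib

def get_image_id(filename: str) -> int:
    """Hypothetical custom get_image_id: derive a stable integer id
    from an image filename for COCO-style evaluation.

    - If the filename stem contains digits (e.g. '000123.jpg'),
      use them directly.
    - Otherwise fall back to a deterministic CRC32 of the stem,
      so the same file always maps to the same id across runs.
    """
    stem = os.path.splitext(os.path.basename(filename))[0]
    digits = "".join(ch for ch in stem if ch.isdigit())
    if digits:
        return int(digits)
    return zlib.crc32(stem.encode("utf-8"))

print(get_image_id("images/000123.jpg"))  # -> 123
```

Whatever scheme you choose, the only hard requirement is that each image in the validation set gets a distinct, repeatable integer id.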