RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

NgheNhanTruong commented 10 months ago

When I ran the training process, I ran into a problem during evaluation on CUDA. Can you help me solve it? Below is the error:

(yolov5) D:\OneDrive - Sejong University\NgheNhan_Master\Research\Multicamera Fusion Method\Code\YOLOV5m>python train.py --data Mask_Dataset --box_format yolo --epochs 10 --bs 8

Training Logs will be saved in train_eval_metrics\model_1\loss.csv

Eval Logs will be saved in train_eval_metrics\model_1\eval.csv

Training epoch 1/11
100%|████████████████████| 75/75 [00:59<00:00, 1.27it/s, average_loss_batches=34.4]
==> training_loss: 37.94
.. Computing: class and obj accuracies ..
100%|████████████████████| 22/22 [00:30<00:00, 1.38s/it]
Class accuracy: 0.00%
Obj accuracy: 100.00%
.. Getting Evaluation bboxes to compute MAP..
0%| | 0/22 [00:25<?, ?it/s]
Traceback (most recent call last):
  File "D:\OneDrive - Sejong University\NgheNhan_Master\Research\Multicamera Fusion Method\Code\YOLOV5m\train.py", line 147, in <module>
    main(parser)
  File "D:\OneDrive - Sejong University\NgheNhan_Master\Research\Multicamera Fusion Method\Code\YOLOV5m\train.py", line 130, in main
    evaluate.map_pr_rec(model.to(config.DEVICE), val_loader, anchors=model.head.anchors, epoch=epoch)
  File "D:\OneDrive - Sejong University\NgheNhan_Master\Research\Multicamera Fusion Method\Code\YOLOV5m\utils\validation_utils.py", line 103, in map_pr_rec
    pred_boxes = cells_to_bboxes(predictions, anchors, strides=model.head.stride, is_pred=True, to_list=False)
  File "D:\OneDrive - Sejong University\NgheNhan_Master\Research\Multicamera Fusion Method\Code\YOLOV5m\utils\plot_utils.py", line 25, in cells_to_bboxes
    xy = (2 * (layer_prediction[..., 0:2]) + grid[i] - 0.5) * stride
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

AlessandroMondin commented 10 months ago

Hi! It's been a while since I last used this repo, and I do not have a GPU to reproduce the error. I suggest putting a breakpoint on the failing line (plot_utils.py, line 25) and checking which tensor has not been moved to cuda.
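As a lighter alternative to a breakpoint, a small helper can print the device of every operand right before the failing line. This is only a sketch: `report_devices` is a hypothetical helper that is not part of the repo, and it assumes nothing beyond each tensor exposing a `.device` attribute, as `torch.Tensor` does.

```python
def report_devices(**named):
    """Map each argument name to the device(s) of the value passed in.

    Hypothetical debugging helper, not part of the YOLOV5m repo.
    Anything with a `.device` attribute (e.g. a torch.Tensor) is reported
    by that attribute; lists and tuples are inspected element by element;
    everything else (plain Python numbers, None) is reported as "n/a".
    """
    def describe(value):
        if hasattr(value, "device"):
            return str(value.device)
        if isinstance(value, (list, tuple)):
            return [describe(v) for v in value]
        return "n/a"

    return {name: describe(value) for name, value in named.items()}


# Possible use just above line 25 of utils/plot_utils.py (names taken from
# the traceback):
# print(report_devices(layer_prediction=layer_prediction, grid=grid, stride=stride))
```

Whichever entry reports `cpu` while the others report `cuda:0` is the tensor that still needs to be moved.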

xjyisok commented 5 months ago

Maybe you can set device = torch.device("cuda" if torch.cuda.is_available() else "cpu") and make sure every tensor used in the computation is on that device.
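Choosing a device does not by itself move anything; each tensor taking part in the expression still has to be sent there. A minimal sketch of that step follows. `move_to_device` is a hypothetical helper, and the guess that `grid` is the operand left on the CPU is an assumption, since the thread never confirms which tensor it was.

```python
def move_to_device(obj, device):
    """Return `obj` with every tensor inside it moved to `device`.

    Hypothetical helper, not part of the YOLOV5m repo. Anything with a
    `.to` method (e.g. a torch.Tensor) is moved with `.to(device)`;
    lists, tuples and dicts are walked recursively; other values
    (plain Python numbers, None) are returned unchanged.
    """
    if hasattr(obj, "to"):
        return obj.to(device)
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_device(o, device) for o in obj)
    if isinstance(obj, dict):
        return {k: move_to_device(v, device) for k, v in obj.items()}
    return obj


# Possible use inside cells_to_bboxes, before the failing line
# (assumes `grid` is the list of tensors that stayed on the CPU):
# grid = move_to_device(grid, layer_prediction.device)
# xy = (2 * (layer_prediction[..., 0:2]) + grid[i] - 0.5) * stride
```

Taking the target device from `layer_prediction.device` rather than from a global setting keeps the operands together even when the model is later run on a different device.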

