MrJoratos opened 9 months ago
The YOLOv8 used here is an old version (ultralytics 8.0.132).
Validation runs normally for the first two steps (even though no GPU memory is occupied by the Python process), but as soon as training starts, this error is raised.
@MrJoratos Hi, have you solved the problem?
I also encountered this problem. My task is detection. When I use the official YOLOv8n model and dataset, I can prune and post-train normally. But when I use my own trained model, pruning itself finishes normally, yet post-training fails with the error: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! I don't understand the logic behind this at all!
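For context, the error itself just means that a CUDA tensor and a CPU tensor met in a single matmul. A minimal sketch that reproduces it (assumes a CUDA-capable machine; the shapes are illustrative, loosely mirroring the DFL decode step in ultralytics):

import torch

# Minimal reproduction of the device-mismatch error (requires CUDA).
pred_dist = torch.randn(2, 8400, 4, 16, device="cuda:0")  # predictions on the GPU
proj = torch.arange(16, dtype=torch.float)                # projection buffer left on the CPU
try:
    pred_dist.softmax(3).matmul(proj)  # mixes cuda:0 and cpu -> RuntimeError
except RuntimeError as e:
    print(e)  # "Expected all tensors to be on the same device, ..."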
The error is as follows:
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/assignment10_19/data_original/4kp_data/labeled/rgb/南航.cache... 704 images, 1
Plotting labels to runs/pose/step_0_finetune11/labels.jpg...
optimizer: AdamW(lr=0.000476, momentum=0.9) with parameter groups 63 weight(decay=0.0), 83 weight(decay=0.0005), 82 bias(decay=0.0)
Image sizes 928 train, 928 val
Using 8 dataloader workers
Logging results to runs/pose/step_0_finetune11
Starting training for 10 epochs...
Closing dataloader mosaic
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
  0%|          | 0/1329 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "torch-Pruning.py", line 403, in <module>
    prune(args)
  File "torch-Pruning.py", line 359, in prune
    model.train_v2(pruning=True, **pruning_cfg)
  File "torch-Pruning.py", line 267, in train_v2
    self.trainer.train()
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/ultralytics-8.0.132/ultralytics-8.0.132/ultralytics/yolo/engine/trainer.py", line 192, in train
    self._do_train(world_size)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/ultralytics-8.0.132/ultralytics-8.0.132/ultralytics/yolo/engine/trainer.py", line 332, in _do_train
    self.loss, self.loss_items = self.model(batch)
  File "/home/hitcrt/anaconda3/envs/py381/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/ultralytics-8.0.132/ultralytics-8.0.132/ultralytics/nn/tasks.py", line 44, in forward
    return self.loss(x, *args, **kwargs)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/ultralytics-8.0.132/ultralytics-8.0.132/ultralytics/nn/tasks.py", line 215, in loss
    return self.criterion(preds, batch)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/ultralytics-8.0.132/ultralytics-8.0.132/ultralytics/utils/loss.py", line 335, in __call__
    pred_bboxes = self.bbox_decode(anchor_points, pred_distri)  # xyxy, (b, h*w, 4)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/ultralytics-8.0.132/ultralytics-8.0.132/ultralytics/utils/loss.py", line 150, in bbox_decode
    pred_dist = pred_dist.view(b, a, 4, c // 4).softmax(3).matmul(self.proj.type(pred_dist.dtype))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_mm)
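One plausible cause (an assumption on my part, not something the traceback proves): ultralytics creates the loss criterion lazily on the first loss() call and builds self.proj on whatever device the model sits on at that moment. If the criterion was first created while the pruned model was still on the CPU, the cached self.proj stays on the CPU even after the model moves to cuda:0. A hedged workaround sketch that patches bbox_decode in ultralytics/utils/loss.py to move the projection onto the prediction's device (the surrounding structure follows the 8.0.132 source above; dist2bbox and self.use_dfl come from that file):

def bbox_decode(self, anchor_points, pred_dist):
    """Decode DFL logits into boxes; patched to tolerate a CPU-resident proj buffer."""
    if self.use_dfl:
        b, a, c = pred_dist.shape  # batch, anchors, channels
        # Move the projection vector to the predictions' device and dtype
        # instead of assuming it already lives there.
        proj = self.proj.to(device=pred_dist.device, dtype=pred_dist.dtype)
        pred_dist = pred_dist.view(b, a, 4, c // 4).softmax(3).matmul(proj)
    return dist2bbox(pred_dist, anchor_points, xywh=False)

Alternatively, forcing the criterion to be rebuilt after the model is on the GPU (for example by deleting a cached model.model.criterion attribute before calling train_v2, if your ultralytics version caches it there) should have the same effect; verify the attribute name against your version.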