Problem running train.py

wzp2019201645 commented 2 years ago

Hello. Before training on my own dataset, I changed the relevant places as you instructed. But when I ran train.py, the error below appeared. I can't make sense of it. What is causing it?

initialize network with normal type
Epoch 1/25: 0%| | 0/1665 [00:00<?, ?it/s<class 'dict'>]
Start Train
Epoch 1/25: 0%| | 2/1665 [00:03<45:23, 1.64s/it, lr=0.0001, roi_cls=nan, roi_loc=nan, rpn_cls=6.98e+28, rpn_loc=5.37e+29, total_loss=nan]
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: block: [0,0,0], thread: [0,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: block: [0,0,0], thread: [1,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: block: [0,0,0], thread: [2,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: block: [0,0,0], thread: [3,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Epoch 1/25: 0%| | 2/1665 [00:04<57:37, 2.08s/it, lr=0.0001, roi_cls=nan, roi_loc=nan, rpn_cls=6.98e+28, rpn_loc=5.37e+29, total_loss=nan]
Traceback (most recent call last):
  File "E:/WangZhongpeng/adversarial_defence/faster-rcnn-pytorch-master/train.py", line 212, in <module>
    fit_one_epoch(model, train_util, loss_history, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, end_epoch, Cuda)
  File "E:\WangZhongpeng\adversarial_defence\faster-rcnn-pytorch-master\utils\utils_fit.py", line 25, in fit_one_epoch
    rpn_loc, rpn_cls, roi_loc, roi_cls, total = train_util.train_step(images, boxes, labels, 1)
  File "E:\WangZhongpeng\adversarial_defence\faster-rcnn-pytorch-master\nets\frcnn_training.py", line 325, in train_step
    losses = self.forward(imgs, bboxes, labels, scale)
  File "E:\WangZhongpeng\adversarial_defence\faster-rcnn-pytorch-master\nets\frcnn_training.py", line 311, in forward
    roi_loc_loss = self._fast_rcnn_loc_loss(roi_loc, gt_roi_loc, gt_roi_label.data, self.roi_sigma)
  File "E:\WangZhongpeng\adversarial_defence\faster-rcnn-pytorch-master\nets\frcnn_training.py", line 221, in _fast_rcnn_loc_loss
    pred_loc = pred_loc[gt_label > 0]
RuntimeError: CUDA error: device-side assert triggered

bubbliiiing commented 2 years ago

The txt file probably wasn't modified correctly.

swlwhut commented 2 years ago

Error:

Message=Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\复现算法\faster-rcnn-pytorch-master\utils\dataloader.py", line 25, in __getitem__
    image, y = self.get_random_data(self.annotation_lines[index], self.input_shape[0:2], random = self.train)
  File "D:\复现算法\faster-rcnn-pytorch-master\utils\dataloader.py", line 43, in get_random_data
    image = Image.open(line[0])
  File "D:\anaconda\envs\pytorch\lib\site-packages\PIL\Image.py", line 2912, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'D:\澶嶇幇绠楁硶\faster-rcnn-pytorch-master\VOCdevkit/VOC2007/JPEGImages/009307.jpg'

Source=
StackTrace:
  File "D:\复现算法\faster-rcnn-pytorch-master\utils\utils_fit.py", line 38, in fit_one_epoch
    pbar.update(1)
  File "D:\复现算法\faster-rcnn-pytorch-master\train.py", line 211, in <module>
    fit_one_epoch(model, train_util, loss_history, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, end_epoch, Cuda)

Hello, this is what I ran into:

initialize network with normal type
Load weights model_data/voc_weights_resnet.pth.
Thread 0x3 has exited with code 0 (0x0).
Thread 0x2 has exited with code 0 (0x0).
Start Train
Epoch 1/50: 0%| | 0/4137 [00:00<?, ?it/s<class 'dict'>]
Epoch 1/50: 0%| | 0/4137 [00:06<?, ?it/s<class 'dict'>]

Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\复现算法\faster-rcnn-pytorch-master\utils\dataloader.py", line 25, in __getitem__
    image, y = self.get_random_data(self.annotation_lines[index], self.input_shape[0:2], random = self.train)
  File "D:\复现算法\faster-rcnn-pytorch-master\utils\dataloader.py", line 43, in get_random_data
    image = Image.open(line[0])
  File "D:\anaconda\envs\pytorch\lib\site-packages\PIL\Image.py", line 2912, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'D:\澶嶇幇绠楁硶\faster-rcnn-pytorch-master\VOCdevkit/VOC2007/JPEGImages/009307.jpg'

Stack trace:
  File "D:\复现算法\faster-rcnn-pytorch-master\utils\utils_fit.py", line 38, in fit_one_epoch
    pbar.update(1)
  File "D:\复现算法\faster-rcnn-pytorch-master\train.py", line 211, in <module>
    fit_one_epoch(model, train_util, loss_history, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, end_epoch, Cuda)
Loaded "utils.utils_fit"
Loaded "__main__"

bubbliiiing commented 2 years ago

It's the Chinese characters in the path.
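To narrow this kind of failure down, it may help to scan the annotation file for image paths that contain non-ASCII characters or that do not exist on disk. The sketch below is only an illustration: the function name `find_problem_paths` and the file name `2007_train.txt` are hypothetical, and it assumes each annotation line begins with the image path (consistent with the `Image.open(line[0])` call in the traceback). The garbled directory name in the error message looks like the result of a path being decoded with the wrong encoding, so moving the project to an ASCII-only directory is probably the simplest workaround.

```python
import os


def find_problem_paths(annotation_lines):
    """Collect image paths with non-ASCII characters and paths missing on disk.

    Assumption: every annotation line starts with the image path, followed by
    whitespace-separated boxes.
    """
    non_ascii, missing = [], []
    for line in annotation_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        path = parts[0]
        if not path.isascii():  # str.isascii() needs Python 3.7+
            non_ascii.append(path)
        if not os.path.exists(path):
            missing.append(path)
    return non_ascii, missing


if __name__ == "__main__":
    # Hypothetical file name; point this at your own annotation file.
    annotation_file = "2007_train.txt"
    if os.path.exists(annotation_file):
        with open(annotation_file, encoding="utf-8") as f:
            non_ascii, missing = find_problem_paths(f)
        print("paths with non-ASCII characters:", len(non_ascii))
        print("paths missing on disk:", len(missing))
```

If the first count is non-zero, regenerating the annotation file after moving the project to a path without Chinese characters is worth trying before anything else.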

2915482452 commented 3 months ago

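I ran into this problem too. The cause is that the class IDs in annotation.txt must be numbered starting from 0. The training code applies a label+1 step that reserves index 0 for the background, so if annotation.txt does not start from 0, some class index falls outside the valid range after label+1.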
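A quick way to check for this before training is to scan the annotation file for class IDs outside the range 0 to num_classes - 1. The sketch below is a minimal illustration, not code from the repository: the function name `find_bad_class_ids` is hypothetical, and it assumes the line format `image_path x1,y1,x2,y2,class_id ...`, with the class ID as the last comma-separated field of each box.

```python
def find_bad_class_ids(annotation_lines, num_classes):
    """Return (line_number, class_id) pairs whose id is outside [0, num_classes - 1].

    Assumption: each line looks like `image_path x1,y1,x2,y2,class_id ...`.
    """
    bad = []
    for line_no, line in enumerate(annotation_lines, start=1):
        # The first field is the image path; the rest are boxes.
        for box in line.split()[1:]:
            class_id = int(box.split(",")[-1])
            if not 0 <= class_id < num_classes:
                bad.append((line_no, class_id))
    return bad


# Hypothetical example with 3 object classes, so valid ids are 0, 1 and 2.
example = [
    "a.jpg 10,20,30,40,0 1,2,3,4,2",
    "b.jpg 5,5,9,9,3",  # id 3 becomes 4 after label + 1, but only indices 0..3 exist
]
print(find_bad_class_ids(example, num_classes=3))  # prints [(2, 3)]
```

An empty result means every class ID fits once the background slot is added. Any reported pair points at a line that would likely trigger the "index out of bounds" assertion shown in the original post.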

bubbliiiing / faster-rcnn-pytorch

Problem running train.py #86