jiahhhao closed this issue 3 months ago.
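Hello, thanks for your interest. In the main function of this network-architecture script, the batch size is set to 1 because the script measures the model's computational cost (FLOPs) and parameter count; with a larger batch size, the profiling utility raises an error.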
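The thread does not name the profiling utility, so the sketch below is only a rough illustration, not the repo's measurement code. For a single hypothetical 3×3 convolution (the 40 output channels are an arbitrary choice), the parameter count does not depend on the batch size, while the multiply-accumulate (MAC) count grows linearly with it. Reporting at batch size 1 therefore gives a per-image cost.

```python
def conv2d_cost(batch, c_in, c_out, h_out, w_out, k, bias=True):
    """Parameter count and multiply-accumulate (MAC) count of one k x k Conv2d layer."""
    # Weights are shared across the batch, so parameters ignore `batch`.
    params = c_out * c_in * k * k + (c_out if bias else 0)
    # Every output element of every image needs c_in * k * k multiply-accumulates.
    macs = batch * c_out * h_out * w_out * c_in * k * k
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer: 3 -> 40 channels, 3x3 kernel, 416x416 output ("same" padding).
    for bs in (1, 2):
        params, macs = conv2d_cost(bs, c_in=3, c_out=40, h_out=416, w_out=416, k=3)
        print(f"batch={bs}: params={params}, MACs={macs}")
```

Many tools report FLOPs as roughly 2 × MACs. Whichever convention a tool uses, doubling the batch doubles the compute figure but leaves the parameter count unchanged.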
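Thanks for the answer! I ran into the same problem when I tried to add the IGAB module to another model. Where should I start looking for the cause?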
What is the problem? Please post a screenshot.
Sorry for the late reply. Below is your model with some parts removed:
Then I placed it in front of the YOLOv4 input:
As soon as training starts, the following error appears:
How should I go about finding the cause? Thanks!
Sorry, the screenshot of the error seems to have failed to upload. Here is the output:
Start Train
Epoch 1/300: 0%| | 0/1490 [00:00<?, ?it/s
<class 'dict'>
Traceback (most recent call last):
File "/home/zjh/codeSpace/python/Paper/yolov4-pytorch/train.py", line 563, in <module>
fit_one_epoch(model_train, model, yolo_loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, UnFreeze_Epoch, Cuda, fp16, scaler, save_period, save_dir, local_rank)
File "/home/zjh/codeSpace/python/Paper/yolov4-pytorch/utils/utils_fit.py", line 34, in fit_one_epoch
outputs = model_train(images)
File "/home/zjh/miniconda3/envs/Retinexformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zjh/miniconda3/envs/Retinexformer/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/zjh/miniconda3/envs/Retinexformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zjh/codeSpace/python/Paper/yolov4-pytorch/nets/yolo.py", line 136, in forward
x2, x1, x0 = self.backbone(x)
File "/home/zjh/miniconda3/envs/Retinexformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zjh/codeSpace/python/Paper/yolov4-pytorch/nets/CSPdarknet.py", line 160, in forward
x = self.conv1(x)
File "/home/zjh/miniconda3/envs/Retinexformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zjh/codeSpace/python/Paper/yolov4-pytorch/nets/CSPdarknet.py", line 34, in forward
x = self.activation(x)
File "/home/zjh/miniconda3/envs/Retinexformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zjh/codeSpace/python/Paper/yolov4-pytorch/nets/CSPdarknet.py", line 17, in forward
return x * torch.tanh(F.softplus(x))
RuntimeError: CUDA error: unknown error
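This is most likely an environment problem: either the packages are not installed correctly, or the GPU driver is missing or incompatible.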
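A minimal sketch for checking the environment, assuming a standard PyTorch install. It uses only public `torch` attributes: `torch.version.cuda`, `torch.backends.cudnn.version()`, and `torch.cuda.get_device_name()`. The sketch collects the versions that usually matter for driver mismatches, then runs a tiny GPU op as a smoke test. For a fuller report, PyTorch ships `python -m torch.utils.collect_env`. You can compare its output with the driver version shown by `nvidia-smi`.

```python
import platform


def environment_report():
    """Collect version info that usually matters when CUDA errors appear."""
    report = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this interpreter
        return report

    report["torch"] = torch.__version__
    # CUDA version PyTorch was compiled against (None for CPU-only builds).
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["cudnn"] = torch.backends.cudnn.version()
        report["device"] = torch.cuda.get_device_name(0)
        # A tiny op plus synchronize() surfaces driver/runtime problems right away.
        x = torch.ones(1024, device="cuda")
        report["smoke_test_sum"] = float((x * 2).sum().item())
        torch.cuda.synchronize()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `torch_cuda_build` is newer than the CUDA version your driver supports (the "CUDA Version" field in `nvidia-smi`), that mismatch is a common source of opaque CUDA errors.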
If you find our repo useful, could you please fork it to show your support? Thanks.
OK, I'll try setting up the environment again. Thanks!
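Hello, my environment is configured exactly as you specified. However, in basicsr/models/archs/RetinexFormer_arch.py, everything works with inputs = torch.randn((1, 3, 416, 416)).cuda(). If I increase the batch size even slightly, to inputs = torch.randn((2, 3, 416, 416)).cuda(), I get RuntimeError: CUDA error: unknown error. What could cause this, and where should I start investigating? Thanks!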
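One way to narrow this down is to bisect over the batch size, running each probe in a fresh process. A CUDA error often leaves the process's CUDA context unusable, so later attempts in the same process would fail regardless of their size. The sketch sets `CUDA_LAUNCH_BLOCKING=1` so kernel launches are synchronous and the child's traceback points at the op that actually failed. It also assumes failures are monotone in batch size. The probe script `probe_retinexformer.py` is hypothetical; for example, it could be a copy of the `__main__` block of `RetinexFormer_arch.py` that reads the batch size from the command line.

```python
import os
import subprocess
import sys


def batch_size_ok(cmd_template, batch_size, timeout=600):
    """Run one probe in a fresh process; True if it exits cleanly."""
    # Synchronous kernel launches make the child's traceback point at the failing op.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    cmd = [arg.format(bs=batch_size) for arg in cmd_template]
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # The tail of the child's stderr holds the real traceback.
        print(f"batch_size={batch_size} failed:\n{result.stderr[-2000:]}")
    return result.returncode == 0


def largest_working_batch(cmd_template, low=1, high=64):
    """Bisect for the largest batch size in [low, high] that runs; 0 if even `low` fails."""
    if not batch_size_ok(cmd_template, low):
        return 0
    while low < high:
        mid = (low + high + 1) // 2
        if batch_size_ok(cmd_template, mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Hypothetical probe script: builds torch.randn((bs, 3, 416, 416)).cuda() and runs the model.
    cmd = [sys.executable, "probe_retinexformer.py", "--batch-size", "{bs}"]
    print("largest working batch size:", largest_working_batch(cmd, 1, 8))
```

If batch size 1 also fails when run this way, or if the failure disappears with `CUDA_LAUNCH_BLOCKING=1`, that points back toward the driver or runtime rather than the model code.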