clw5180 / remote_sensing_object_detection_2019

2019 Remote Sensing Image Sparse Representation and Intelligent Analysis Competition: preliminary round, ranked 26/605
MIT License
63 stars 24 forks

train_net.py training error #23

Open BlackHandguy opened 4 years ago

BlackHandguy commented 4 years ago

When I run train_net.py to start training, I get the following:
2020-04-29 10:03:45,343 maskrcnn_benchmark INFO: Using 1 GPUs
2020-04-29 10:03:45,343 maskrcnn_benchmark INFO: Namespace(config_file='../configs/rrpn/e2e_rrpn_X_101_32x8d_FPN_1x_DOTA.yaml', distributed=False, local_rank=0, opts=[], skip_test=False)
2020-04-29 10:03:45,343 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2020-04-29 10:03:46,259 maskrcnn_benchmark INFO: PyTorch version: 1.0.0.dev20190328
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: Could not collect

Python version: 3.6
Is CUDA available: No
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti

Nvidia driver version: 440.44
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.18.3
[pip] torch==1.4.0
[pip] torchvision==0.2.1
[conda] mkl 2020.0 166 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
[conda] numpy 1.13.1 py36_nomkl_0 [nomkl] https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
[conda] pytorch 1.4.0 py3.6_cuda10.0.130_cudnn7.6.3_0 pytorch
[conda] pytorch-nightly 1.0.0.dev20190328 py3.6_cuda10.0.130_cudnn7.4.2_0 pytorch
[conda] scipy 0.19.1 np113py36_nomkl_0 [nomkl] https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
[conda] torchvision 0.2.1 py_2 pytorch
Pillow (4.2.1)
2020-04-29 10:03:46,260 maskrcnn_benchmark INFO: Loaded configuration file ../configs/rrpn/e2e_rrpn_X_101_32x8d_FPN_1x_DOTA.yaml
2020-04-29 10:03:46,260 maskrcnn_benchmark INFO:
INPUT:
MIN_SIZE_TRAIN: (800,) # TODO: look into how input images are resized; my impression is that this setting does not resize, and results are better without resizing.
MAX_SIZE_TRAIN: 800
MIN_SIZE_TEST: 800
MAX_SIZE_TEST: 800

PIXEL_STD: [0.225, 0.224, 0.229] # TODO: defaults.py uses [1., 1., 1.]

TO_BGR255: False

DATASETS:

TRAIN: ("DOTA_train", )

TRAIN: ("RRPN_train", )
..............
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch-nightly_1553749776822/work/aten/src/THC/THCGeneral.cpp line=51 error=30 : unknown error
Traceback (most recent call last):
  File "train_net.py", line 175, in <module>
    main()
  File "train_net.py", line 168, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "train_net.py", line 32, in train
    model.to(device)
  File "/home/imut-radar/anaconda2/envs/rrpn_pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 384, in to
    return self._apply(convert)
  File "/home/imut-radar/anaconda2/envs/rrpn_pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 190, in _apply
    module._apply(fn)
  File "/home/imut-radar/anaconda2/envs/rrpn_pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 190, in _apply
    module._apply(fn)
  File "/home/imut-radar/anaconda2/envs/rrpn_pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 190, in _apply
    module._apply(fn)
  File "/home/imut-radar/anaconda2/envs/rrpn_pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 196, in _apply
    param.data = fn(param.data)
  File "/home/imut-radar/anaconda2/envs/rrpn_pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 382, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/home/imut-radar/anaconda2/envs/rrpn_pytorch/lib/python3.6/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch-nightly_1553749776822/work/aten/src/THC/THCGeneral.cpp:51

Is something wrong with CUDA? I installed CUDA 10.0 exactly as the author requires.
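The library list in the log shows PyTorch under three entries: pip torch 1.4.0, conda pytorch 1.4.0, and conda pytorch-nightly 1.0.0.dev20190328. Whether that is related to the error is not established here, but it can be flagged mechanically. Below is a minimal sketch, assuming the `[pip] name==version` and `[conda] name version build channel` line formats seen in the log; the helper names `torch_entries` and `distinct_versions` are hypothetical and not part of maskrcnn_benchmark or PyTorch.

```python
# Package names that all provide the PyTorch library itself.
TORCH_NAMES = {"torch", "pytorch", "pytorch-nightly"}


def torch_entries(lines):
    """Return (manager, name, version) for every PyTorch entry in the list."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0] not in ("[pip]", "[conda]"):
            continue  # not a package line, e.g. "Pillow (4.2.1)"
        if parts[0] == "[pip]":
            # pip lines look like "[pip] torch==1.4.0"
            name, _, version = parts[1].partition("==")
        else:
            # conda lines look like "[conda] pytorch 1.4.0 <build> <channel>"
            if len(parts) < 3:
                continue
            name, version = parts[1], parts[2]
        if name in TORCH_NAMES:
            entries.append((parts[0].strip("[]"), name, version))
    return entries


def distinct_versions(entries):
    """Sorted list of the different PyTorch versions that were found."""
    return sorted({version for _, _, version in entries})


# Lines copied from the log in this issue.
log_lines = [
    "[pip] numpy==1.18.3",
    "[pip] torch==1.4.0",
    "[pip] torchvision==0.2.1",
    "[conda] pytorch 1.4.0 py3.6_cuda10.0.130_cudnn7.6.3_0 pytorch",
    "[conda] pytorch-nightly 1.0.0.dev20190328 py3.6_cuda10.0.130_cudnn7.4.2_0 pytorch",
    "Pillow (4.2.1)",
]
found = torch_entries(log_lines)
versions = distinct_versions(found)
if len(versions) > 1:
    print("More than one PyTorch version listed:", ", ".join(versions))
```

If more than one version is reported, it may be worth checking which build the environment actually imports before looking further at CUDA itself.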

BlackHandguy commented 4 years ago

Running cat /usr/local/cuda/version.txt reports CUDA 10.0, but nvcc -V reports CUDA 9.1, so the two do not match.
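The comparison above can also be scripted. This is a minimal sketch, assuming version.txt contains a line of the form `CUDA Version X.Y.Z` and that the nvcc output contains `release X.Y`; the function names are hypothetical, and the sample strings are illustrative stand-ins for the two outputs, not copied from this machine.

```python
import re


def parse_version_txt(text):
    """Extract 'major.minor' from the contents of /usr/local/cuda/version.txt."""
    match = re.search(r"CUDA Version (\d+)\.(\d+)", text)
    return "{}.{}".format(match.group(1), match.group(2)) if match else None


def parse_nvcc_output(text):
    """Extract 'major.minor' from the output of `nvcc -V` / `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return "{}.{}".format(match.group(1), match.group(2)) if match else None


def versions_match(toolkit_version, nvcc_version):
    """True only when both versions were found and they are identical."""
    return toolkit_version is not None and toolkit_version == nvcc_version


# Illustrative inputs (assumed formats, matching the 10.0 vs 9.1 report above).
sample_version_txt = "CUDA Version 10.0.130\n"
sample_nvcc = "Cuda compilation tools, release 9.1, V9.1.85\n"

toolkit = parse_version_txt(sample_version_txt)
nvcc = parse_nvcc_output(sample_nvcc)
if not versions_match(toolkit, nvcc):
    print("Mismatch: /usr/local/cuda is {}, nvcc on PATH is {}".format(toolkit, nvcc))
```

A mismatch like this usually means the nvcc found first on PATH belongs to a different CUDA installation than /usr/local/cuda; running which nvcc shows which one is being picked up.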