MendelXu / ANN

semantic segmentation, pytorch, non-local
Apache License 2.0
312 stars, 63 forks

subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1. #9

Closed: karlcao closed this issue 4 years ago

karlcao commented 4 years ago

Environment: Python 3.6.4, CUDA 9.0, GCC 5.5.0.

I ran into a problem when running bash scripts/seg/cityscapes/run_fs_annn_cityscapes_seg.sh train tag. The first error is subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1. Can you help me solve this?

Traceback (most recent call last):
  File "main.py", line 182, in <module>
    runner = method_selector.select_seg_method()
  File "/data/lumeng/ANN/methods/method_selector.py", line 104, in select_seg_method
    return SEG_METHOD_DICT[key](self.configer)
  File "/data/lumeng/ANN/methods/seg/fcn_segmentor.py", line 44, in __init__
    self._init_model()
  File "/data/lumeng/ANN/methods/seg/fcn_segmentor.py", line 47, in _init_model
    self.seg_net = self.seg_model_manager.semantic_segmentor()
  File "/data/lumeng/ANN/models/seg/model_manager.py", line 38, in semantic_segmentor
    model = SEG_MODEL_DICT[model_name](self.configer)
  File "/data/lumeng/ANN/models/seg/nets/annn.py", line 17, in __init__
    self.backbone = BackboneSelector(configer).get_backbone()
  File "/data/lumeng/ANN/models/backbones/backbone_selector.py", line 31, in get_backbone
    model = ResNetBackbone(self.configer)(**params)
  File "/data/lumeng/ANN/models/backbones/resnet/resnet_backbone.py", line 176, in __call__
    orig_resnet = self.resnet_models.deepbase_resnet101()
  File "/data/lumeng/ANN/models/backbones/resnet/resnet_models.py", line 256, in deepbase_resnet101
    norm_type=self.configer.get('network', 'norm_type'), **kwargs)
  File "/data/lumeng/ANN/models/backbones/resnet/resnet_models.py", line 107, in __init__
    ('bn1', ModuleHelper.BatchNorm2d(norm_type=norm_type)(64)),
  File "/data/lumeng/ANN/models/tools/module_helper.py", line 89, in BatchNorm2d
    from encoding.nn import BatchNorm2d
  File "/home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/encoding/__init__.py", line 13, in <module>
    from . import nn, functions, parallel, utils, models, datasets, transforms
  File "/home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/encoding/nn/__init__.py", line 12, in <module>
    from .encoding import *
  File "/home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/encoding/nn/encoding.py", line 18, in <module>
    from ..functions import scaled_l2, aggregate, pairwise_cosine
  File "/home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/encoding/functions/__init__.py", line 2, in <module>
    from .encoding import *
  File "/home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/encoding/functions/encoding.py", line 14, in <module>
    from .. import lib
  File "/home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/encoding/lib/__init__.py", line 27, in <module>
    build_directory=gpu_path, verbose=False)
  File "/home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 645, in load
    is_python_module)
  File "/home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 814, in _jit_compile
    with_cuda=with_cuda)
  File "/home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 863, in _write_ninja_file_and_build
    _build_extension_module(name, build_directory, verbose)
  File "/home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 959, in _build_extension_module
    raise RuntimeError(message)
RuntimeError: Error building extension 'enclib_gpu': [1/7] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/lib/include -isystem /home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -isystem /home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/lib/include/TH -isystem /home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/lib/include/THC -isystem /usr/local/cuda/include -isystem /home/lumeng/anaconda3/envs/pytorch3/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/encoding/lib/gpu/nms_kernel.cu -o nms_kernel.cuda.o
FAILED: nms_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/lib/include -isystem /home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -isystem /home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/lib/include/TH -isystem /home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/torch/lib/include/THC -isystem /usr/local/cuda/include -isystem /home/lumeng/anaconda3/envs/pytorch3/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/lumeng/anaconda3/envs/pytorch3/lib/python3.6/site-packages/encoding/lib/gpu/nms_kernel.cu -o nms_kernel.cuda.o

MendelXu commented 4 years ago

Please refer to this. Alternatively, you can install CUDA 9.2 as described in the README.
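Because the suggested fix is switching the CUDA toolkit, a quick sanity check before rebuilding is whether the nvcc that PyTorch's JIT extension builder will invoke reports the same CUDA release that PyTorch itself was built against. The script below is a minimal sketch under that assumption: it reads the real `torch.version.cuda` and `torch.utils.cpp_extension.CUDA_HOME`, runs `nvcc --version` and `gcc -dumpversion`, and flags a major.minor mismatch. The helper names (`parse_nvcc_release`, `major_minor`, `run_version`, `check_toolkit`) are hypothetical, and a matching version does not by itself guarantee the build will succeed.

```python
import os
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the 'major.minor' release from `nvcc --version` output, or None."""
    # nvcc prints a line such as: "Cuda compilation tools, release 9.2, V9.2.148"
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def major_minor(version):
    """Reduce a version string such as '9.2.148' to '9.2'; None if not parseable."""
    if not version:
        return None
    parts = version.split(".")
    return ".".join(parts[:2]) if len(parts) >= 2 else None


def run_version(cmd):
    """Run a version command and return its output, or None if it cannot be started."""
    try:
        # stdout=PIPE / universal_newlines keep this compatible with Python 3.6
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, universal_newlines=True)
    except OSError:
        return None
    return result.stdout


def check_toolkit():
    """Compare PyTorch's CUDA version with the nvcc a JIT extension build would use."""
    try:
        import torch
        from torch.utils.cpp_extension import CUDA_HOME
    except ImportError:
        print("PyTorch is not importable in this environment")
        return
    torch_cuda = major_minor(torch.version.cuda)
    # Assumption: JIT builds invoke <CUDA_HOME>/bin/nvcc; fall back to nvcc on PATH.
    nvcc = os.path.join(CUDA_HOME, "bin", "nvcc") if CUDA_HOME else shutil.which("nvcc")
    nvcc_out = run_version([nvcc, "--version"]) if nvcc else None
    nvcc_cuda = parse_nvcc_release(nvcc_out) if nvcc_out else None
    gcc_out = run_version(["gcc", "-dumpversion"])
    print("PyTorch built with CUDA:", torch_cuda)
    print("nvcc:", nvcc, "reports CUDA", nvcc_cuda)
    print("gcc:", gcc_out.strip() if gcc_out else None)
    if torch_cuda and nvcc_cuda and torch_cuda != nvcc_cuda:
        print("Mismatch: PyTorch uses CUDA {} but nvcc is {}".format(torch_cuda, nvcc_cuda))


if __name__ == "__main__":
    check_toolkit()
```

Run it inside the same environment used for training (for example the pytorch3 conda env from the first traceback), so that it inspects the same torch installation that attempted the build.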

karlcao commented 4 years ago

I installed CUDA 9.2, but now I hit a different error when running run_fs_annn_cityscapes_seg.sh train tag. It happens right after DataParallelModel is printed:

Traceback (most recent call last):
  File "main.py", line 197, in <module>
    Controller.train(runner)
  File "/data/lumeng/ANN/methods/tools/controller.py", line 40, in train
    runner.train()
  File "/data/lumeng/ANN/methods/seg/fcn_segmentor.py", line 89, in train
    outputs = self.seg_net(inputs)
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/lumeng/ANN/models/seg/nets/annn.py", line 36, in forward
    x = self.backbone(x_)
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/lumeng/ANN/models/backbones/resnet/resnet_backbone.py", line 94, in forward
    x = self.prefix(x)
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/encoding/nn/syncbn.py", line 122, in forward
    self.activation, self.slope).view(input_shape)
  File "/home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/encoding/functions/syncbn.py", line 58, in forward
    _ex, _exs = lib.gpu.expectation_forward(x)
RuntimeError: cudaGetLastError() == cudaSuccess ASSERT FAILED at /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu:424, please report a bug to PyTorch. (Expectation_Forward_CUDA at /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu:424)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7ff103b5f021 in /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7ff103b5e8ea in /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: Expectation_Forward_CUDA(at::Tensor) + 0x281 (0x7ff08852d822 in /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #3: <unknown function> + 0x8a413 (0x7ff088506413 in /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #4: <unknown function> + 0x83698 (0x7ff0884ff698 in /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #5: <unknown function> + 0x7bfb7 (0x7ff0884f7fb7 in /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #6: <unknown function> + 0x7c123 (0x7ff0884f8123 in /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #7: <unknown function> + 0x6985c (0x7ff0884e585c in /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)

frame #16: THPFunction_apply(_object*, _object*) + 0x581 (0x7ff104082ab1 in /home/lumeng/anaconda3/envs/ann/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
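The second failure is an assertion checked right after a kernel launch inside the encoding SyncBN extension (syncbn_kernel.cu:424). CUDA kernel launches are asynchronous, so an error picked up by cudaGetLastError() can originate from an earlier launch; setting the standard CUDA_LAUNCH_BLOCKING=1 environment variable makes launches synchronous so errors surface where they occur. The sketch below assumes that debugging approach: it sets the variable before torch is imported, then runs a small stock PyTorch reduction on each visible GPU and prints the device name and compute capability. If these pass while the SyncBN call still fails, that suggests (but does not prove) a problem in the compiled enclib_gpu extension rather than in the driver or devices; the printed compute capability is worth comparing with the architectures the extension was compiled for. The function names `sm_tag` and `smoke_test_devices` are hypothetical.

```python
import os

# Kernel launches are asynchronous by default; with CUDA_LAUNCH_BLOCKING=1 each launch
# completes before the call returns, so a failing kernel is reported at its own call site.
# It only takes effect if set before CUDA is initialised, hence before importing torch.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def sm_tag(capability):
    """Format a (major, minor) compute capability tuple as an 'sm_XY' tag."""
    major, minor = capability
    return "sm_{}{}".format(major, minor)


def smoke_test_devices():
    """Run a small stock PyTorch reduction on every visible GPU (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not importable in this environment")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to PyTorch")
        return
    for idx in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(idx)
        tag = sm_tag(torch.cuda.get_device_capability(idx))
        try:
            x = torch.randn(2, 64, 8, 8, device="cuda:{}".format(idx))
            # Per-channel mean over (N, H, W), the kind of statistic batch norm computes
            per_channel = x.transpose(0, 1).contiguous().view(64, -1).mean(1)
            per_channel.cpu()  # copying to host waits for the kernels to finish
            print("cuda:{} {} ({}): ok".format(idx, name, tag))
        except RuntimeError as err:
            print("cuda:{} {} ({}): FAILED: {}".format(idx, name, tag, err))


if __name__ == "__main__":
    smoke_test_devices()
```

Run it in the same conda environment as training (the ann env in the traceback above). The same variable can also be exported in the shell before launching the training script, so that the SyncBN failure itself is reported at the kernel that actually failed.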