HRNet / HRNet-Semantic-Segmentation

The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919

Regarding the problem related to ninja... #33

Open frostinassiky opened 5 years ago

frostinassiky commented 5 years ago

Dear guys,

I also ran into some issues with ninja... Here is my understanding:

sunke123 commented 5 years ago

@Frostinassiky Thanks for your help!

zyxu1996 commented 5 years ago

@Frostinassiky Could you please describe the 2nd solution in more detail? I don't understand what it means. Thank you.

frostinassiky commented 5 years ago

@xu13521090631 This is my setup file.

from os import path

_src_path = path.join(path.dirname(path.abspath(__file__)), "src")

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='inplace_abn_cpp_backend',
    ext_modules=[
        CUDAExtension(
            name='inplace_abn_cpp_backend',
            sources=[
              "src/inplace_abn.cpp",
              "src/inplace_abn_cpu.cpp",
              "src/inplace_abn_cuda.cu"
            ],
            extra_compile_args = {
                "cxx":["-O3"],
                'nvcc': ['--expt-extended-lambda']
            }
        )
    ],
    cmdclass={
        'build_ext': BuildExtension
    })

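In the function.py file, I create a new `_backend` with `import inplace_abn_cpp_backend as _backend`, so the extension built by the setup file above is used instead of being compiled at import time.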
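A minimal sketch of that import, assuming the extension has already been built and installed from the setup file above under the name `inplace_abn_cpp_backend`. The helper `load_backend` and its `jit_fallback` argument are hypothetical names used only for illustration; they are not part of the repository.

```python
import importlib


def load_backend(module_name, jit_fallback=None):
    """Return the prebuilt extension module, or the fallback's result.

    `module_name` is the name given to CUDAExtension in setup.py.
    `jit_fallback` is a hypothetical zero-argument callable, for example one
    that wraps torch.utils.cpp_extension.load (which requires ninja).
    """
    try:
        # Succeeds only if the extension was built and installed beforehand.
        return importlib.import_module(module_name)
    except ImportError:
        if jit_fallback is None:
            raise
        return jit_fallback()


# In function.py this would stand in for the JIT-compiled backend, e.g.:
# _backend = load_backend("inplace_abn_cpp_backend")
```

With no fallback given, a missing extension still raises `ImportError`, so a forgotten build step fails loudly rather than silently.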

~As a third solution, if you are not using multiple GPUs, the PyTorch batch normalization layer works well.~ Aha, there is a new branch, pytorch-v1.1.

zyxu1996 commented 5 years ago

@Frostinassiky Thank you for your reply. I still have some trouble: I have done all these steps but still get the errors below. I guess the versions of CUDA, PyTorch, and ninja don't match. Could you please tell me which versions of these packages you use? Thank you!

`!! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 4.9 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
Traceback (most recent call last):
  File "/data/xzy/new_hrnet/HRNet-Semantic-Segmentation/tools/train.py", line 248, in <module>
    main()
  File "/data/xzy/new_hrnet/HRNet-Semantic-Segmentation/tools/train.py", line 83, in main
    logger.info(get_model_summary(model.cuda(), dump_input.cuda()))
  File "/data/xzy/new_hrnet/HRNet-Semantic-Segmentation/lib/utils/modelsummary.py", line 90, in get_model_summary
    model(input_tensors)
  File "/home/omnisky/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/xzy/new_hrnet/HRNet-Semantic-Segmentation/lib/models/seg_hrnet.py", line 408, in forward
    x = self.conv1(x)
  File "/home/omnisky/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/omnisky/.local/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:663`
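Since the question here is which versions are installed, a short script can collect them for comparison. This is only a sketch: the function names are made up for illustration, it assumes `ninja` and `gcc` are on `PATH` under those names, and it reports `None` for anything it cannot find.

```python
import shutil
import subprocess
import sys


def tool_version(command):
    """Return the first line printed by `<command> --version`, or None."""
    if shutil.which(command) is None:
        return None  # the tool is not on PATH
    try:
        result = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


def torch_versions():
    """Return (torch version, CUDA version torch was built with)."""
    try:
        import torch
    except ImportError:
        return None, None  # torch is not installed in this interpreter
    return torch.__version__, torch.version.cuda


def environment_report():
    """Collect the versions that matter when building the extension."""
    torch_version, cuda_version = torch_versions()
    return {
        "python": sys.version.split()[0],
        "torch": torch_version,
        "cuda (torch build)": cuda_version,
        "ninja": tool_version("ninja"),
        "gcc": tool_version("gcc"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```

Note that `torch.version.cuda` is the CUDA version PyTorch was compiled against, which can differ from the toolkit installed under /usr/local/cuda.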

Fiordarancio commented 4 years ago

@Frostinassiky Thank you for your helpful workaround: I encountered problems with ninja too and I am still quite stuck on them. I borrowed your code for setup.py, but when I execute it I get the following error:

building 'inplace_abn_cpp_backend' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include -I/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/TH -I/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -I/home/ilaria/workspace/hrnenv/include/python3.6m -c src/inplace_abn.cpp -o build/temp.linux-x86_64-3.6/src/inplace_abn.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=inplace_abn_cpp_backend -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
src/inplace_abn.cpp: In function ‘void pybind11_init_inplace_abn_cpp_backend(pybind11::module&)’:
src/inplace_abn.cpp:70:69: error: no matching function for call to ‘pybind11::module::def(const char [9], <unresolved overloaded function type>, const char [36])’
   m.def("backward", &backward, "Second part of backward computation");
                                                                     ^
In file included from /home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/torch/csrc/utils/pybind.h:6:0,
                 from /home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/python.h:12,
                 from /home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/torch/extension.h:6,
                 from /home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:6,
                 from src/inplace_abn.cpp:1:
/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/pybind11/pybind11.h:810:13: note: candidate: template<class Func, class ... Extra> pybind11::module& pybind11::module::def(const char*, Func&&, const Extra& ...)
     module &def(const char *name_, Func &&f, const Extra& ... extra) {
             ^~~
/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/pybind11/pybind11.h:810:13: note:   template argument deduction/substitution failed:
src/inplace_abn.cpp:70:69: note:   couldn't deduce template parameter ‘Func’
   m.def("backward", &backward, "Second part of backward computation");
                                                                     ^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Actually, this is the same complaint that I get in the traceback when ninja is installed (much the same as in this previous issue). I sincerely cannot figure out how to solve the problem, since I haven't found decent solutions in related topics.

My environment:

  • torch 1.4.0
  • CUDA 10.1
  • ninja 1.9.0
  • gcc/g++ 7.4.0

I also tried downgraded versions of torch and CUDA, but that did not work. Any help would be appreciated!

kaizen0890 commented 4 years ago

> @Frostinassiky Thank you for your helpful workaround: I encountered problems with ninja too and I am still quite stuck on them. I borrowed your code for setup.py, but when I execute it I get the following error: [...]
>
> My environment:
>
>   • torch 1.4.0
>   • CUDA 10.1
>   • ninja 1.9.0
>   • gcc/g++ 7.4.0
>
> I also tried downgraded versions of torch and CUDA, but that did not work. Any help would be appreciated!

I found that this error comes from a version mismatch, specifically between the Torch and ninja versions. I solved it by finding a ninja version that matches my Torch version. My environment is shown below:

  • Ubuntu: 16.04
  • Gcc: 6.5.0
  • Python: 3.5.2
  • Torch: 0.4.1
  • ninja: 1.8.2

Hopefully this helps you guys!
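To make the version matching concrete, the check below compares an environment against the combination reported above as working. It is a sketch only: the function name and dictionary layout are invented for illustration, and the reference values are simply the ones from this comment, not an official compatibility table.

```python
# Combination reported as working in this thread (not an official matrix).
REPORTED_WORKING = {
    "gcc": "6.5.0",
    "python": "3.5.2",
    "torch": "0.4.1",
    "ninja": "1.8.2",
}


def version_mismatches(installed, reference=REPORTED_WORKING):
    """Return {name: (installed, reference)} for every differing entry.

    Versions are compared as plain strings; an entry missing from
    `installed` counts as a mismatch and is reported with None.
    """
    mismatches = {}
    for name, wanted in reference.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


# The environment from the failing report quoted earlier in this thread.
failing = {"torch": "1.4.0", "ninja": "1.9.0", "gcc": "7.4.0"}
for name, (found, wanted) in sorted(version_mismatches(failing).items()):
    print("{}: installed {}, reported working {}".format(name, found, wanted))
```

Other combinations may well work too; the comparison only flags where an environment departs from the one known-good setup reported here.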