NVIDIA / flownet2-pytorch

Pytorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

Can't install #123

Open rodrigosilvafe opened 5 years ago

rodrigosilvafe commented 5 years ago

Hi everyone. Thanks for the amazing job.

When I try to install using !bash install.sh it outputs:

############## OUTPUT ###############

running install running bdist_egg running egg_info writing correlation_cuda.egg-info/PKG-INFO writing dependency_links to correlation_cuda.egg-info/dependency_links.txt writing top-level names to correlation_cuda.egg-info/top_level.txt writing manifest file 'correlation_cuda.egg-info/SOURCES.txt' installing library code to build/bdist.linux-x86_64/egg running install_lib running build_ext building 'correlation_cuda' extension x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/lib/include -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/torch/csrc/api/include -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c correlation_cuda.cc -o build/temp.linux-x86_64-3.6/correlation_cuda.o -std=c++11 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=correlation_cuda -D_GLIBCXX_USE_CXX11_ABI=0 In file included from correlation_cuda.cc:1:0: /usr/local/lib/python3.6/dist-packages/torch/lib/include/torch/csrc/api/include/torch/torch.h:7:2: warning: #warning "Including torch/torch.h for C++ extensions is deprecated. Please include torch/extension.h" [-Wcpp]

warning \

^~~ correlation_cuda.cc: In function ‘int correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int)’: correlation_cuda.cc:74:25: error: ‘class at::Context’ has no member named ‘getCurrentCUDAStream’ at::globalContext().getCurrentCUDAStream() ^~~~~~~~ correlation_cuda.cc: In function ‘int correlation_backward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int)’: correlation_cuda.cc:155:69: error: ‘class at::Context’ has no member named ‘getCurrentCUDAStream’ at::globalContext().getCurrentCUDAStream() ^~~~~~~~ error: command 'x86_64-linux-gnu-gcc' failed with exit status 1 running install running bdist_egg running egg_info writing resample2d_cuda.egg-info/PKG-INFO writing dependency_links to resample2d_cuda.egg-info/dependency_links.txt writing top-level names to resample2d_cuda.egg-info/top_level.txt writing manifest file 'resample2d_cuda.egg-info/SOURCES.txt' installing library code to build/bdist.linux-x86_64/egg running install_lib running build_ext building 'resample2d_cuda' extension x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/lib/include -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/torch/csrc/api/include -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c resample2d_cuda.cc -o build/temp.linux-x86_64-3.6/resample2d_cuda.o -std=c++11 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=resample2d_cuda -D_GLIBCXX_USE_CXX11_ABI=0 In file included from resample2d_cuda.cc:2:0: /usr/local/lib/python3.6/dist-packages/torch/lib/include/torch/csrc/api/include/torch/torch.h:7:2: warning: #warning "Including torch/torch.h for C++ 
extensions is deprecated. Please include torch/extension.h" [-Wcpp]

warning \

^~~ /usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.6/dist-packages/torch/lib/include -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/torch/csrc/api/include -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c resample2d_kernel.cu -o build/temp.linux-x86_64-3.6/resample2d_kernel.o -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=resample2d_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11 resample2d_kernel.cu(211): error: class "at::Context" has no member "getCurrentCUDAStream"

resample2d_kernel.cu(256): error: class "at::Context" has no member "getCurrentCUDAStream"

resample2d_kernel.cu(283): error: class "at::Context" has no member "getCurrentCUDAStream"

3 errors detected in the compilation of "/tmp/tmpxft_00000197_00000000-10_resample2d_kernel.compute_70.cpp1.ii". error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1 running install running bdist_egg running egg_info writing channelnorm_cuda.egg-info/PKG-INFO writing dependency_links to channelnorm_cuda.egg-info/dependency_links.txt writing top-level names to channelnorm_cuda.egg-info/top_level.txt writing manifest file 'channelnorm_cuda.egg-info/SOURCES.txt' installing library code to build/bdist.linux-x86_64/egg running install_lib running build_ext building 'channelnorm_cuda' extension x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/lib/include -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/torch/csrc/api/include -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c channelnorm_cuda.cc -o build/temp.linux-x86_64-3.6/channelnorm_cuda.o -std=c++11 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=channelnorm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 In file included from channelnorm_cuda.cc:1:0: /usr/local/lib/python3.6/dist-packages/torch/lib/include/torch/csrc/api/include/torch/torch.h:7:2: warning: #warning "Including torch/torch.h for C++ extensions is deprecated. Please include torch/extension.h" [-Wcpp]

warning \

^~~ /usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.6/dist-packages/torch/lib/include -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/torch/csrc/api/include -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c channelnorm_kernel.cu -o build/temp.linux-x86_64-3.6/channelnorm_kernel.o -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=channelnorm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11 channelnorm_kernel.cu(110): error: class "at::Context" has no member "getCurrentCUDAStream"

channelnorm_kernel.cu(110): error: class "at::Context" has no member "getCurrentCUDAStream"

channelnorm_kernel.cu(110): error: class "at::Context" has no member "getCurrentCUDAStream"

channelnorm_kernel.cu(150): error: class "at::Context" has no member "getCurrentCUDAStream"

channelnorm_kernel.cu(150): error: class "at::Context" has no member "getCurrentCUDAStream"

channelnorm_kernel.cu(150): error: class "at::Context" has no member "getCurrentCUDAStream"

6 errors detected in the compilation of "/tmp/tmpxft_000001b5_00000000-9_channelnorm_kernel.compute_70.cpp1.ii". error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1

############## OUTPUT ###############

I am working on Colab, it uses:

torch version: 1.0.0 cudnn version: 7401 cuda version: 10.0.130 gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0 g++ (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0

Any help would be greatly appreciated. Thanks in advance.

MoonBlvd commented 5 years ago

I have the same issue, and I guess it is because of CUDA 10.0. Please let me know if someone solves it!

xmlyqing00 commented 5 years ago

I also encountered this problem. I use CUDA10 too.

jcliu0428 commented 5 years ago

Hi, I guess it's a PyTorch or CUDA version problem. My environment is as follows, and it works for me: Ubuntu 16.04, CUDA 9.0, PyTorch 0.4.1, Python 3.6. You could try changing your PyTorch version or setting up a local CUDA 9.0 environment. Don't forget to update the PATH in your .bashrc file. Good luck.

rodrigosilvafe commented 5 years ago

Thanks for the answer

It worked for me using torch 0.4.0, cuda 9.0, gcc 6.5 and g++ 6.5 (I am working on colab, it uses ubuntu 18 and python 3.6)

Hope it helps someone else

Thanks again

fperezgamonal commented 5 years ago

Hello, I'm facing the same issue. I am trying to reproduce @rodrigosilvafe's setup, also on Google Colab, by installing CUDA 9.0 instead of the system default (CUDA 10.0).

To do so, I tried to purge 10.0 as shown here in step 1. Maybe this step is not necessary (I'll try without it). Then I tried to install CUDA 9.0 by running:

!wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
!dpkg --install cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
!apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
!apt-get update
!apt-get install cuda

And then installed PyTorch 0.4.1 for CUDA 9.0:

!pip install https://download.pytorch.org/whl/cu90/torch-0.4.1-cp36-cp36m-linux_x86_64.whl

After restarting the runtime as asked by the system, the printed versions are as expected.

The problem is that when I change directory to where the flownet2 repository is (with os.chdir()) and run !bash install.sh, I get the following errors (when installing correlation_cuda):

/usr/local/lib/python3.6/dist-packages/torch/lib/include/THC/THCAtomics.cuh(100): error: cannot overload functions distinguished by return type alone
/usr/local/lib/python3.6/dist-packages/torch/lib/include/THC/THCAtomics.cuh(123): error: return value type does not match the function type.

The other two modules (resample2d and channelnorm_cuda) install without problems or warnings.

Any help would be appreciated! Thanks in advance :+1:

PS: one hint may be that running !cat /usr/local/cuda/version.txt or checking the output of nvidia-smi still shows CUDA version 10.1.105 (as does nvcc --version).

rodrigosilvafe commented 5 years ago

@fperezgamonal About your PS: I think the nvcc version is the CUDA version the system is actually using, while the torch.cuda versions are the ones PyTorch needs to work properly, but I'm not sure about this.

In any case, what worked for me was deleting CUDA with these two commands:

!rm -rf /usr/local/cuda-10.0
!rm -rf /usr/local/cuda

Also, I did almost the same as you to install it, except that I added 9.0 to the install line:

!dpkg -i cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64-deb
!apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
!apt-get update
!apt-get install cuda 9.0

I believe that if you just use cuda, it will install the latest version (10 in this case).

nvidia-smi reports the version of the driver the GPU is using, which I believe is not the CUDA version the system is using.

Hope this helps. Good luck.
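
To see which toolkit a build would actually pick up, one option is to compare the release that nvcc reports with the CUDA version PyTorch was built for. The following is only a rough sketch: `parse_nvcc_release` and `installed_nvcc_release` are hypothetical helper names, and the parser assumes nvcc prints a line of the form "Cuda compilation tools, release X.Y, ...".

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the 'X.Y' release string found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_nvcc_release():
    """Run nvcc if it is on PATH and return its release string, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("nvcc release:", installed_nvcc_release())
    try:
        import torch  # optional: only needed for the comparison
        print("torch built for CUDA:", torch.version.cuda)
    except ImportError:
        print("torch is not installed in this environment")
```

If the two values differ, the extensions are being compiled with a different toolkit than the one PyTorch expects, which matches the situation described above.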

fperezgamonal commented 5 years ago

Thanks @rodrigosilvafe, I managed to get it to work by following your steps. However, I had to add an extra step to use gcc 5 instead of Colab's default (7.3), which is incompatible with CUDA 9.0. As stated here, I ran:

!apt update
!apt install g++-5

to install gcc-5 and g++-5 (for some reason !apt install gcc-5 g++-5 only updated the former).

Then I set it as the default with:

!update-alternatives --remove-all gcc
!update-alternatives --remove-all g++

!update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 20
!update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 20

!update-alternatives --install /usr/bin/cc cc /usr/bin/gcc 30
!update-alternatives --set cc /usr/bin/gcc

!update-alternatives --install /usr/bin/c++ c++ /usr/bin/g++ 30
!update-alternatives --set c++ /usr/bin/g++

By the way, although the error message mentioned GCC 6.0, in theory only version 7 and above should fail, but I have not tried it.

Regards,

Ferran.

ns3284 commented 5 years ago

Anyone able to get it to work with CUDA 10? GPU doesn't support anything less.

rodrigosilvafe commented 5 years ago

I think this code is incompatible with CUDA 10. By GPU, do you mean your own GPU? If you use Colab, you can use the Tesla K80 and change the CUDA version.

ns3284 commented 5 years ago

Correct. Trying to utilize my own GPU (2080Ti).

ns3284 commented 5 years ago

From looking at the CUDA Guide, hopefully it is an easy fix.

Edit: Assuming I am overly optimistic with this. Going to try manually compiling first, then work on getting it to work in this python script.

https://docs.nvidia.com/cuda/turing-compatibility-guide/index.html

Original: (setup.py)

nvcc_args = [
    '-gencode', 'arch=compute_50,code=sm_50',
    '-gencode', 'arch=compute_52,code=sm_52',
    '-gencode', 'arch=compute_60,code=sm_60',
    '-gencode', 'arch=compute_61,code=sm_61',
    '-gencode', 'arch=compute_70,code=sm_70',
    '-gencode', 'arch=compute_70,code=compute_70'
]

Proposed modification

nvcc_args = [
    '-gencode', 'arch=compute_50,code=sm_50',
    '-gencode', 'arch=compute_52,code=sm_52',
    '-gencode', 'arch=compute_60,code=sm_60',
    '-gencode', 'arch=compute_61,code=sm_61',
    '-gencode', 'arch=compute_70,code=sm_70',
    '-gencode', 'arch=compute_75,code=sm_75',
    '-gencode', 'arch=compute_75,code=compute_75'
]

Unable to test now, but will test this evening.
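
The proposed list can also be generated rather than written out by hand, which makes it harder to forget the PTX entry when adding a new architecture. This is just a sketch: `gencode_flags` is a hypothetical helper, not something that exists in the repository's setup.py.

```python
def gencode_flags(sm_archs, ptx_arch):
    """Build nvcc -gencode arguments: machine code per arch, plus PTX for one arch."""
    flags = []
    for arch in sm_archs:
        flags += ["-gencode", "arch=compute_{0},code=sm_{0}".format(arch)]
    # PTX entry so that GPUs newer than the listed ones can JIT-compile the kernels.
    flags += ["-gencode", "arch=compute_{0},code=compute_{0}".format(ptx_arch)]
    return flags


# Reproduces the proposed modification above (Turing, compute capability 7.5, added).
nvcc_args = gencode_flags(["50", "52", "60", "61", "70", "75"], ptx_arch="75")
```

Note that compute_75 is only understood by CUDA 10.0 and later, so this list would not build with a CUDA 9.x toolkit.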

ns3284 commented 5 years ago

Had to make a few changes to get it to work with CUDA 10. Some might be superfluous.

In correlation_cuda.cc I added

#include <ATen/Context.h>
#include <ATen/cuda/CUDAContext.h>

and replaced at::globalContext().getCurrentCUDAStream() with at::cuda::getCurrentCUDAStream()

Did the same with channelnorm_kernel.cu

Will do the same with resample2d if needed. Didn't get there yet. Compiles with CUDA 10 now. (Also changed the setup.py files to match my above post)
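
For anyone who would rather script the edit than apply it by hand, the replacement described above could look roughly like this. It is a sketch under assumptions: `patch_source` and `patch_file` are hypothetical helpers, both includes from the comment are added even though one may be superfluous, and the file paths you pass in depend on where the sources live in your checkout.

```python
from pathlib import Path

OLD_CALL = "at::globalContext().getCurrentCUDAStream()"
NEW_CALL = "at::cuda::getCurrentCUDAStream()"
INCLUDES = ["#include <ATen/Context.h>", "#include <ATen/cuda/CUDAContext.h>"]


def patch_source(text):
    """Return `text` with the old stream call replaced and the includes added."""
    if OLD_CALL not in text:
        return text  # nothing to do, or already patched
    text = text.replace(OLD_CALL, NEW_CALL)
    missing = [inc for inc in INCLUDES if inc not in text]
    if missing:
        lines = text.split("\n")
        # Insert after the last existing #include line (or at the top if none).
        last = max((i for i, line in enumerate(lines)
                    if line.lstrip().startswith("#include")), default=-1)
        lines[last + 1:last + 1] = missing
        text = "\n".join(lines)
    return text


def patch_file(path):
    """Patch one source file in place; return True if it was changed."""
    path = Path(path)
    original = path.read_text()
    patched = patch_source(original)
    if patched != original:
        path.write_text(patched)
        return True
    return False
```

Running `patch_file` on correlation_cuda.cc, channelnorm_kernel.cu and the resample2d sources would apply the same change to each file, and running it twice is harmless because an already patched file is left untouched.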

MoonBlvd commented 5 years ago

Hi Nicholas,

Thank you for doing this! Does your fix work now?

ns3284 commented 5 years ago

Turns out similar instructions are in a pull request already. Fixed it for Linux, still unable to install on Windows.

awaelchli commented 5 years ago

Hi @ns3284

I can compile with your modifications (also cuda10) but am still getting an error when importing the module, e.g.

import correlation_cuda

gives

ImportError: /home/adrian/.local/lib/python3.7/site-packages/correlation_cuda-0.0.0-py3.7-linux-x86_64.egg/correlation_cuda.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE

I think it is related to #136
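
An undefined-symbol error at import time can occur when the extension cannot resolve symbols from the PyTorch libraries it was compiled against. One quick check, which nobody in this thread has confirmed as the fix, is to import torch before the extension and see which import fails. `try_imports` below is a hypothetical helper used only for illustration.

```python
import importlib


def try_imports(module_names):
    """Import modules in order; map each name to None on success or the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # torch goes first so its shared libraries are loaded before the extension.
    for name, error in try_imports(["torch", "correlation_cuda"]).items():
        print(name, "->", "ok" if error is None else error)
```

If the extension still fails after torch has been imported, rebuilding it against the currently installed PyTorch version would be the next thing to try.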

hansongfang commented 5 years ago

Fixed it for Ubuntu 18.04, CUDA 9.0, torch 1.0.1.

The master branch code supports torch 1.0.1. It turns out gcc 6.x is not compatible with CUDA 9.0. I compiled with gcc 5.x, and then it worked.

Hope this helps.

tjusxh commented 5 years ago

My environment is as follows: Windows 10, CUDA 9.0, PyTorch 0.4.1, Python 3.5. When I run bash install.sh, I hit a similar issue. How can I resolve it? Thanks.

$ bash install.sh running install running bdist_egg running egg_info writing top-level names to correlation_cuda.egg-info\top_level.txt writing dependency_links to correlation_cuda.egg-info\dependency_links.txt writing correlation_cuda.egg-info\PKG-INFO reading manifest file 'correlation_cuda.egg-info\SOURCES.txt' writing manifest file 'correlation_cuda.egg-info\SOURCES.txt' installing library code to build\bdist.win-amd64\egg running install_lib running build_ext building 'correlation_cuda' extension

..... correlation_cuda.cc:4:35: fatal error: ATen/cuda/CUDAContext.h: No such file or directory compilation terminated. error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

huangbiubiu commented 5 years ago

I have a similar problem.

In my case, the package worked well in my environment, but it raised this error after I updated PyTorch from 1.0.1.post2 to 1.0.1 with conda, which also updated cudnn.

I can install the packages, but I get an undefined symbol error when importing them.

lelelexxx commented 5 years ago

@tjusxh I have a similar problem. Have you solved it? Thanks for sharing.

mohamedabdelhakem1 commented 4 years ago

@tjusxh Have you solved that error?

Gauravv97 commented 4 years ago

I have managed to make it work with Colab's K80. Here is the link if anyone is interested.

Warniz commented 3 years ago

Can confirm that this works on Colab.

Changed the PyTorch version to 1.4 to be compatible with few-shot-2vid. Working with no problems. Thanks!!

mayujie commented 3 years ago

works for me! :)

lyq998 commented 2 years ago
  • nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
  • gcc-7 g++-7
  • python3.6
  • pip3
  • pip3 install torch==1.0.0 -f https://download.pytorch.org/whl/cu90/torch_stable.html

works for me! :)

It works for me! BTW, I have tried cuda9.0, cuda8.0, and pytorch 0.4.1, but they do not work... Only pytorch 1.0.0 + cuda10.0 can work. It's kind of weird...