Closed: hellodrx closed this issue 3 years ago
It might be related to the CUDA version and the Anaconda cudatoolkit, but I am not sure. The whole compile process went through very smoothly, without a single error.
Hi, did you check your cudatoolkit version?
The cudatoolkit version in the Anaconda env is 10.0.130:

```
(hawp) root@ubuntu1:~/home/deng/projects/wire_frame/hawp# conda list
Name                      Version          Build    Channel
_libgcc_mutex             0.1              main     defaults
blas                      1.0              mkl      https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
certifi                   2016.2.28        py36_0   https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
cudatoolkit               10.0.130         0        defaults
cycler                    0.10.0           pypi_0   pypi
cython                    0.29.21          pypi_0   pypi
decorator                 4.4.2            pypi_0   pypi
freetype                  2.5.5            2        https:
```
But your CUDA version is 10.2:
`NVIDIA-SMI 440.64.00   Driver Version: 440.64.00   CUDA Version: 10.2`
So this might be the issue. I followed the instruction
`conda install pytorch torchvision cudatoolkit=10.0 -c pytorch`
and Anaconda selected the cudatoolkit version -.-
That's because my CUDA version is 10.0 :)
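For reference, you can check which CUDA runtime your PyTorch build was compiled against, and whether it can see the GPU at all, with a quick check like this (just a sketch):

```python
import torch

# CUDA runtime this PyTorch build was compiled against (e.g. '10.0' or '10.2').
print(torch.version.cuda)
# Whether the driver and a GPU are visible to PyTorch.
print(torch.cuda.is_available())
```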
I deleted the whole env and reinstalled pytorch, torchvision, cudatoolkit, and all the pip packages. The error still exists. Now my cudatoolkit version is cudatoolkit 10.2.89.
It is very weird that I did not see any errors during the compile process.
```
(hawp) root@ubuntu1:~/home/deng/projects/wire_frame/hawp# python setup.py build_ext --inplace
running build_ext
building 'parsing._C' extension
creating /root/home/deng/projects/wire_frame/hawp/build
creating /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6
creating /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root
creating /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home
creating /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng
creating /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng/projects
creating /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng/projects/wire_frame
creating /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng/projects/wire_frame/hawp
creating /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng/projects/wire_frame/hawp/parsing
creating /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng/projects/wire_frame/hawp/parsing/csrc
Emitting ninja build file /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng/projects/wire_frame/hawp/parsing/csrc/vision.o.d -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/root/home/deng/projects/wire_frame/hawp/parsing/csrc -I/root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include -I/root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/TH -I/root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/THC -I/root/anaconda3/envs/hawp/include/python3.6m -c -c /root/home/deng/projects/wire_frame/hawp/parsing/csrc/vision.cpp -o /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng/projects/wire_frame/hawp/parsing/csrc/vision.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/ATen/Parallel.h:149:0,
                 from /root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/all.h:12,
                 from /root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/torch/extension.h:4,
                 from /root/home/deng/projects/wire_frame/hawp/parsing/csrc/cuda/vision.h:2,
                 from /root/home/deng/projects/wire_frame/hawp/parsing/csrc/linesegment.h:2,
                 from /root/home/deng/projects/wire_frame/hawp/parsing/csrc/vision.cpp:1:
/root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/ATen/ParallelOpenMP.h:84:0: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/parsing
g++ -pthread -shared -L/root/anaconda3/envs/hawp/lib -Wl,-rpath=/root/anaconda3/envs/hawp/lib,--no-as-needed /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng/projects/wire_frame/hawp/parsing/csrc/vision.o -L/root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/lib -L/root/anaconda3/envs/hawp/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lpython3.6m -o build/lib.linux-x86_64-3.6/parsing/_C.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/parsing/_C.cpython-36m-x86_64-linux-gnu.so -> parsing
```
```
(hawp) root@ubuntu1:~/home/deng/projects/wire_frame/hawp# CUDA_VISIBLE_DEVICES=1 python scripts/train.py --config-file config-files/hawp.yaml
Traceback (most recent call last):
  File "scripts/train.py", line 8, in <module>
```
Could you directly import the compiled `_C` from the corresponding directory, like this?

```python
import torch
import _C
print(dir(_C))
```
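If the import fails with an undefined symbol, you can also list the unresolved symbols of the compiled library directly, e.g. with a small helper like this (a rough sketch; it assumes `nm` from binutils is installed, and the path must point at your build output):

```python
import subprocess

# Hypothetical path; point it at the _C shared object produced by the build.
so_path = "parsing/_C.cpython-36m-x86_64-linux-gnu.so"

# List the undefined (unresolved) symbols; -C demangles the C++ names.
out = subprocess.run(
    ["nm", "-C", "--undefined-only", so_path],
    stdout=subprocess.PIPE, universal_newlines=True,
).stdout
print([line for line in out.splitlines() if "lsencode" in line])
```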
Sorry for the late reply.
Unfortunately, I cannot import the `_C` module:

```
import torch
import _C
Traceback (most recent call last):
  File "
```
Did you try to write a small CUDA function in the current environment? This issue is usually caused by an incompatible environment.
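For example, something along the lines of the inline extension below (a rough sketch adapted from the example in the PyTorch docs; the function names are just illustrative) would immediately show whether the CUDA toolchain (nvcc) works inside your env:

```python
import torch
from torch.utils.cpp_extension import load_inline

# A tiny CUDA kernel plus a host wrapper; compiling it requires a working nvcc.
cuda_source = """
__global__ void cos_add_kernel(const float* x, const float* y,
                               float* out, int size) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < size) out[i] = __cosf(x[i]) + __cosf(y[i]);
}

torch::Tensor cos_add(torch::Tensor x, torch::Tensor y) {
  auto out = torch::zeros_like(x);
  const int threads = 256;
  const int blocks = (x.numel() + threads - 1) / threads;
  cos_add_kernel<<<blocks, threads>>>(x.data_ptr<float>(), y.data_ptr<float>(),
                                      out.data_ptr<float>(), x.numel());
  return out;
}
"""

# The C++ side only needs the declaration that gets bound to Python.
cpp_source = "torch::Tensor cos_add(torch::Tensor x, torch::Tensor y);"

module = load_inline(
    name="cuda_sanity_check",   # illustrative name
    cpp_sources=cpp_source,
    cuda_sources=cuda_source,
    functions=["cos_add"],
    with_cuda=True,
    verbose=True,
)

x = torch.randn(1024, device="cuda")
print(module.cos_add(x, x)[:5])
```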
Do you mean a minimal CUDA example like that, using Numba or PyTorch, in the current env?
I can run PyTorch projects in this new env:
```
[21-01-31 03:07:45 PM] epoch 0, iteration 1, final_loss : 1723.3910
[21-01-31 03:07:46 PM] epoch 0, iteration 2, final_loss : 1732.1219
[21-01-31 03:07:47 PM] epoch 0, iteration 3, final_loss : 1606.6276
[21-01-31 03:07:48 PM] epoch 0, iteration 4, final_loss : 1590.3911
[21-01-31 03:07:49 PM] epoch 0, iteration 5, final_loss : 1508.3062
[21-01-31 03:07:51 PM] epoch 0, iteration 6, final_loss : 1462.1621
[21-01-31 03:07:52 PM] epoch 0, iteration 7, final_loss : 1401.9430
^CTraceback (most recent call last):
  File "ablation_study.py", line 514, in
```
How about the result of `nvcc --version`?
```
(hawp) root@ubuntu1:~/home/deng/projects/wire_frame/hawp# nvcc --version
Command 'nvcc' not found, but can be installed with:
apt install nvidia-cuda-toolkit
```
But `nvidia-smi` works on the system:
```
(hawp) root@ubuntu1:~/home/deng/projects/wire_frame/hawp# nvidia-smi
Sun Jan 31 15:15:24 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   66C    P0    68W /  70W |  14624MiB / 15109MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   27C    P8     9W /  70W |     11MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     37768      C   python                                   14613MiB   |
+-----------------------------------------------------------------------------+
```
Maybe you need to install a full CUDA development kit on your device.
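One quick way to see what PyTorch's extension builder can find (a small sketch; `CUDA_HOME` being `None` means nvcc was not located):

```python
import torch
from torch.utils.cpp_extension import CUDA_HOME

# Path of the CUDA toolkit found by torch.utils.cpp_extension, or None if nvcc
# could not be located (no CUDA_HOME env variable, no nvcc on PATH, no /usr/local/cuda).
print(CUDA_HOME)
# The conda cudatoolkit still provides the runtime, so these can look fine anyway.
print(torch.version.cuda, torch.cuda.is_available())
```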
```
[1/1] c++ -MMD -MF /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng/projects/wire_frame/hawp/parsing/csrc/vision.o.d -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/root/home/deng/projects/wire_frame/hawp/parsing/csrc -I/root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include -I/root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/TH -I/root/anaconda3/envs/hawp/lib/python3.6/site-packages/torch/include/THC -I/root/anaconda3/envs/hawp/include/python3.6m -c -c /root/home/deng/projects/wire_frame/hawp/parsing/csrc/vision.cpp -o /root/home/deng/projects/wire_frame/hawp/build/temp.linux-x86_64-3.6/root/home/deng/projects/wire_frame/hawp/parsing/csrc/vision.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
```
I carefully read your compile log and found that `nvcc` is never called; only the host compiler (c++/g++) is used.
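For context, build scripts in this style (the pattern used by maskrcnn-benchmark and many detection repos) typically guard the CUDA sources roughly as in the sketch below: if `CUDA_HOME`/`nvcc` cannot be found, the `.cu` files are silently skipped, the build still succeeds, and `lsencode_cuda` ends up as an undefined symbol at import time. This is only an illustrative sketch, not the exact setup.py of this repo:

```python
import glob
import torch
from setuptools import setup
from torch.utils.cpp_extension import (BuildExtension, CppExtension,
                                       CUDAExtension, CUDA_HOME)

def get_extensions():
    # Illustrative source layout (the real project keeps its sources in parsing/csrc).
    sources = glob.glob("parsing/csrc/*.cpp")
    cuda_sources = glob.glob("parsing/csrc/cuda/*.cu")

    extension = CppExtension
    define_macros = []

    # The usual guard: the .cu files are only compiled when a full toolkit
    # (i.e. nvcc, found through CUDA_HOME) is available.
    if torch.cuda.is_available() and CUDA_HOME is not None:
        extension = CUDAExtension
        sources += cuda_sources
        define_macros += [("WITH_CUDA", None)]
    # Otherwise the CUDA kernels (and hence lsencode_cuda) are never built,
    # while the C++ part that references them still compiles and links fine.

    return [extension("parsing._C", sources, define_macros=define_macros)]

setup(
    name="hawp",
    ext_modules=get_extensions(),
    cmdclass={"build_ext": BuildExtension},
)
```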
It is really late; sorry for taking up so much of your time. I will find another machine to check the env.
Yes, I will contact the administrator to see if I can install the CUDA toolkit.
Thanks again for your reply; I'm grateful for the support!
It seems the issue has been located, but I have to contact the administrator of the machine. Thank you for your help.
Okay, good luck to you. Please let me know the result after installing the full CUDA development kit.
Sure, sweet dreams!
Problem solved! It was caused by an incomplete installation of the CUDA toolkit. I checked /usr/local/ on the server and found no cuda folder; that's why nvcc could not be located.
I found another machine where CUDA was installed from the official .sh installer, and the program works now.
Thank you very much for your kind help.
Hi, Nan Xue,
Thanks for your excellent work.
I really like your work. After following the instructions to prepare the env and running the script, I got the issue below:
```
(hawp) root@ubuntu1:~/home/deng/projects/wire_frame/hawp# CUDA_VISIBLE_DEVICES=1 python scripts/train.py --config-file config-files/hawp.yaml
Traceback (most recent call last):
  File "scripts/train.py", line 8, in <module>
    from parsing.detector import WireframeDetector
  File "/root/home/deng/projects/wire_frame/hawp/parsing/detector.py", line 4, in <module>
    from parsing.encoder.hafm import HAFMencoder
  File "/root/home/deng/projects/wire_frame/hawp/parsing/encoder/__init__.py", line 1, in <module>
    from .hafm import HAFMencoder
  File "/root/home/deng/projects/wire_frame/hawp/parsing/encoder/hafm.py", line 2, in <module>
    from parsing import _C
ImportError: /root/home/deng/projects/wire_frame/hawp/parsing/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _Z13lsencode_cudaRKN2at6TensorEiiiii
```
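For reference, demangling the missing symbol (a small sketch using `c++filt`, assuming the binutils/gcc tools are installed) suggests it is the CUDA implementation of lsencode:

```python
import subprocess

# The mangled name from the ImportError above.
symbol = "_Z13lsencode_cudaRKN2at6TensorEiiiii"

# c++filt (shipped with binutils/gcc) demangles C++ symbol names.
demangled = subprocess.run(
    ["c++filt", symbol],
    stdout=subprocess.PIPE, universal_newlines=True,
).stdout.strip()
print(demangled)
# Should print something like: lsencode_cuda(at::Tensor const&, int, int, int, int, int)
```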
My system is:

```
(hawp) root@ubuntu1:~/home/deng/projects/wire_frame/hawp# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic
```
My CUDA driver version is:

```
NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2
```
Do you have any clue that could help me out? I really appreciate your assistance.