NVlabs / FoundationPose

[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Unable to import Torch (ModuleNotFoundError) when running build_all_conda.sh #182

Open yxzisavail opened 1 month ago

yxzisavail commented 1 month ago
Env: I am trying to set up with conda; I tried Docker first, but that failed. Ubuntu 20.04 with an RTX GPU.

Output:
(FPose1) ysz@ps-Z790-UD:~/FoundationPose$ sudo CMAKE_PREFIX_PATH=$CONDA_PREFIX/lib/python3.9/site-packages/pybind11/share/cmake/pybind11 bash build_all_conda.sh
[sudo] password for ysz:
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found version "1.71.0") found components: system program_options
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.8.10", minimum required is "3.6")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.8.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found pybind11: /home/ysz/anaconda3/envs/FPose1/lib/python3.9/site-packages/pybind11/include (found version "2.12.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ysz/FoundationPose/mycpp/build
Scanning dependencies of target mycpp
[ 66%] Building CXX object CMakeFiles/mycpp.dir/src/Utils.cpp.o
[ 66%] Building CXX object CMakeFiles/mycpp.dir/src/app/pybind_api.cpp.o
/home/ysz/FoundationPose/mycpp/src/app/pybind_api.cpp: In function ‘vectorMatrix4f cluster_poses(float, float, const vectorMatrix4f&, const vectorMatrix4f&)’:
/home/ysz/FoundationPose/mycpp/src/app/pybind_api.cpp:26:38: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘std::vector<Eigen::Matrix<float, 4, 4>, Eigen::aligned_allocator<Eigen::Matrix<float, 4, 4> > >::size_type’ {aka ‘long unsigned int’} [-Wformat=]
   26 | printf("num original candidates = %d\n",poses_in.size());
      | ~^ ~~~
      | int std::vector<Eigen::Matrix<float, 4, 4>, Eigen::aligned_allocator<Eigen::Matrix<float, 4, 4> > >::size_type {aka long unsigned int}
/home/ysz/FoundationPose/mycpp/src/app/pybind_api.cpp:66:42: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘std::vector<Eigen::Matrix<float, 4, 4>, Eigen::aligned_allocator<Eigen::Matrix<float, 4, 4> > >::size_type’ {aka ‘long unsigned int’} [-Wformat=]
   66 | printf("num of pose after clustering: %d\n",poses_out.size());
      | ~^ ~~~~
      | int std::vector<Eigen::Matrix<float, 4, 4>, Eigen::aligned_allocator<Eigen::Matrix<float, 4, 4> > >::size_type {aka long unsigned int}

[100%] Linking CXX shared module mycpp.cpython-38-x86_64-linux-gnu.so
[100%] Built target mycpp
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///home/ysz/FoundationPose/bundlesdf/mycuda
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/home/ysz/FoundationPose/bundlesdf/mycuda/setup.py", line 13, in <module>
          from torch.utils.cpp_extension import BuildExtension, CUDAExtension
      ModuleNotFoundError: No module named 'torch'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

[notice] A new release of pip is available: 24.0 -> 24.1.2
[notice] To update, run: python3 -m pip install --upgrade pip

#######################END OF OUTPUT################################

I am 100% sure that the virtual environment I am in (FPose1) has PyTorch, Torchvision and Torchaudio installed, and that their versions match the CUDA version I am currently using (CUDA 11.8).

Output of $ conda list:
(FPose1) ysz@ps-Z790-UD:~/FoundationPose$ conda list

# packages in environment at /home/ysz/anaconda3/envs/FPose1:
#
# Name Version Build Channel

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
addict 2.4.0 pypi_0 pypi
aiohttp 3.9.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
albumentations 1.4.2 pypi_0 pypi
antlr4-python3-runtime 4.9.3 pypi_0 pypi
anyio 4.4.0 pypi_0 pypi
[... additional packages ...]
certifi 2024.7.4 pypi_0 pypi
[... additional packages ...]
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
lit 18.1.8 pypi_0 pypi
[... additional packages ...]
nest-asyncio 1.6.0 pypi_0 pypi
[... additional packages ...]
optional-django 0.1.0 pypi_0 pypi
[... additional packages ...]
platformdirs 4.2.2 pypi_0 pypi
[... additional packages ...]
python-dateutil 2.9.0.post0 pypi_0 pypi
python-json-logger 2.0.7 pypi_0 pypi
pytorch3d 0.7.3 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
pyzmq 24.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0
referencing 0.35.1 pypi_0 pypi
[... additional packages ...]
shapely 2.0.4 pypi_0 pypi
simplejson 3.19.2 pypi_0 pypi
six 1.16.0 pypi_0 pypi
smmap 5.0.1 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
soupsieve 2.5 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0
stack-data 0.6.3 pypi_0 pypi
[... additional packages ...]
tomli 2.0.1 pypi_0 pypi
torch 2.0.0+cu118 pypi_0 pypi
torchaudio 2.0.1+cu118 pypi_0 pypi
torchnet 0.0.4 pypi_0 pypi
torchvision 0.15.1+cu118 pypi_0 pypi
[... additional packages ...]
widgetsnbextension 4.0.11 pypi_0 pypi
[... additional packages ...]
yacs 0.1.8 pypi_0 pypi
yarl 1.9.4 pypi_0 pypi
zipp 3.19.2 pypi_0 pypi
zlib 1.2.13 h5eee18b_1
#####################END OF OUTPUT##########################
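
The list shows torch 2.0.0+cu118 inside FPose1, so it may be worth checking which Python versions the build output itself refers to. Below is a minimal sketch that scans a few lines copied from the build output for Python version markers. The helper name `python_versions_in_log` is hypothetical, and finding more than one version only shows that the build touched more than one Python installation; treating that as the cause of the error is an assumption, not a confirmed diagnosis.

```python
import re

# Lines copied from the build output in this issue.
LOG = """\
-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.8.10", minimum required is "3.6")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.8.so
-- Found pybind11: /home/ysz/anaconda3/envs/FPose1/lib/python3.9/site-packages/pybind11/include (found version "2.12.0")
[100%] Linking CXX shared module mycpp.cpython-38-x86_64-linux-gnu.so
"""


def python_versions_in_log(log: str) -> set:
    """Collect every major.minor Python version found in paths or ABI tags."""
    versions = set()
    # Path fragments such as "python3.9" or "libpython3.8".
    for major, minor in re.findall(r"python(\d)\.(\d+)", log):
        versions.add(f"{major}.{minor}")
    # Extension-module ABI tags such as "cpython-38".
    for major, minor in re.findall(r"cpython-(\d)(\d+)", log):
        versions.add(f"{major}.{minor}")
    return versions


versions = python_versions_in_log(LOG)
print(sorted(versions))
if len(versions) > 1:
    print("More than one Python version appears in this build output.")
```

For the lines above, the scan picks up 3.8 (system interpreter, library, and the built module's tag) and 3.9 (the conda environment path used for pybind11).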

Output of $ nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
#####################END OF OUTPUT########################
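
The nvcc output covers the CUDA toolkit side. A similar check for the Python side is to ask the interpreter that actually runs `setup.py` where it lives and whether it can find torch. The sketch below uses only the standard library; the function name `describe_interpreter` and the file name `check_env.py` are hypothetical. Since the build command in this issue was run through `sudo`, which may reset `PATH` depending on the system's sudoers configuration, comparing the result with and without `sudo` is one possible experiment.

```python
import importlib.util
import sys


def describe_interpreter(module_name: str = "torch") -> dict:
    """Report which Python is running and whether it can locate a module."""
    # find_spec looks the module up without importing it.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "module": module_name,
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Saved as `check_env.py`, it could be run once as `python3 check_env.py` and once as `sudo python3 check_env.py`. If the two runs report different `executable` paths, the build script is not using the interpreter from the activated conda environment.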

For reference, I also added the following environment variable to my ~/.bashrc:
export CUDA_HOME=/usr/local/cuda-11.8
The following two variables were already there (maybe another user added them previously):
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
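
Exports in ~/.bashrc only take effect in shells that source that file, so it can help to confirm what the build process actually sees. The following is a rough sketch, assuming the usual layout where `nvcc` sits under `$CUDA_HOME/bin`; the function name `describe_cuda_env` is hypothetical.

```python
import os
import shutil
from pathlib import Path


def describe_cuda_env(environ=None) -> dict:
    """Summarize the CUDA-related variables visible to this process."""
    env = os.environ if environ is None else environ
    cuda_home = env.get("CUDA_HOME")
    nvcc_under_home = None
    if cuda_home:
        # Assumes the conventional <CUDA_HOME>/bin/nvcc layout.
        nvcc_under_home = (Path(cuda_home) / "bin" / "nvcc").is_file()
    return {
        "CUDA_HOME": cuda_home,
        "nvcc_under_CUDA_HOME": nvcc_under_home,
        # Search only the PATH from the environment being inspected.
        "nvcc_on_PATH": shutil.which("nvcc", path=env.get("PATH", "")),
        "LD_LIBRARY_PATH": env.get("LD_LIBRARY_PATH"),
    }


if __name__ == "__main__":
    for key, value in describe_cuda_env().items():
        print(f"{key}: {value}")
```

A value of `None` for `CUDA_HOME` or `nvcc_on_PATH` would mean the variable did not reach the process that ran the script.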


Ashima1012 commented 1 month ago

Hi, how did you fix this problem?

yxzisavail commented 1 month ago

Unfortunately, I have not fixed it yet.

wenbowen123 commented 2 weeks ago

Hi, did you try docker?