**Open** · matteomastrogiuseppe opened this issue 2 months ago
Hi @matteomastrogiuseppe,
maybe I can help, as I had a similar problem when trying to run inference. I don't think the difference between CUDA 12.3 on your system and 11.5 in the conda env should be a problem (I get the same warning).
After some trial and error, I can now run inference on different systems with different package versions in my environment (see below). You can also try using my fork https://github.com/savidini/SAM-6D. In that case, please read the Getting Started section, as there are some changes. I have tested this on several cloud servers and my local machine and can run inference on the example and on custom objects without errors.
Please let me know if this works for you; if it does, I might open a pull request. If you still get errors, please post a full error log.
Kind regards
Hi @savidini,
Many thanks for reaching out. Your fork works! I also managed to run SAM-6D on a custom object smoothly. I think a PR would be great.
By the way, when I try to run it on depth+RGB frames coming from an Intel RealSense, the algorithm doesn't seem to work properly. The segmentation works fine, but the pose estimation is completely off. On the other hand, if I run the algorithm on the same object with Blender-generated RGB+depth, I get very good results.
Have you tried running the algorithm on real data? My guess is that it does not work when there is no perfect correspondence between the depth and RGB data (as with the RealSense).
@matteomastrogiuseppe for future reproducibility, it would be great if you could report the `mamba list` of the environment in which everything works for you. To avoid cluttering the issue, you can use the `<details>` HTML tag, see:
<details>
<summary> click to see everything </summary>
```
this is hidden
```
</details>
which is rendered as a collapsible section.
Thanks for your feedback @matteomastrogiuseppe! I am glad it worked for you too and I will create a PR later on.
Regarding the inference on real data: I just tested it, and for me it was the same at first. ISM worked, but PEM did not. Turns out it had something to do with my depth map. As you already mentioned, the correspondence between depth and RGB data must be exact. For this the frames need to be aligned.
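For intuition: aligning means reprojecting every depth pixel into the color camera's image plane using the intrinsics of both cameras and the depth-to-color extrinsics, which is what pyrealsense2's `rs.align` processing block does internally. A minimal numpy sketch of the per-pixel geometry (the intrinsic matrix below is made up for illustration, not from a real device):

```python
import numpy as np

def align_depth_pixel(u, v, z, K_depth, K_color, R, t):
    """Map one depth pixel (u, v) with depth z [m] into color-image coordinates.

    K_depth, K_color are 3x3 intrinsic matrices; R, t are the depth-to-color
    extrinsics. rs.align applies this to every pixel of the depth frame.
    """
    # Deproject: pixel + depth -> 3D point in the depth camera frame
    p = z * (np.linalg.inv(K_depth) @ np.array([u, v, 1.0]))
    # Transform the point into the color camera frame
    p_c = R @ p + t
    # Project back onto the color image plane
    uv = K_color @ (p_c / p_c[2])
    return uv[0], uv[1]

# Hypothetical intrinsics; with identical cameras and zero extrinsic offset
# the pixel maps back onto itself.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
print(align_depth_pixel(100, 50, 0.5, K, K, np.eye(3), np.zeros(3)))
# -> approximately (100.0, 50.0)
```

If the two streams are not aligned this way, each depth value lands on the wrong RGB pixel, which explains why ISM (RGB-driven) still works while PEM (depth-driven) breaks.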
Here is a somewhat detailed explanation of how it worked for me:
Unfortunately, I don't have many objects available for which I also have CAD files, and vice versa. So I used a Rubik's Cube and a model I found online, which I had to convert and scale (I did this with MeshLab).
If you have a Rubik's Cube at hand, here is my .ply file: rubiks_cube.ply
I am using a slightly modified script (see below) from librealsense to capture the aligned RGB and depth images, together with my camera.json file containing the intrinsics of the RealSense D435 I am using. (Disclaimer: I am not 100% sure the intrinsic data is correct, as I am still quite new to the whole topic.)
The script saves the color image and the depth map of the last frame when it is closed (Q/ESC).
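One subtle part of saving the depth map is writing it as a 16-bit PNG in the units that `depth_scale` in camera.json expects (millimeters in my case), rather than a normalized 8-bit image. A small sketch of that conversion, assuming the aligned depth frame is already available as a uint16 numpy array and `depth_scale` came from `depth_sensor.get_depth_scale()`:

```python
import numpy as np

def depth_to_uint16_mm(depth_raw, depth_scale):
    """Convert raw RealSense depth units to a 16-bit millimeter image.

    depth_raw:   uint16 array straight from the aligned depth frame
                 (e.g. np.asanyarray(aligned_depth_frame.get_data()))
    depth_scale: meters per depth unit, from depth_sensor.get_depth_scale()
                 (0.001 on the D435)
    """
    depth_m = depth_raw.astype(np.float64) * depth_scale  # to meters
    depth_mm = np.round(depth_m * 1000.0)                 # to millimeters
    return np.clip(depth_mm, 0, 65535).astype(np.uint16)  # fits a 16-bit PNG

# With the D435's 0.001 scale this is the identity mapping, but other
# devices or presets use different depth scales.
raw = np.array([[1000, 2500]], dtype=np.uint16)
print(depth_to_uint16_mm(raw, 0.001))  # [[1000 2500]]
```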
My sample output data from the script:
Same as for the example data, run from the directory containing all the files:
```
export CAD_PATH=$PWD/rubiks_cube_scaled.ply
export RGB_PATH=$PWD/rgb_test.png
export DEPTH_PATH=$PWD/depth_test.png
export CAMERA_PATH=$PWD/camera.json
export OUTPUT_DIR=$PWD/outputs
sh demo.sh
```
Note that for me this often results in OOM errors (I am using one RTX 4090 with 24 GB VRAM). A possible solution, if only a small amount of memory is missing:
```
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```
Otherwise, reduce `points_per_side` and `points_per_batch` in `automatic_mask_generator.py` (see #24). I am currently using the following:
```
points_per_side: Optional[int] = 16,
points_per_batch: int = 32,
```
(The required memory seems to depend somewhat on the size of the object in the image; I am not sure why, it is strange behavior.)
Results: vis_ism.png and vis_pem.png.
Hope this helps. Regards and have a nice weekend!
@savidini Infinite thanks for solving also this issue, aligning depth and RGB was indeed the key! Kind regards!
I'll keep this issue opened until the PR is accepted.
@savidini I tried your environment.yaml and still got the same error.
Could you please have a look?
I think prepare.sh is wrong?
```
dm-4.66.2 trimesh-4.0.8 triton-2.2.0 typing-extensions-4.11.0 tzdata-2024.1 ultralytics-8.0.135 urllib3-2.2.1 wcwidth-0.2.13 werkzeug-3.0.2 xformers-0.0.25 yacs-0.1.8 yapf-0.40.2 yarl-1.9.4 zipp-3.18.1
done
#
# To activate this environment, use
#
#     $ conda activate sam6d
#
# To deactivate an active environment, use
#
#     $ conda deactivate

CondaError: Run 'conda init' before 'conda activate'

Traceback (most recent call last):
  File "/data/SAM-6D/SAM-6D/Pose_Estimation_Model/model/pointnet2/setup.py", line 7, in <module>
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension
ModuleNotFoundError: No module named 'torch'
Traceback (most recent call last):
  File "/data/SAM-6D/SAM-6D/Instance_Segmentation_Model/download_sam.py", line 10, in <module>
    import hydra
ModuleNotFoundError: No module named 'hydra'
Traceback (most recent call last):
  File "/data/SAM-6D/SAM-6D/Instance_Segmentation_Model/download_fastsam.py", line 10, in <module>
    import hydra
ModuleNotFoundError: No module named 'hydra'
Traceback (most recent call last):
  File "/data/SAM-6D/SAM-6D/Instance_Segmentation_Model/download_dinov2.py", line 10, in <module>
    import hydra
ModuleNotFoundError: No module named 'hydra'
sh: 1: gdown: not found
(base) mona@ada:/data/SAM-6D/SAM-6D$ sh prepare.sh
```
also @savidini
```
  File "/home/mona/anaconda3/envs/sam6d/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 523, in build_extensions
    _check_cuda_version(compiler_name, compiler_version)
  File "/home/mona/anaconda3/envs/sam6d/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 414, in _check_cuda_version
    raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.
```
Hi @monajalal,
regarding the first issue, it looks like you are running prepare.sh from the base conda environment.
Please note that my fork (#41) moves the creation and activation of the environment out of prepare.sh into a manual step (because this is more robust on remote systems). Please read the README of my fork and follow the steps listed there; this should hopefully fix the problem. :smiley:
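For context, the manual steps look roughly like this (the file and environment names here are assumptions; the fork's README is authoritative):

```shell
# Create the environment first, from the repo root
conda env create -f environment.yaml

# Activate it in the current shell; in non-interactive shells you may need
# to run `conda init` once, or use `conda run -n sam6d ...` instead
conda activate sam6d

# Only then run the download/build script, inside the activated environment
sh prepare.sh
```

The `CondaError: Run 'conda init' before 'conda activate'` and the `No module named 'torch'` errors above are exactly what happens when prepare.sh runs while base is still active.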
Regarding the CUDA mismatch, could you please copy the output of the `nvidia-smi` command on your machine? If you are using older NVIDIA drivers, it is possible that my fork will not work.
Hi @savidini, thanks a lot for the improvements in your fork, it seems to be really useful.
I tried using your fork, but I quickly ran into issues just creating the conda environment from the updated environment.yml file. With the original file from the original repo I was able to create the environment without issues, but then I ran into problems installing PointNet2, which is why I wanted to use your fork.
The issue with your updated version is that creating the conda env takes ages and then crashes because it can't solve certain dependencies:
```
Package pip conflicts for:
pip
python=3.9.6 -> pip

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.31=0
  - feature:/linux-64::__linux==5.15.0=0
  - feature:|@/linux-64::__glibc==2.31=0
  - cuda=12.0 -> __linux
  - cuda=12.0 -> __win
  - python=3.9.6 -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.31
```
Here are some details of my server's setup:
conda information:
```
conda version      : 23.3.1
conda-build version: 3.23.3
python version     : 3.9.19.final.0
virtual packages   : __archspec=1=x86_64
                     __cuda=12.2=0
                     __glibc=2.31=0
                     __linux=5.15.0=0
                     __unix=0=0
```
`nvidia-smi` output:
```
NVIDIA-SMI 535.171.04   Driver Version: 535.171.04   CUDA Version: 12.2
```
`nvcc --version` output:
```
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_
```
I'm not sure what the issue is, but it might be because I have `__glibc=2.31=0` (I'm not sure what this does). I commented out `cuda=12.0` and `libxcrypt` in the .yaml file, and with this I solved the environment creation issue. However, when I run prepare.sh and try to compile PointNet2, it fails because of this:
```
RuntimeError:
The detected CUDA version (11.1) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.
```
Hi @david1309
> The issue I'm getting with your updated version is that creating the conda env takes ages, and then it crashes because it can't solve certain dependencies

Taking forever and then failing to solve the environment sounds a lot like the old conda solver, which your conda version still uses by default. :sweat_smile:
Can you try creating the environment with the libmamba solver instead? Either follow these instructions, or install a newer version of conda.
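On conda ≥ 22.11 the switch looks roughly like this (per the conda-libmamba-solver documentation):

```shell
# Install the solver plugin into the base environment
conda install -n base conda-libmamba-solver

# Make libmamba the default for all future solves
conda config --set solver libmamba

# Or use it for a single command only, without changing the default
conda env create -f environment.yaml --solver=libmamba
```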
It should take about 5 minutes to solve and install the environment. :wink:
Let me know if it works.
Hi @savidini, thanks for your prompt response.
Indeed, part of the issue might have come from the old conda, so as you suggested I updated conda and set the default solver to libmamba. I also manually installed CUDA 12.1 from NVIDIA's website (since I previously had the old 11.1).
Having updated conda and CUDA, I re-tried creating your fork's environment, but got the following error when trying to install cuda. At first I tried with `cuda=12.0`, but it produced the error, so I changed it to `cuda=12.1` since that's the version I manually installed, but it still produced the same error:
I then created the environment with the `cuda` package commented out, which succeeded. However, when trying to compile PointNet2, there's an error because the compiler can't find the header files:
In particular I see these two errors: `.../python3.9/site-packages/torch/utils/cpp_extension.py:425: UserWarning: There are no g++ version bounds defined for CUDA version 12.1` and `10: fatal error: ball_query.h: No such file or directory`
Any guidance?
Hi @david1309,
I don't think it should be necessary to install CUDA manually. Your NVIDIA driver supports CUDA up to version 12.2, which is fine. CUDA and its components can be installed directly into conda, which in my experience is also the most robust when it comes to solving the environment.
Your error in creating the CUDA environment seems very strange. It seems that conda can only find a Windows version. Are you running WSL Ubuntu by any chance? I have never seen such an error in conda before. :raised_eyebrow:
Can you please provide the output of the following commands (from anywhere/base env):
```
cat /etc/os-release
conda info
echo $CUDA_PATH
echo $CUDA_HOME
echo $PYTHONPATH
```
And from your activated sam6d environment (e.g. sam6d_savidini_fork_2):
```
conda list
which nvcc
which python
which pip
```
Maybe I can try to find a solution during this weekend. :v:
Hi @savidini , thanks for your kind help.
The environment called `sam6d_savidini_fork` was created using your fork's environment.yaml but with the `cuda=12.0` package commented out, given that with that package included the environment would fail to install.
Hi @david1309, long story short: I think in your case the local CUDA installation leads to conflicts, and my original environment.yaml was a bit buggy because PyTorch also tries to install its own version of CUDA, see this nice post. This causes errors when building PointNet2.
You can try to create a new environment from the following file (it will not install CUDA, as you already have 12.1 installed):
(You may want to run `conda clean --all -y` before creating the environment.)
Let me know how it works. :v:
I managed to fix the PointNet2 compilation error in which the compiler didn't find the .h header files. The reason is (probably) that my machine has an older gcc (version 9.4.0) which fails to find the header files located under `_ext_src`.
The fix consists in modifying the file Pose_Estimation_Model/model/pointnet2/setup.py. Instead of referencing the header files with `_ext_src_root = "_ext_src"`, I replaced that line with:
```python
from pathlib import Path
_ext_src_root = str(Path(__file__).parent / "_ext_src")
```
Using this absolute path instead of the relative one seems to avoid the issue.
Hey, thanks a lot for sharing the source code.
I have problems when running the prepare.sh file, mostly related to the CUDA version. Here is the main error:
After that I get a bunch of ninja-build errors (missing header files and the like). The problem seems to be related only to PointNet2; the rest of the build goes smoothly. When running demo.sh I get errors because PointNet2 is missing.
I'm running on Ubuntu 22.04. `nvcc --version` returns:
while `nvidia-smi` returns:
Do you have any idea what could be wrong? I get the same problem on multiple machines. Let me know if you need the full error log. Thanks for your help!