NVlabs / PoseCNN-PyTorch

PyTorch implementation of the PoseCNN framework

Problem while running PoseCNN in docker container #36

Closed Unnon97 closed 1 year ago

Unnon97 commented 1 year ago

Hello, I was trying to set up PoseCNN in a Docker container on a PC with an RTX 4090 GPU. I am using an Ubuntu 20.04 image with NVIDIA driver 525.85, CUDA 11.6, torch==1.12.0+cu116, Python 3.8.10, and pip 20.0.2. I built the Eigen and Sophus libraries inside the container by cloning https://gitlab.com/libeigen/eigen.git (--branch=3.4) and https://github.com/yuxng/Sophus.git.

Everything builds, but when I run ./experiments/scripts/ycb_video_train.sh with slight modifications, I get the following error:

libEGL warning: DRI2: failed to create dri screen
libEGL warning: DRI2: failed to create dri screen
Unable to initialize EGL
Command '['/deps/PoseCNN/tools/../ycb_render/build/test_device', '0']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "./tools/train_net.py", line 141, in <module>
    cfg.renderer = YCBRenderer(width=cfg.TRAIN.SYN_WIDTH, height=cfg.TRAIN.SYN_HEIGHT, render_marker=False)
  File "/deps/PoseCNN/tools/../ycb_render/ycb_renderer.py", line 88, in __init__
    self.r = CppYCBRenderer.CppYCBRenderer(width, height, get_available_devices()[gpu_id])
IndexError: list index out of range
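
For readers following the log: the final IndexError looks like a symptom rather than the root cause. The test_device command fails because EGL cannot be initialized, so get_available_devices() presumably returns an empty list, and indexing it with gpu_id then fails. A minimal sketch of a guard that would surface the real problem is below. The helper name pick_render_device is hypothetical and is not part of PoseCNN.

```python
from typing import Sequence


def pick_render_device(devices: Sequence[int], gpu_id: int) -> int:
    """Return devices[gpu_id], raising a descriptive error instead of IndexError.

    Hypothetical helper for illustration; `devices` stands in for the list
    that get_available_devices() returns in ycb_renderer.py.
    """
    if not devices:
        # An empty list means no device passed the EGL test, which points
        # at the container's OpenGL/EGL setup rather than at gpu_id.
        raise RuntimeError(
            "No EGL-capable render device found; check that the container "
            "provides NVIDIA OpenGL/EGL support."
        )
    if not 0 <= gpu_id < len(devices):
        raise RuntimeError(
            f"gpu_id {gpu_id} is out of range for {len(devices)} render device(s)."
        )
    return devices[gpu_id]
```

Calling pick_render_device([], 0) raises a RuntimeError that names the EGL setup as the likely culprit, while pick_render_device([0, 1], 1) returns the second device.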

Unnon97 commented 1 year ago

Solved: the Docker container needs to be based on a cudagl image from Docker Hub rather than any other CUDA-based image.
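
One plausible reason the cudagl image helps is that it exposes the NVIDIA graphics (OpenGL/EGL) driver libraries inside the container, whereas plain CUDA images typically expose only compute. This is an assumption; the thread does not confirm it. A quick way to check from inside a container is to inspect the NVIDIA_DRIVER_CAPABILITIES environment variable read by the NVIDIA container runtime. The function name below is hypothetical.

```python
import os
from typing import Mapping, Optional


def has_graphics_capability(env: Optional[Mapping[str, str]] = None) -> bool:
    """Report whether NVIDIA_DRIVER_CAPABILITIES requests graphics support.

    Assumption: an unset variable is treated as "no graphics", since the
    runtime's default does not include the graphics capability.
    """
    env = os.environ if env is None else env
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    caps = {cap.strip() for cap in raw.split(",") if cap.strip()}
    return "all" in caps or "graphics" in caps


if __name__ == "__main__":
    print("graphics capability requested:", has_graphics_capability())
```

If this prints False in a container where the renderer fails with "Unable to initialize EGL", the image or the docker run options are a reasonable first place to look.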

valentinhendrik commented 1 year ago

How did you get past the THC/THC.h error? Did you edit the affected files?

Unnon97 commented 1 year ago

Yes, I had to modify the ROIAlign.cu file and save it in my PoseCNN directory. I used this post as a reference for the required changes: https://blog.csdn.net/weixin_44487231/article/details/119759792 (it's in Chinese, but the code replacements can be understood).
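
The thread does not list the exact edits, so the following is only a sketch of the kind of replacement commonly applied when a CUDA extension still includes THC/THC.h, a header that recent PyTorch releases no longer ship. It assumes the file uses THCudaCheck and THCCeilDiv and swaps them for the ATen equivalents AT_CUDA_CHECK and at::ceil_div. The function name patch_thc_source is hypothetical, and the actual ROIAlign.cu may need other or additional changes.

```python
# Substitutions commonly used to drop the THC dependency from a .cu file.
# Assumption: the file only relies on THCudaCheck and THCCeilDiv.
REPLACEMENTS = [
    (
        "#include <THC/THC.h>",
        "#include <ATen/cuda/Exceptions.h>\n#include <ATen/ceil_div.h>",
    ),
    ("THCudaCheck(", "AT_CUDA_CHECK("),
    ("THCCeilDiv(", "at::ceil_div("),
]


def patch_thc_source(source: str) -> str:
    """Return `source` with the THC include and helper calls replaced."""
    for old, new in REPLACEMENTS:
        source = source.replace(old, new)
    return source


if __name__ == "__main__":
    sample = "#include <THC/THC.h>\nTHCudaCheck(cudaGetLastError());\n"
    print(patch_thc_source(sample))
```

Working on a copy of the file and diffing the result before rebuilding keeps the change easy to review and revert.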

valentinhendrik commented 1 year ago

I did actually find someone writing in English with the exact same approach, but this helps me confirm that the error I am currently getting is unrelated to my changes to the file. Thank you anyway!