DecaYale / RNNPose

RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization, CVPR 2022
Apache License 2.0

RuntimeError: NCCL error #29

Open AramNasser opened 6 months ago

AramNasser commented 6 months ago

When running the eval.py script with "--use_dist True", I get this error: RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370128159/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8

I am using this Docker image: "nvcr.io/nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04", since the one mentioned in the original Dockerfile is no longer available on Docker Hub.

Any suggestions about what the problem could be? Thank you in advance.
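
The message "unhandled system error" hides the underlying cause, so a first step is usually to get more detail out of NCCL. Below is a minimal diagnostic sketch in Python, not a confirmed fix for this issue. It assumes the script is launched from a shell or wrapper where environment variables can be set. The helper names `nccl_debug_env` and `shm_free_mib` are hypothetical and are not part of RNNPose or PyTorch. `NCCL_DEBUG` is NCCL's own logging variable. The `/dev/shm` check is included because a small shared-memory mount inside a container is one possible cause of this kind of error.

```python
import os
import shutil


def nccl_debug_env(base=None):
    """Return a copy of the environment with NCCL verbose logging enabled.

    Hypothetical helper: pass the result as `env=` to subprocess.run when
    launching the evaluation script, or export the variable in the shell.
    """
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = "INFO"  # NCCL then logs the underlying system error
    return env


def shm_free_mib(path="/dev/shm"):
    """Return free space at `path` in MiB, or None if the path does not exist."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free // (1024 * 1024)


if __name__ == "__main__":
    env = nccl_debug_env()
    print("NCCL_DEBUG =", env["NCCL_DEBUG"])
    print("free shared memory (MiB):", shm_free_mib())
```

If the reported shared memory is very small, restarting the container with a larger `--shm-size` or with `--ipc=host` may be worth trying. Whether that applies here is an assumption until the `NCCL_DEBUG=INFO` log shows the actual failing call.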

Nishanth21D commented 3 weeks ago

hey @AramNasser, have you resolved this issue? I am getting the same error.