airo-ugent / airo-mono

Python packages for robotic manipulation @ IDLab AI & Robotics Lab - UGent - imec
https://airo.ugent.be
MIT License

`MultiprocessRGB(D)Publisher` cannot instantiate `Zed2i` camera #125

Open Victorlouisdg opened 9 months ago

Victorlouisdg commented 9 months ago

Describe the bug: The publisher process hangs when initializing a Zed2i camera.

To Reproduce: Run one of the multiprocessing scripts.

Expected behavior: Yesterday multiprocessing still worked on my setup; I even recorded some short videos with the MultiprocessVideoRecorder.

Environment: Gorilla desktop. Tested with two Zed2i cameras. Occasionally got the error code NO GPU DETECTED. Rebooted the desktop multiple times.

Victorlouisdg commented 9 months ago

Creating a Zed2i camera in a regular Python script / parent process still works fine. So this has something to do with the fact that a subprocess started from a Python script behaves differently than its parent process.

m-decoster commented 9 months ago

May be related to #116

Victorlouisdg commented 9 months ago

The problem seems to be GPU availability in the Publisher process. Trying to close the camera results in:

CUDA error at Camera.cpp:163 code=304(cudaErrorOperatingSystem) "void sl::Camera::close()" 

I also get:

IndexError: Could not open Zed2i camera, error = NO GPU DETECTED

PyTorch also does not find any GPUs in the created process: torch.cuda.is_available() returns False and prints this warning:

/home/victor/anaconda3/envs/cloth-competition/lib/python3.10/site-packages/torch/cuda/__init__.py:138: 
UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS (Triggered internally at /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1699535260532/work/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
Victorlouisdg commented 9 months ago

Adding multiprocessing.set_start_method("spawn") fixes the CUDA issue and allows opening ZED cameras. However, with this start method the ZED SDK always starts optimizing the neural depth mode. I don't really understand why; maybe it doesn't find /usr/local/zed/resources due to a missing environment variable?

Victorlouisdg commented 9 months ago

Never mind, I was confused about the neural_depth optimization. It started because I changed my CUDA version to 12.3, and it is not related to multiprocessing.

Victorlouisdg commented 9 months ago

This fixes the issue, but requires you to do this in every script with a MultiprocessPublisher:

multiprocessing.set_start_method("spawn") 

A more elegant solution is inheriting from:

multiprocessing.context.SpawnProcess

But that does not work and I don't understand why.
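
For reference, here is a minimal sketch of what subclassing SpawnProcess can look like in plain Python, without any camera code. SpawnedWorker and its greeting attribute are hypothetical names for illustration; this is not airo-mono's publisher class and it does not explain why the attempt above failed. It assumes the script is run as a regular file, because with "spawn" the Process object is pickled and the child re-imports the main module.

```python
import multiprocessing
from multiprocessing.context import SpawnProcess


class SpawnedWorker(SpawnProcess):
    """Hypothetical worker that always starts with the "spawn" method."""

    def __init__(self, name: str):
        super().__init__(name=name)
        # Keep only picklable attributes here: with "spawn" the Process
        # object itself is pickled and sent to a fresh child interpreter.
        self.greeting = f"hello from {name}"

    def run(self) -> None:
        # Runs in the child process. A GPU or camera handle would be
        # created here rather than in __init__ (assumption, not tested
        # against the ZED SDK).
        print(self.greeting)


if __name__ == "__main__":
    # The guard matters: the spawned child re-imports this module and must
    # not start another worker while doing so.
    worker = SpawnedWorker("publisher")
    worker.start()
    worker.join()
    print("exit code:", worker.exitcode)
```

The same process class is also reachable through multiprocessing.get_context("spawn").Process, which selects the start method per object without calling set_start_method globally in every script.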

Victorlouisdg commented 9 months ago

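Another downside of "spawn" compared to the default "fork" method is that it can be much slower (several seconds vs. several ms): instead of cloning the parent process, it starts a fresh Python interpreter, re-imports the main module, and pickles everything the child needs. (However, in this case that fresh start is probably what prevents the CUDA errors, since the child does not inherit the parent's CUDA state.)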