Open Victorlouisdg opened 9 months ago
Creating a Zed2i camera in a regular Python script / parent process still works fine. So this has something to do with the fact that a subprocess started from a Python script behaves differently than its parent process.
May be related to #116
The problem seems to be GPU availability in the Publisher process. Trying to close the camera results in:
CUDA error at Camera.cpp:163 code=304(cudaErrorOperatingSystem) "void sl::Camera::close()"
I also get:
IndexError: Could not open Zed2i camera, error = NO GPU DETECTED
PyTorch also does not find any GPUs in the created process: torch.cuda.is_available() returns False and prints this warning:
/home/victor/anaconda3/envs/cloth-competition/lib/python3.10/site-packages/torch/cuda/__init__.py:138:
UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS (Triggered internally at /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1699535260532/work/c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
Adding multiprocessing.set_start_method("spawn") fixes the CUDA issue and allows opening ZED cameras. However, the ZED SDK always starts optimizing the neural depth mode with this start method. I don't really understand why, maybe because it doesn't find /usr/local/zed/resources due to a missing environment variable?
Never mind, I was confused about the neural_depth optimization. It started because I changed my CUDA version to 12.3, and it is not related to multiprocessing.
This fixes the issue, but requires you to do this in every script with a MultiprocessPublisher:
multiprocessing.set_start_method("spawn")
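For reference, a minimal sketch of where that call would go in a script. The `child_task` function is a hypothetical stand-in for whatever the publisher process does (it is not the library's API); the point is only that the start method is set once, inside the main guard, before any process is created.

```python
import multiprocessing


def child_task(name: str) -> None:
    # Hypothetical stand-in for the publisher's work (e.g. opening the camera).
    # With "spawn" this runs in a freshly started interpreter.
    print(f"{name} running with start method: {multiprocessing.get_start_method()}")


def main() -> int:
    # Set the start method once, before any Process object is created.
    multiprocessing.set_start_method("spawn")
    process = multiprocessing.Process(target=child_task, args=("publisher",))
    process.start()
    process.join()
    return process.exitcode


if __name__ == "__main__":
    # The guard matters with "spawn": the child re-imports this module,
    # and must not start another process while doing so.
    main()
```

Note that calling `set_start_method` a second time in the same program raises a `RuntimeError` unless `force=True` is passed, which is part of why it has to live in each entry-point script rather than in library code.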
A more elegant solution is inheriting from:
multiprocessing.context.SpawnProcess
But that does not work and I don't understand why.
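For comparison, a minimal sketch of what such a subclass could look like in isolation. `SpawnPublisher` and `camera_name` are hypothetical names for illustration, and this is not verified against the actual publisher class or against the failure described above. One thing worth checking with this approach: under "spawn" the process object itself is pickled and sent to the child, so every attribute set before `start()` has to be picklable.

```python
from multiprocessing.context import SpawnProcess


class SpawnPublisher(SpawnProcess):
    """Hypothetical publisher that always starts with the "spawn" method."""

    def __init__(self, camera_name: str):
        super().__init__(daemon=True)
        # Only store picklable configuration here. The whole object is
        # pickled and sent to the child process when start() is called.
        self.camera_name = camera_name

    def run(self) -> None:
        # Create cameras and other GPU-dependent resources here, so they
        # are constructed inside the child process.
        print(f"publishing from {self.camera_name}")


if __name__ == "__main__":
    publisher = SpawnPublisher("zed2i")
    publisher.start()
    publisher.join()
```

An alternative that avoids subclassing is `multiprocessing.get_context("spawn").Process(...)`, which selects the start method per process without touching the global setting.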
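Another downside to "spawn" vs the default "fork" method is that it can be much slower (can be several seconds vs several ms), as it starts a fresh Python interpreter that re-imports the required modules and receives pickled copies of the process arguments, instead of inheriting the parent's memory. (However, in this case that fresh start is probably what prevents the CUDA errors, since the child does not inherit the parent's CUDA state.)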
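The startup cost can be measured directly. A rough sketch, assuming an empty target function so that only process startup and exit are timed; `noop` and `measure_startup` are hypothetical helper names, and the numbers will depend heavily on how many modules the real script imports.

```python
import multiprocessing
import time


def noop() -> None:
    """Empty target, so the measurement covers only process startup and exit."""


def measure_startup(method: str) -> float:
    """Return the seconds needed to start and join one empty process."""
    context = multiprocessing.get_context(method)
    start = time.perf_counter()
    process = context.Process(target=noop)
    process.start()
    process.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    available = multiprocessing.get_all_start_methods()
    for method in ("fork", "spawn"):
        if method in available:  # "fork" is not available on Windows
            print(f"{method}: {measure_startup(method) * 1000:.1f} ms")
```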
Describe the bug: The publisher process hangs when initializing a Zed2i camera.
To Reproduce: Run one of the multiprocessing scripts.
Expected behavior: Yesterday multiprocessing still worked on my setup; I even recorded some short videos with the MultiprocessVideoRecorder.
Environment: Gorilla desktop. Tested with two Zed2i cameras. Occasionally got the error code NO GPU DETECTED. Rebooted desktop multiple times.