Closed liuzuxin closed 1 year ago
Hi @liuzuxin, thanks for asking! I suspect this problem is related to the CUDA version or a similar environment issue. Could you please provide more information about your machine (platform, NVIDIA driver version, CUDA version)? I didn't encounter this problem on Ubuntu 20.04 with an A100, NVIDIA driver 515.105.01, and CUDA 11.7.
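For anyone else hitting this, the requested details can be gathered in one go. The snippet below is a minimal sketch and not part of the repository: the helper name `collect_gpu_environment` is made up for illustration, it assumes `nvidia-smi` is on the PATH, and it assumes PyTorch is the framework in use (it degrades gracefully if either is missing).

```python
import platform
import shutil
import subprocess


def collect_gpu_environment():
    """Return a dict with the platform, driver, and CUDA details asked for above."""
    info = {
        "platform": platform.platform(),
        "python": platform.python_version(),
    }

    # Driver version and GPU name as reported by nvidia-smi, if it is installed.
    if shutil.which("nvidia-smi") is None:
        info["nvidia_smi"] = "not found"
    else:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        info["nvidia_smi"] = result.stdout.strip() or result.stderr.strip()

    # CUDA version PyTorch was built against (assumption: PyTorch is the framework in use).
    try:
        import torch

        info["torch_cuda"] = torch.version.cuda
    except ImportError:
        info["torch_cuda"] = "torch not installed"

    return info


if __name__ == "__main__":
    for key, value in collect_gpu_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue should cover the platform, driver, and CUDA questions.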
Thanks for your reply. Sure, mine is Ubuntu 20.04 with Nvidia driver 525.125.06 and CUDA 12.0. I tried downgrading the driver to 470.199.02 and CUDA to 11.4, and SubprocVectorEnv works.
The strangest thing is that I had been using the evaluation script successfully with the Nvidia 525 driver over the past week, but it suddenly broke without my upgrading any packages. In other words, after running the evaluation script with SubprocVectorEnv successfully, I ran the same command again and it no longer worked. So I am curious what the root cause of this problem is.
Hi zuxin,
Thanks for asking. We have also noticed this issue and are investigating it. In the meantime, a quick workaround is to save the model offline and then start multiple evaluation scripts, each with a single environment. This will increase the GPU memory requirement but can make the evaluation faster.
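A minimal sketch of that workaround, assuming each evaluation run can be started as an independent command. The helper `run_in_parallel` is made up for illustration, and the commands in the demo are placeholders: the real evaluation script's path and arguments are not shown in this thread, so substitute your own invocation (one environment per process).

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command as its own OS process, then wait for all of them.

    Returns the exit codes in the same order as `commands`.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Placeholder commands standing in for single-environment evaluation runs.
    # Replace each entry with the actual evaluation script call for one shard.
    commands = [
        [sys.executable, "-c", f"print('evaluating shard {i}')"]
        for i in range(3)
    ]
    print(run_in_parallel(commands))
```

Because every process loads its own copy of the model, GPU memory use grows with the number of commands, which matches the caveat above.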
Hi, is there a solution that does not involve downgrading the Nvidia driver and CUDA version? I still encounter this problem with driver 525.60.13 and CUDA 12.0. Thanks!
I resolved the problem by adding these two lines to venv.py:
if multiprocessing.get_start_method(allow_none=True) != "spawn":
    multiprocessing.set_start_method("spawn", force=True)
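For context, here is a self-contained sketch of the same idea outside venv.py. It only demonstrates the standard-library calls; the helper names `ensure_spawn` and `_worker` are made up for illustration. A likely (but unconfirmed in this thread) explanation for why it helps is that a forked child inherits the parent's CUDA state, whereas a spawned child starts from a fresh interpreter.

```python
import multiprocessing as mp


def ensure_spawn():
    """Force the 'spawn' start method unless it is already selected."""
    if mp.get_start_method(allow_none=True) != "spawn":
        # force=True overrides a start method that was already fixed earlier.
        mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


def _worker():
    # Runs in the child process and reports how that process was created.
    print("child start method:", mp.get_start_method())


if __name__ == "__main__":
    # Call this before any worker process or vectorized environment is created.
    ensure_spawn()
    proc = mp.Process(target=_worker)
    proc.start()
    proc.join()
```

The call has to run before the first subprocess is created, so placing it near the top of the module that builds the worker processes, right after the imports, is one reasonable choice. With "spawn", the script's entry point also needs the `if __name__ == "__main__":` guard shown here.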
I encountered the same issue, and this solution works for me. Thank you very much!!!
May I ask which line of venv.py I should add it to? @lihenglin @JamesSand
Hi, when I try to use the evaluation script on a headless machine (cloud server) with an A10G GPU, I occasionally come across the following error:
Sometimes I came across this issue due to insufficient CUDA memory; however, now even with enough memory, I still encounter this problem and have no idea how to solve it. I can use the evaluation script with DummyVectorEnv, but it seems to be too slow. So I am wondering whether you have encountered similar issues. Any hints would be appreciated. Thanks in advance.