Closed JudyYe closed 2 years ago
Hi, thanks for your interest in our work!
I ran into similar issues some time ago in some environments, but I am not exactly sure why this happens. It seems that some versions of open3d conflict with multiprocessing, and it should have nothing to do with hydra.
My assumption is that the error is caused by CUDA being re-initialised when open3d and multiprocessing are used together. If I remember correctly, you can call torch.multiprocessing.set_start_method('spawn') right after multiprocessing and torch are first imported and, more importantly, before open3d is imported.
Please let me know if you still get this error after putting torch.multiprocessing.set_start_method('spawn') before the open3d import.
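To make the ordering concrete, here is a minimal sketch of that suggestion. It uses the standard library's multiprocessing module so that it runs without a GPU. As far as I know, torch.multiprocessing wraps the same module and exposes the same set_start_method, but treat that as an assumption. The helper name configure_start_method is hypothetical, and the open3d import is only shown as a comment.

```python
import multiprocessing as mp  # assumption: torch.multiprocessing exposes the same call


def configure_start_method(method="spawn"):
    """Select the start method once and return the method actually in effect."""
    try:
        mp.set_start_method(method)
    except RuntimeError:
        # The start method was already fixed earlier in this process;
        # it can only be set once unless force=True is passed.
        pass
    return mp.get_start_method()


if __name__ == "__main__":
    # Set the start method first ...
    print(configure_start_method("spawn"))
    # ... and only afterwards import libraries suspected of touching CUDA:
    # import open3d  # assumption: deferred until after the call above
```

The try/except is there because set_start_method raises RuntimeError when it is called a second time, which can happen if a launcher has already chosen a start method.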
Hi Ji,
Thank you for your prompt reply. After I set the start method to 'spawn', I managed to initialize CUDA. However, I still cannot run it and get the following pickle error:
Traceback (most recent call last):
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/site-packages/submitit/core/submission.py", line 54, in process_job
result = delayed.result()
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/site-packages/submitit/core/utils.py", line 122, in result
self._result = self.function(*self.args, **self.kwargs)
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/site-packages/hydra_plugins/hydra_submitit_launcher/submitit_launcher.py", line 85, in __call__
job_subdir_key="hydra.sweep.subdir",
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/site-packages/hydra/core/utils.py", line 125, in run_job
ret.return_value = task_function(task_cfg)
File "foo.py", line 21, in main
mpu.multi_proc_run(2, fun=run, fun_args=('shhae'))
File "/private/home/yufeiy2/ssl3d/PointContrast/pretrain/pointcontrast/lib/multiprocessing.py", line 52, in multi_proc_run
p_i.start()
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function run at 0x7f23193e6710>: attribute lookup run on __main__ failed
Hi Ji,
I solved this problem by moving the function run out of the main file. The error seems to be related to submitit, but I did not investigate further.
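For anyone hitting the same PicklingError, here is a small sketch of why moving run helps, as I understand it. With 'spawn', the target function is pickled by reference (module name plus function name), so the child process must be able to import that module and find the function in it. The module name worker_fns and the helpers below are hypothetical and only serve the illustration. The failing case is simulated by removing the attribute from the module, which is my guess at what "attribute lookup run on __main__ failed" amounts to when a launcher runs the script.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical stand-alone module that holds the worker function.
MODULE_SOURCE = '''
def run(tag):
    return f"worker received {tag}"
'''


def load_worker_module(directory, name="worker_fns"):
    """Write <name>.py into directory and import it as a regular module."""
    Path(directory, f"{name}.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, str(directory))
    sys.modules.pop(name, None)  # drop any stale copy from an earlier call
    importlib.invalidate_caches()
    return importlib.import_module(name)


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        worker_fns = load_worker_module(tmp)
        run = worker_fns.run
        # Importable by name from its module, so pickling works.
        print(is_picklable(run))
        # Simulate a module in which the name can no longer be looked up.
        del worker_fns.run
        print(is_picklable(run))
```

The design point is that the function lives in a module the worker can import on its own, rather than in whatever happens to be `__main__` under the launcher.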
Hi, thanks for sharing, and it is nice to hear that!
Hi,
Thank you for open-sourcing your work! It is really neat!
However, I have trouble launching your jobs.
I have to set the start method to "spawn" (torch.multiprocessing.set_start_method('spawn')) in order to run launch.sh. Otherwise I get this error:
However, if I do this, I get an error in multiprocess_utils about pickling a function.
From what I checked, this suggests I should use 'fork' instead of 'spawn'.
I am using pytorch 1.5.1 (py3.7_cuda10.2.89_cudnn7.6.5_0), and hydra version:
I wonder if you have any idea on how to correctly launch your job?
Thank you!