facebookresearch / ContrastiveSceneContexts

Code for CVPR 2021 oral paper "Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts"
MIT License

multiprocessing and spawn #32

Closed JudyYe closed 2 years ago

JudyYe commented 2 years ago

Hi,

Thank you for open-sourcing your work! It is really neat!

However, I have trouble launching your jobs.

I have to set the start method to "spawn" (torch.multiprocessing.set_start_method('spawn')) in order to run launch.sh. Otherwise I get this error:

RuntimeError: cuda runtime error (3) : initialization error at /opt/conda/conda-bld/pytorch_1591914855613/work/aten/src/THC/THCGeneral.cpp:47

or

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

However, if I do this, I get a pickling error in multiprocess_utils:

_pickle.PicklingError: Can't pickle <function single_proc_run at 0x7fcd513d9170>: attribute lookup single_proc_run on __main__ failed

From what I found, this error suggests using 'fork' instead of 'spawn'.

I am using PyTorch 1.5.1 (py3.7_cuda10.2.89_cudnn7.6.5_0) and these Hydra versions:

hydra-colorlog            1.0.0                    pypi_0    pypi
hydra-core                1.0.0                    pypi_0    pypi
hydra-submitit-launcher   1.1.0                    pypi_0    pypi

Do you have any idea how to launch the jobs correctly?

Thank you!

Sekunde commented 2 years ago

Hi, thanks for your interest in our work!

I ran into similar issues some time ago in certain environments, but I am not exactly sure why they happened. It seems that some versions of open3d conflict with multiprocessing; it should have nothing to do with Hydra.

My guess is that the error comes from CUDA being re-initialized by open3d and multiprocessing. If I remember correctly, you can call torch.multiprocessing.set_start_method('spawn') right after torch and multiprocessing are first imported and, more importantly, before open3d is imported.

Please let me know if you still get this error after calling torch.multiprocessing.set_start_method('spawn') before open3d is imported.
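A minimal sketch of the import ordering suggested above might look like this. It uses the standard-library multiprocessing module so it runs without a GPU; torch.multiprocessing re-exports the same set_start_method and get_start_method calls, so switching to the torch module is a one-line change. setup_multiprocessing is a hypothetical helper, and whether the open3d import order actually matters depends on the open3d version, as noted above.

```python
import multiprocessing as mp  # torch.multiprocessing exposes the same set_start_method()


def setup_multiprocessing() -> str:
    """Select 'spawn' before any module that might initialise CUDA is imported."""
    # force=True overrides a start method chosen earlier (e.g. by another library);
    # without it, a second call raises RuntimeError("context has already been set").
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    setup_multiprocessing()
    # Import open3d (and other CUDA-touching modules) only after the start method is fixed.
    try:
        import open3d  # noqa: F401
    except ImportError:
        pass  # open3d is optional in this sketch
```

Keeping the call under the `if __name__ == "__main__":` guard matters with "spawn", because each child process re-imports the main module.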

JudyYe commented 2 years ago

Hi Ji,

Thank you for your prompt reply. After setting the start method to spawn, CUDA now initializes. However, I still cannot run it and get the following pickling error:

Traceback (most recent call last):
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/site-packages/submitit/core/submission.py", line 54, in process_job
    result = delayed.result()
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/site-packages/submitit/core/utils.py", line 122, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/site-packages/hydra_plugins/hydra_submitit_launcher/submitit_launcher.py", line 85, in __call__
    job_subdir_key="hydra.sweep.subdir",
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/site-packages/hydra/core/utils.py", line 125, in run_job
    ret.return_value = task_function(task_cfg)
  File "foo.py", line 21, in main
    mpu.multi_proc_run(2, fun=run, fun_args=('shhae'))
  File "/private/home/yufeiy2/ssl3d/PointContrast/pretrain/pointcontrast/lib/multiprocessing.py", line 52, in multi_proc_run
    p_i.start()
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/private/home/yufeiy2/.conda/envs/pc377/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function run at 0x7f23193e6710>: attribute lookup run on __main__ failed

JudyYe commented 2 years ago

Hi Ji,

I solved this problem by moving the function run out of the main file. The error seems to be related to submitit, but I did not investigate further.
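Moving run out of the main file fits how the "spawn" start method works. The process target is pickled by reference (its module and qualified name), and the child process re-imports that module and looks the name up again. A plausible explanation, not confirmed in this thread, is that under the submitit launcher `__main__` is submitit's own entry point rather than foo.py, so the lookup of run on `__main__` fails. The sketch below uses only the standard-library pickle module to show which functions survive this round trip; can_pickle and make_local_run are hypothetical helpers for illustration.

```python
import json
import pickle


def can_pickle(fn) -> bool:
    """Return True if fn can be pickled the way multiprocessing 'spawn' needs.

    Functions are pickled by reference: only fn.__module__ and fn.__qualname__
    are stored, and the receiving process re-imports the module to find fn.
    """
    try:
        pickle.dumps(fn)
        return True
    except (pickle.PicklingError, AttributeError):
        # Depending on the Python version, a failed lookup raises either error.
        return False


def make_local_run():
    def run(tag):  # nested function: there is no importable path module.run
        return tag
    return run


# A function defined at the top level of an importable module pickles fine.
print(can_pickle(json.dumps))        # True
# A function that cannot be found again by module + name does not.
print(can_pickle(make_local_run()))  # False
```

A function defined at the top level of a separate file that foo.py imports behaves like json.dumps here, which is consistent with the workaround above.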

Sekunde commented 2 years ago

Hi, thanks for sharing, and glad to hear it works now!