YuliangXiu / ICON

[CVPR'22] ICON: Implicit Clothed humans Obtained from Normals
https://icon.is.tue.mpg.de

Multiprocessing error in visibility phase #243

Open PawtingDev opened 10 months ago

PawtingDev commented 10 months ago

I followed the instructions in dataset.md to process THuman2.0. The rendering phase works fine with python -m scripts.render_batch -debug -headless. However, the visibility phase, run with python -m scripts.visibility_batch -debug, fails:

(dev) pawting@pc0809:/media/pawting/SN640/hello_worlds/ICON$ python -m scripts.visibility_batch_mod -debug
Start Visibility Computing thuman2 with 36 views.
Output dir: ./debug/thuman2_36views
  0%|                                                                                    | 0/2 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/media/pawting/SN640/hello_worlds/ICON/scripts/visibility_batch_mod.py", line 36, in visibility_subject
    smpl_verts = torch.from_numpy(rescale_fitted_body.vertices).to(device).float()
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/media/pawting/SN640/hello_worlds/ICON/scripts/visibility_batch_mod.py", line 122, in <module>
    for _ in tqdm(
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I'm not familiar with multiprocessing. Maybe it's related to calling .to(device) in subprocesses?

I've tried the suggestion here to force 'spawn' as the start method, but it doesn't work:

(dev) pawting@pc0809:/media/pawting/SN640/hello_worlds/ICON$ python -m scripts.visibility_batch -debug
Start Visibility Computing thuman2 with 36 views.
Output dir: ./debug/thuman2_36views
  0%|                                                                                    | 0/2 [00:06<?, ?it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/media/pawting/SN640/hello_worlds/ICON/scripts/visibility_batch.py", line 25, in visibility_subject
    gpu_id = queue.get()
NameError: name 'queue' is not defined. Did you mean: 'Queue'?
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/media/pawting/SN640/hello_worlds/ICON/scripts/visibility_batch.py", line 97, in <module>
    for _ in tqdm(
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/home/pawting/anaconda3/envs/dev/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
NameError: name 'queue' is not defined
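
The two tracebacks look like two halves of one problem. With 'fork', workers inherit the parent's globals, but CUDA cannot be re-initialized in them. With 'spawn', each worker re-imports the module, so a queue created only in the main block no longer exists there, which would explain the NameError. A minimal sketch of one common way around this, assuming the script hands out GPU ids through a shared queue as the traceback suggests: pass the queue to each worker through the pool's initializer. The names init_worker, visibility_subject_stub and run_all are hypothetical and are not part of ICON, and the stub leaves out all torch code.

```python
import multiprocessing as mp

# Set once per worker by init_worker; stays None in the parent process.
_gpu_queue = None


def init_worker(gpu_queue):
    """Runs once in every worker and stores the shared queue as a module global."""
    global _gpu_queue
    _gpu_queue = gpu_queue


def visibility_subject_stub(subject):
    """Hypothetical stand-in for visibility_subject (no torch, no CUDA)."""
    gpu_id = _gpu_queue.get()  # borrow a GPU id
    try:
        # Real code would create the CUDA device and tensors here, inside
        # the worker, so CUDA is first initialized after the spawn.
        return subject, gpu_id
    finally:
        _gpu_queue.put(gpu_id)  # hand the id back for the next task


def run_all(subjects, gpu_ids):
    """Process all subjects in a 'spawn' pool with one worker per GPU id."""
    ctx = mp.get_context("spawn")
    gpu_queue = ctx.Queue()
    for gpu_id in gpu_ids:
        gpu_queue.put(gpu_id)
    with ctx.Pool(
        processes=len(gpu_ids),
        initializer=init_worker,
        initargs=(gpu_queue,),
    ) as pool:
        return list(pool.imap_unordered(visibility_subject_stub, subjects))
```

Call run_all(subjects, gpu_ids=[0]) from under an if __name__ == "__main__": guard. 'spawn' needs that guard because every worker re-imports the main module.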

RichardChen20 commented 1 month ago

I have the same problem. I couldn't figure it out, so I just modified the code to process the subjects one by one in a single process:

for sub in tqdm(subjects):
    visibility_subject(
        subject=sub,
        dataset=args.dataset,
        save_folder=current_out_dir,
        rotation=args.num_views,
        debug=args.debug,
    )