YuliangXiu / ICON

[CVPR'22] ICON: Implicit Clothed humans Obtained from Normals
https://icon.is.tue.mpg.de

multiprocessing.pool.MaybeEncodingError #193

Open glorioushonor opened 1 year ago

glorioushonor commented 1 year ago

Good job! When I run the script python -m scripts.render_batch -debug -headless, I get the following error:

Start Rendering thuman2 with 36 views, 512x512 size.
Output dir: ./debug/thuman2_36views
Rendering types: ['light', 'normal', 'depth']
  0%|                                                                                                                                                                                    | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/media/vision/linjie/.conda/envs/ICON/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/media/vision/linjie/.conda/envs/ICON/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/media/vision/linjie/ICON/scripts/render_batch.py", line 254, in <module>
    for _ in tqdm(
  File "/media/vision/linjie/.conda/envs/ICON/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/media/vision/linjie/.conda/envs/ICON/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f5f10a6ed90>'. Reason: 'ValueError('ctypes objects containing pointers cannot be pickled')'

I think it has to do with the number of GPUs. You used two GPUs; I am on a four-GPU server but can only use GPU number 3, and I don't know how to change the script for that.

MalignusCN commented 1 year ago

The multiprocessing pool may be hiding the real bug. To find it, first hack render_batch so it runs without any multiprocessing and just executes render_subject directly.
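
A minimal sketch of why the pool can mask the underlying error, assuming the worker raised an exception that carries ctypes pointer objects. The pool has to pickle a worker's exception to send it back to the parent process, and ctypes pointers refuse to be pickled, so the parent sees MaybeEncodingError instead of the original traceback. The RuntimeError below is only a stand-in, not the actual exception raised in ICON.

```python
import ctypes
import pickle


def try_pickle(exc):
    """Return None if exc can be pickled, else the reason it cannot."""
    try:
        pickle.dumps(exc)
        return None
    except ValueError as err:
        return str(err)


# Stand-in for an exception whose arguments hold a ctypes pointer (assumption).
ptr = ctypes.pointer(ctypes.c_int(0))
unpicklable = RuntimeError("eglGetDisplay failed", ptr)

# An ordinary exception crosses the process boundary without trouble.
picklable = RuntimeError("plain message")

if __name__ == "__main__":
    print(try_pickle(unpicklable))
    print(try_pickle(picklable))
```

Running the worker function in the main process avoids the pickling step entirely, which is why the original exception becomes visible.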

lucas-jay commented 1 year ago

Hello, I encountered a similar problem before and resolved it by setting rs_rate in render_batch.py to 1.0.

glorioushonor commented 1 year ago

Thanks for your kind suggestion, but it didn't work for me.

lucas-jay commented 1 year ago

Or you could temporarily change the multiprocessing part to run in a single process to find the bug, like:

image
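
A rough sketch of the kind of change suggested here, under stated assumptions: render_subject below is a hypothetical stand-in for the function of the same name in scripts/render_batch.py, and render_all, num_workers, and the subject names are made up for illustration. The real script's arguments may differ.

```python
from multiprocessing import Pool


def render_subject(subject):
    """Hypothetical stand-in for render_subject in scripts/render_batch.py."""
    return f"rendered {subject}"


def render_all(subjects, num_workers=0):
    """Render every subject, serially when num_workers <= 0."""
    if num_workers <= 0:
        # Serial path: any exception is raised here with its full traceback,
        # because nothing has to be pickled between processes.
        return [render_subject(s) for s in subjects]
    # Parallel path: worker exceptions must be pickled to reach the parent.
    with Pool(num_workers) as pool:
        return list(pool.imap(render_subject, subjects))


if __name__ == "__main__":
    # Debug run with multiprocessing switched off.
    print(render_all(["0000", "0001"], num_workers=0))
```

Once the serial run completes without errors, the worker count can be raised again.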
glorioushonor commented 1 year ago

Thank you very much. You are right: another error was causing the multiprocessing error. With multiprocessing disabled, the following error now occurs. I have searched for it on Google but have not solved it yet. I'll keep exploring until I figure it out.

Start Rendering thuman2 with 36 views, 512x512 size.
Output dir: ./debug/thuman2_36views
Rendering types: ['light', 'normal', 'depth']
  0%|                                                                                                                                | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/media/vision/linjie/.conda/envs/ICON/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/media/vision/linjie/.conda/envs/ICON/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/media/vision/linjie/ICON copy 2/scripts/render_batch.py", line 254, in <module>
    for _ in tqdm(
  File "/media/vision/linjie/.conda/envs/ICON/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/media/vision/linjie/ICON copy 2/scripts/render_batch.py", line 57, in render_subject
    initialize_GL_context(width=size, height=size, egl=egl)
  File "/media/vision/linjie/ICON copy 2/lib/renderer/gl/init_gl.py", line 23, in initialize_GL_context
    create_opengl_context((width, height))
  File "/media/vision/linjie/ICON copy 2/lib/renderer/gl/glcontext.py", line 89, in create_opengl_context
    egl_display = egl.eglGetDisplay(egl.EGL_DEFAULT_DISPLAY)
  File "/media/vision/linjie/.conda/envs/ICON/lib/python3.8/site-packages/OpenGL/platform/baseplatform.py", line 402, in __call__
    return self( *args, **named )
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.error.GLError: GLError(
        err = 12300,
        baseOperation = eglGetDisplay,
        cArguments = (
                <OpenGL._opaque.EGLNativeDisplayType_pointer object at 0x7fd5b9323240>,
        ),
        result = <OpenGL._opaque.EGLDisplay_pointer object at 0x7fd5b9323140>
)
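
For reference, the numeric err value in a GLError like this one can be decoded with the error constants from the EGL headers. The sketch below is only a lookup helper (egl_error_name is a made-up name); it does not diagnose the cause. The value 12300 is 0x300C, which the headers define as EGL_BAD_PARAMETER.

```python
# EGL error codes as defined in the Khronos EGL headers (egl.h).
EGL_ERRORS = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}


def egl_error_name(code):
    """Map a numeric EGL error code to its symbolic name."""
    return EGL_ERRORS.get(code, f"unknown (0x{code:04X})")


if __name__ == "__main__":
    print(egl_error_name(12300))  # EGL_BAD_PARAMETER
```
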
glorioushonor commented 1 year ago

Hi, sorry to bother you. I have reproduced the results, but the data provided by the author has expired. Could we exchange quantitative experimental results and other implementation details? If you are open to it, I would like to communicate by email or add you on QQ for further discussion. Thank you.

xiaoniujz commented 1 year ago

I ran into this problem too. I reinstalled my NVIDIA driver as the NVIDIA display driver, and the error disappeared.

msverma101 commented 1 year ago

I tried using a single GPU as mentioned, but when I check the GPU usage it shows 0 percent.

LanguageDriven3DPoseEstimation commented 1 month ago

@glorioushonor Could you tell me how you solved this problem? I get the same error.