Project-MONAI / MONAILabel

MONAI Label is an intelligent open source image labeling and learning tool.
https://docs.monai.io/projects/label
Apache License 2.0

Multi-GPU on DeepEdit UNETR network #827

Closed guanjiahui closed 2 years ago

guanjiahui commented 2 years ago

Multi-GPU training stops working for DeepEdit with the UNETR network. The error is below:

/opt/conda/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
[2022-05-13 15:51:37,994] [33828] [ThreadPoolExecutor-1_0] [INFO] (monailabel.utils.async_tasks.utils:76) - Return code: -7

If we turn off multi-GPU (uncheck the multi-GPU option in the 3D Slicer GUI), it works well on a single GPU. If we use another network with DeepEdit on multi-GPU, it also works.
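A note on reading the last log line: the sketch below assumes the value logged as "Return code" is the exit status of a child process as Python's `subprocess` module reports it, where a negative value -N means the process was terminated by signal N. The thread does not confirm how MONAI Label produces this number, and the helper name `describe_return_code` is hypothetical. Under that assumption, the code can be decoded like this:

```python
import signal


def describe_return_code(code: int) -> str:
    """Explain a return code using the convention of Python's subprocess module.

    Assumption: the logged value follows that convention (POSIX only).
    """
    if code == 0:
        return "exited normally"
    if code > 0:
        return f"exited with status {code}"
    # A negative value -N means the child was terminated by signal N.
    try:
        name = signal.Signals(-code).name
    except ValueError:
        # The signal number has no name on this platform.
        name = "unknown signal"
    return f"terminated by signal {-code} ({name})"


print(describe_return_code(-7))
```

On x86 and ARM Linux, signal 7 is SIGBUS, so under the assumption above the training process was killed by a bus error rather than exiting with a Python exception.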

SachidanandAlle commented 2 years ago

Which version of MONAI Label are you using? This problem is solved only in the current main branch and the latest release candidate version (0.4.x). OK, I see you are talking about UNETR, and DynUNET works fine for you, which means you are probably already using the latest version.

On UNETR, @diazandr3s can provide some insight into whether multi-GPU is supported or not.

guanjiahui commented 2 years ago

DynUNET works fine on both single and multi-GPU. UNETR works fine on a single GPU, but not on multi-GPU. Please let us know whether multi-GPU on UNETR is solved/supported in the latest version.

diazandr3s commented 2 years ago

@guanjiahui could you please share the full log you get when running UNETR on a multi-GPU setup? I just rechecked and it works fine on my 2-GPU setup. Attached is a 1-minute video showing this:

https://user-images.githubusercontent.com/11991079/172614784-872d948d-0182-468a-b521-0affca6d834f.mp4


guanjiahui commented 2 years ago

Please find the full log attached, @diazandr3s: UNETR_multiGPU_full-error.txt

diazandr3s commented 2 years ago

Thanks, @guanjiahui. Are you using conda to create the Python virtual environment? This issue does not seem to be related to MONAI or MONAI Label. I looked for a potential solution and found this: https://github.com/conda/conda/issues/9589#issuecomment-685332482

It'd be good to run the Docker container and check whether your workstation shows the same issue. Here is how to run the Docker image:

docker run --gpus all --rm -ti --ipc=host --net=host projectmonai/monailabel:latest bash

Hope that helps
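For context on the `--ipc=host` flag in the command above: it makes the container share the host's IPC namespace, including shared memory, instead of Docker's small default `/dev/shm` (64 MB). Whether shared memory plays any role in the failure reported here is an assumption and is not confirmed in this thread. A minimal sketch for checking how much shared memory is free in the environment where training runs, using only the Python standard library (the helper name `shared_memory_free_gib` is hypothetical):

```python
import shutil
from pathlib import Path
from typing import Optional


def shared_memory_free_gib(path: str = "/dev/shm") -> Optional[float]:
    """Return free space in GiB on the given mount, or None if the path is missing."""
    if not Path(path).exists():
        return None
    # disk_usage reports total, used, and free bytes for the filesystem at path.
    return shutil.disk_usage(path).free / 2**30


free = shared_memory_free_gib()
if free is None:
    print("/dev/shm not found on this system")
else:
    print(f"/dev/shm free: {free:.2f} GiB")
```

Running this both on the host and inside the container gives a quick comparison of the two environments.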

SachidanandAlle commented 2 years ago

Are you still facing this problem? I guess this is more of an environment issue, as multiple users can run training on multi-GPU. Closing this issue for now. Feel free to reopen if the issue persists, providing more details, logs, etc.