Closed guanjiahui closed 2 years ago
Which version of MONAI Label are you using? This problem is fixed only in the current main branch / latest release candidate (0.4.x). OK, I see you are talking about UNETR, and DynUNET works fine for you, which means you are probably already on the latest version.
On UNETR, @diazandr3s can provide some insight into whether multi-GPU is supported.
DynUNET works fine on both single and multi-GPU. UNETR works fine on a single GPU, but not on multi-GPU. Please let us know whether multi-GPU on UNETR is fixed/supported in the latest version.
@guanjiahui could you please share the full log you get when running UNETR on a multi-GPU setup? I just rechecked and it works fine on my 2-GPU setup. I've attached a 1-minute video showing this.
@diazandr3s Please find the full log attached: UNETR_multiGPU_full-error.txt
Thanks, @guanjiahui. Are you using conda to create the Python virtual environment? It seems this issue is not related to MONAI/MONAI Label. I looked for a potential solution and found this: https://github.com/conda/conda/issues/9589#issuecomment-685332482
It'd be good to run the docker container and check whether your workstation presents the same issue. Here is how you run the docker image:
docker run --gpus all --rm -ti --ipc=host --net=host projectmonai/monailabel:latest bash
Hope that helps
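The `--ipc=host` flag in the command above makes the container share the host's IPC namespace, including its shared memory. PyTorch data-loading workers exchange tensors through shared memory, so one thing worth checking on the workstation is how much space `/dev/shm` has. This is an assumption about a possible cause, not something confirmed in this thread. A minimal sketch (the helper name `shared_memory_gib` is hypothetical):

```python
import os
import shutil


def shared_memory_gib(path="/dev/shm"):
    """Return (total, free) size of the shared-memory mount in GiB, or None if absent."""
    if not os.path.isdir(path):
        return None  # e.g. systems without a /dev/shm mount
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.free / gib


if __name__ == "__main__":
    sizes = shared_memory_gib()
    if sizes is None:
        print("/dev/shm not found on this system")
    else:
        print(f"/dev/shm total: {sizes[0]:.2f} GiB, free: {sizes[1]:.2f} GiB")
```

Running this both on the host and inside the container (with and without `--ipc=host`) would show whether the two environments differ.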
Are you still facing this problem? I guess this is more of an environment issue, as multiple users can run training on multi-GPU. Closing this issue for now; feel free to reopen it with more details/logs if the issue persists.
Multi-GPU training stops working for DeepEdit with the UNETR network. Errors below:
/opt/conda/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
[2022-05-13 15:51:37,994] [33828] [ThreadPoolExecutor-1_0] [INFO] (monailabel.utils.async_tasks.utils:76) - Return code: -7
If we turn off multi-GPU (uncheck the multi-GPU option in the 3D Slicer GUI), it works well on a single GPU. If we use another model with DeepEdit on multi-GPU, it also works.
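For anyone reading the log above: assuming the `Return code: -7` comes from a child process launched through Python's `subprocess` module (an assumption about how the task is run, not confirmed here), a negative return code means the process was terminated by the signal with that number. A small sketch for decoding it (the helper name `describe_return_code` is hypothetical):

```python
import signal


def describe_return_code(code):
    """Explain a subprocess return code; negative values mean death by signal."""
    if code >= 0:
        return f"exited with status {code}"
    try:
        # Signal numbers are platform-specific, so look the name up at runtime.
        name = signal.Signals(-code).name
    except ValueError:
        name = "unknown signal"
    return f"killed by signal {-code} ({name})"


if __name__ == "__main__":
    print(describe_return_code(-7))
```

On x86-64 Linux, signal 7 is SIGBUS. With PyTorch multi-process training this can point to a shared-memory problem, though the log alone does not prove that.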