MIC-DKFZ / nnUNet

Apache License 2.0

The program gets stuck in the CPU during prediction and cannot produce a result. #2437

Status: Open. YUjh0729 opened this issue 3 months ago.

YUjh0729 commented 3 months ago
Hello @Karol-G,

I encountered a very strange issue when using the `nnUNetv2_predict` command: the program does not proceed and never writes the prediction results. This is the output from the cloud server:

```
Predicting FLARE22_010:
perform_everything_on_device: True
  0%|          | 0/360 [00:00<?, ?it/s]
resizing data, order is 3 data shape (1, 227, 512, 512)
 11%|█         | 38/360 [00:05<00:48, 6.65it/s]
resizing segmentation, order is 1 order z is 0 data shape (1, 227, 512, 512)
100%|██████████| 360/360 [00:54<00:00, 6.60it/s]
sending off prediction to background worker for resampling and export
done with FLARE22_010

Predicting FLARE22_011:
perform_everything_on_device: True
 38%|███▊      | 23/60 [00:03<00:05, 6.61it/s]
resizing data, order is 1 data shape (14, 250, 628, 628)
100%|██████████| 60/60 [00:08<00:00, 6.74it/s]
sending off prediction to background worker for resampling and export
done with FLARE22_011
resizing data, order is 1 data shape (14, 109, 430, 430)
```

The output on my local machine is similar, but it contains these two additional lines. Both environments are identical (torch 2.0.1, CUDA 11.8, Python 3.10):

```
perform_everything_on_device: True
Prediction on device was unsuccessful, probably due to a lack of memory. Moving results arrays to CPU
```

[Screenshot: Snipaste_2024-08-07_10-27-05]

CPU and GPU usage has dropped to idle, and the prediction results should already be in RAM, but they are never exported to the output folder, so the program hangs. This only happens for a few of the larger cases in Abdomen CT_3D. How do you run prediction on the larger cases in Abdomen CT_3D?

Originally posted by @YUjh0729 in https://github.com/MIC-DKFZ/nnUNet/issues/2091#issuecomment-2273776717

YUjh0729 commented 3 months ago

To add: when predicting multiple cases in one run, the smaller ones (FLARE22_013, FLARE22_014) are exported normally. However, for the larger case (FLARE22_010) the program never produces output and does not terminate.

constantinulrich commented 2 months ago

Hi, that happens when you resample the output probabilities back to the image resolution (especially when you have many classes). Your system does not have enough RAM.
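To put rough numbers on this, here is a back-of-the-envelope sketch (an illustration, not nnU-Net code) of the RAM a single dense probability array occupies. It assumes float32 storage (4 bytes per value), which may not match what every nnU-Net version uses internally. It takes the two shapes printed on the `resizing data, order is 1` lines of the cloud-server log, where the first dimension (14) is the number of classes. The helper name `array_gib` is hypothetical.

```python
import math

# Assumption: probabilities are stored as float32, i.e. 4 bytes per value.
BYTES_PER_VALUE = 4


def array_gib(shape, bytes_per_value=BYTES_PER_VALUE):
    """Return the size in GiB of one dense array with the given shape."""
    return math.prod(shape) * bytes_per_value / 2**30


# Shapes (classes, z, y, x) copied from the "resizing data, order is 1" log lines.
for shape in [(14, 250, 628, 628), (14, 109, 430, 430)]:
    print(shape, f"{array_gib(shape):.2f} GiB")
# (14, 250, 628, 628) 5.14 GiB
# (14, 109, 430, 430) 1.05 GiB
```

This covers only the array being resampled. The resampled copy at the original image resolution needs memory on top of it, and every export worker running at the same time holds its own arrays, so peak usage can be a multiple of the printed figure.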

pooya-mohammadi commented 2 months ago

Is there any way to prevent nnU-Net from sending the export to a background worker?

pooya-mohammadi commented 2 months ago

Setting the following parameters fixed the problem for me:

`-nps 1 -npp 1`

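
A minimal sketch of the full call, assembled from Python. According to the `nnUNetv2_predict` help, `-npp` sets the number of preprocessing processes and `-nps` sets the number of segmentation-export (resampling and saving) processes. Presumably, setting both to 1 helps because only one case is resampled at a time, which lowers peak RAM at the cost of throughput. The input/output folder names, the dataset name `Dataset999_Example`, and the helper `build_predict_cmd` are hypothetical placeholders.

```python
import shlex
import subprocess


def build_predict_cmd(input_dir, output_dir, dataset, configuration,
                      num_preprocessing=1, num_export=1):
    """Assemble an nnUNetv2_predict call with reduced worker counts (hypothetical helper)."""
    return [
        "nnUNetv2_predict",
        "-i", str(input_dir),
        "-o", str(output_dir),
        "-d", str(dataset),
        "-c", configuration,
        "-npp", str(num_preprocessing),  # processes used for preprocessing
        "-nps", str(num_export),         # processes used for segmentation export
    ]


def run(cmd):
    # Requires nnU-Net v2 to be installed so that nnUNetv2_predict is on PATH.
    subprocess.run(cmd, check=True)


# Placeholder paths and dataset name; replace with your own.
cmd = build_predict_cmd("imagesTs", "predictions", "Dataset999_Example", "3d_fullres")
print(shlex.join(cmd))
```

Calling `run(cmd)` executes the prediction. Building the argument list first makes it easy to log the exact command before running it.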
YUjh0729 commented 2 months ago

> Hi, that happens when you resample the output probabilities back to the image resolution (especially when you have many classes). Your system does not have enough RAM.

Hi @constantinulrich, yes, I suspect the root cause is insufficient RAM. I've tried all the solutions suggested in other issues, but none of them resolved the problem. Do I need to upgrade my hardware to solve this?