MIC-DKFZ / nnUNet


Prediction hangs at the end of the run: results are not saved and CPU usage is almost zero #1450

Closed: LReion closed this issue 1 year ago

LReion commented 1 year ago

Some prediction results are saved and some are not. Once the GPU has finished processing, CPU usage drops to 0 and the run appears to be stuck.

aspil commented 1 year ago

It is quite a common issue (I raised a similar one, #1448) and is not directly related to nnU-Net. The problem is that some subprocess fills up your RAM at some point and the operating system decides to kill that subprocess. The main process then waits indefinitely for a dead child.
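To check whether a child process really was killed by the operating system, its exit code can be inspected. The sketch below is illustrative only: `describe_exit_code` is a hypothetical helper, not part of nnU-Net. It relies on the documented POSIX convention that `multiprocessing.Process.exitcode` and `subprocess` return codes are negative when the child was terminated by a signal (the Linux out-of-memory killer sends SIGKILL).

```python
import signal


def describe_exit_code(code):
    """Turn a child's exit code into a short human-readable description.

    On POSIX, a negative exit code -N means the child was terminated
    by signal N. None means the child has not finished yet.
    """
    if code is None:
        return "still running"
    if code == 0:
        return "exited normally"
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            # Signal number unknown on this platform.
            name = f"signal {-code}"
        return f"killed by {name}"
    return f"exited with error code {code}"
```

For a `multiprocessing.Process` object `p`, this would be called as `describe_exit_code(p.exitcode)`. Workers inside a pool are not exposed through a public attribute, so in that case the kernel log (for example via `dmesg`) is usually the place to look for out-of-memory kills.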

LReion commented 1 year ago

This problem occurs whether I decrease or increase the number of processes. Is there any solution?

aspil commented 1 year ago

Try running with one process only. What's your image size? My guess is that it doesn't fit in the memory during resampling (it's the most memory-consuming task). You can try to run everything in the main process sequentially instead of creating several workers. I haven't delved too much into it, but I also think that the resampling function consumes more memory than it should.

LReion commented 1 year ago

My image size is 512*512. Some results are saved and some are not.

aspil commented 1 year ago

It seems weird for the program to hang on 2D images. Try to monitor the RAM usage during the execution. Is it filling up? Also, how much do you have in total?
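One way to watch RAM while the prediction runs, sketched under the assumption of a Linux machine: the kernel exposes memory statistics in `/proc/meminfo`, and the helpers below (hypothetical names, standard library only) parse that file and report the `MemAvailable` field. Tools such as `free -h` or `htop` show the same information interactively.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(meminfo):
    """Return MemAvailable converted from kB (KiB) to GiB."""
    return meminfo["MemAvailable"] / (1024 ** 2)


def read_available_gib(path="/proc/meminfo"):
    """Read the current value from the kernel (Linux only)."""
    with open(path) as handle:
        return available_gib(parse_meminfo(handle.read()))
```

Calling `read_available_gib()` every second or so in a separate terminal while the prediction runs shows whether available memory approaches zero just before the hang.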

LReion commented 1 year ago

The images are actually 3D; I don't remember how many slices they have. The machine has 32 GB of memory and a 16-core CPU, but some predictions are still not saved. (Two screenshots were attached.)

I ran inference on 16 cases and only 11 results were saved. Rerunning with the continue-prediction option still could not complete the remaining five.

aspil commented 1 year ago

Perhaps those last 5 cases cannot fit in memory when the prediction is converted to a segmentation, as the conversions run concurrently via starmap_async (I would link the specific line, but I don't know which version of nnU-Net you're using).

As I mentioned earlier, instead of going through starmap_async, you can call the function it wraps directly, as you would call any other function. The predictions will then be processed one by one, and it will be easier for you to see which ones require a lot of memory.
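A minimal sketch of that sequential approach, not taken from the nnU-Net code base: `run_sequentially` and `fake_export` are hypothetical names, and `fake_export` merely stands in for whatever function the pool would have called. Each case runs in the main process, and `tracemalloc` records its peak allocation so the memory-hungry cases stand out.

```python
import tracemalloc


def run_sequentially(func, arg_tuples):
    """Call func(*args) for each tuple, one at a time, in this process.

    Returns a list of (args, result, peak_bytes) entries.
    """
    report = []
    for args in arg_tuples:
        tracemalloc.start()
        try:
            result = func(*args)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        report.append((args, result, peak))
    return report


def fake_export(case_id, n_bytes):
    # Hypothetical stand-in for the per-case export step.
    buffer = bytearray(n_bytes)
    return f"{case_id}: {len(buffer)} bytes"
```

Usage would look like `run_sequentially(fake_export, [("case_a", 1000), ("case_b", 5_000_000)])`. Note that `tracemalloc` only sees allocations made through Python's allocators, so memory used inside some C libraries may not be counted. If a case fails here, the exception surfaces directly in the main process.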

For reference, nnU-Net first resamples the network's prediction (an array of floats) and then converts it to a segmentation array, which is pretty memory-intensive, especially for the 3d_fullres configuration!
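A back-of-the-envelope estimate makes the scale concrete. The sketch below assumes the float prediction holds one 32-bit value per voxel per class; the volume shape and class count are made-up numbers, not taken from this thread, and the result covers a single copy of the array only (resampling needs additional working memory on top).

```python
import math


def float_prediction_bytes(shape, num_classes, bytes_per_value=4):
    """Bytes needed for one float array with one channel per class."""
    return math.prod(shape) * num_classes * bytes_per_value


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 1024 ** 3


# Hypothetical case: 512 x 512 x 300 voxels, 3 classes, float32.
size = float_prediction_bytes((512, 512, 300), num_classes=3)
print(size, round(to_gib(size), 2))  # 943718400 0.88
```

With several such cases being exported concurrently, and temporary copies created during resampling, the total can plausibly exceed 32 GB for large volumes.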

LReion commented 1 year ago

What confuses me most is that with multiple threads only the same 11 results are ever saved. Using a single thread doesn't solve the problem either, although it does print the "done with this image" message.

aspil commented 1 year ago

Perhaps your images are too large? I can't tell without more information. If I remember correctly, when I tested an image whose dimensions were 1000+, it consumed around 40 GB at some point. If you can't find any other way to solve this, you can try to predict your images in parts and then combine them, just as TotalSegmentator does.
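The idea of predicting in parts can be sketched as follows. This is a simplified illustration, not how TotalSegmentator or nnU-Net implement it: `chunk_ranges` and `predict_in_parts` are hypothetical helpers, the volume is represented as a plain sequence of slices, and `predict_fn` stands in for the actual prediction call.

```python
def chunk_ranges(length, chunk_size):
    """Yield (start, stop) index pairs that cover range(length) without gaps."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, length, chunk_size):
        yield start, min(start + chunk_size, length)


def predict_in_parts(slices, predict_fn, chunk_size):
    """Run predict_fn on consecutive chunks and concatenate the outputs."""
    combined = []
    for start, stop in chunk_ranges(len(slices), chunk_size):
        part = predict_fn(slices[start:stop])
        combined.extend(part)
    return combined
```

Only one chunk is held by `predict_fn` at a time, which bounds peak memory by the chunk size. Hard cuts like these can produce artifacts at chunk borders, so a practical version would probably use overlapping chunks and discard or blend the overlap.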

LReion commented 1 year ago

I will try it, thanks