MIC-DKFZ / nnUNet


Is there any valid solution? "Background workers died. #1572" #2123

Closed GabPete closed 6 months ago

GabPete commented 6 months ago

Hey, I saw this thread, but it was closed without a real solution to the problem: https://github.com/MIC-DKFZ/nnUNet/issues/1572. I'm facing the same issue.

Windows 11 Pro, RTX 4090 Ti, Intel 13900K, 32 GB RAM of which usually ~22 GB is free.

I haven't had any issues with training on different sizes of 2D data; it's only when I try inference (any of the mentioned ways). I tried 448x448 px with 5-fold CV, 224x224 px with 5-fold and single-fold CV, and 100x100 px with 5-fold CV, and nothing works.

Can you confirm that the only solution to this problem is to expand my RAM? Should I buy another 32 GB just for inference to work? Or is there something else that can be done?

Raminmian commented 6 months ago

Hi there, I have run into this issue before when running inference on larger images and didn't find another solution. I monitored the process and found that it always gets stuck right after prediction (during the transfer from GPU to CPU): converting the logits to a segmentation and reshaping them runs out of CPU memory. So I paid the cost of lowering the precision from float64 to float32 in preprocessing.default_resampling.resample_data_or_seg(). It works, but I haven't tested the effect on accuracy (maybe a very stupid choice).
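A rough sketch of the idea (illustrative only, not the actual nnU-Net code; `resample_fn` stands in for `resample_data_or_seg`, whose real signature may differ in your installed version):

```python
import numpy as np

def resample_with_float32(data: np.ndarray, new_shape, resample_fn):
    # Illustrative workaround: downcast float64 arrays to float32 before
    # resampling so the intermediate buffers need roughly half the CPU RAM.
    # resample_fn stands in for nnunetv2's resample_data_or_seg; check the
    # actual signature in your installed version before patching it.
    if data.dtype == np.float64:
        data = data.astype(np.float32)
    return resample_fn(data, new_shape)
```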

GabPete commented 6 months ago

Hello Raminmian,

I cannot perform inference on any image size whatsoever, so I believe it must be something other than RAM. In any case, if it is indeed RAM, I don't even know whether it's about system RAM or VRAM; I suppose the former. I am not a very advanced coder, so could you kindly tell me what I should try and where?

Anyway, if I could run it sequentially instead of in parallel, or if someone could tell me this is definitely a hardware limit, I would drop the topic. But for now I'm working with not-so-demanding 100x100 images on the PC described above, and it crashes.
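Side note: if reducing parallelism is the right lever, it looks from the `nnUNetv2_predict --help` output like `-npp` and `-nps` control the number of preprocessing and segmentation-export worker processes, so setting both to 1 should make the pipeline run close to sequentially (I haven't verified that this avoids crashes like mine):

nnUNetv2_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -d 003 -c 2d -npp 1 -nps 1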

GabPete commented 6 months ago

Update.

This is the whole error message, in case it changes anything:

(NNU) C:\Users\Piotr>nnUNetv2_predict -i D:/nnUNet_raw_data_base/nnUNet_raw/Dataset003_Ankle/imagesTs -o imagesPr -d 003 -c 2d -f 5

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################

There are 1 cases in the source folder
I am process 0 out of 1 (max process ID is 0, we start counting with 0!)
There are 1 cases that I would like to predict
Process SpawnProcess-6:
Traceback (most recent call last):
  File "C:\Users\Piotr\miniconda3\envs\NNU\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\Users\Piotr\miniconda3\envs\NNU\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Piotr\nnUNet\nnunetv2\inference\data_iterators.py", line 58, in preprocess_fromfiles_save_to_queue
    raise e
  File "C:\Users\Piotr\nnUNet\nnunetv2\inference\data_iterators.py", line 31, in preprocess_fromfiles_save_to_queue
    data, seg, data_properties = preprocessor.run_case(list_of_lists[idx],
  File "C:\Users\Piotr\nnUNet\nnunetv2\preprocessing\preprocessors\default_preprocessor.py", line 139, in run_case
    data, seg = self.run_case_npy(data, seg, data_properties, plans_manager, configuration_manager,
  File "C:\Users\Piotr\nnUNet\nnunetv2\preprocessing\preprocessors\default_preprocessor.py", line 78, in run_case_npy
    data = self._normalize(data, seg, configuration_manager,
  File "C:\Users\Piotr\nnUNet\nnunetv2\preprocessing\preprocessors\default_preprocessor.py", line 183, in _normalize
    scheme = configuration_manager.normalization_schemes[c]
IndexError: list index out of range
Traceback (most recent call last):
  File "C:\Users\Piotr\miniconda3\envs\NNU\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Piotr\miniconda3\envs\NNU\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Piotr\miniconda3\envs\NNU\Scripts\nnUNetv2_predict.exe\__main__.py", line 7, in <module>
    sys.exit(predict_entry_point())
  File "C:\Users\Piotr\nnUNet\nnunetv2\inference\predict_from_raw_data.py", line 864, in predict_entry_point
    predictor.predict_from_files(args.i, args.o, save_probabilities=args.save_probabilities,
  File "C:\Users\Piotr\nnUNet\nnunetv2\inference\predict_from_raw_data.py", line 256, in predict_from_files
    return self.predict_from_data_iterator(data_iterator, save_probabilities, num_processes_segmentation_export)
  File "C:\Users\Piotr\nnUNet\nnunetv2\inference\predict_from_raw_data.py", line 349, in predict_from_data_iterator
    for preprocessed in data_iterator:
  File "C:\Users\Piotr\nnUNet\nnunetv2\inference\data_iterators.py", line 111, in preprocessing_iterator_fromfiles
    raise RuntimeError('Background workers died. Look for the error message further up! If there is '
RuntimeError: Background workers died. Look for the error message further up! If there is none then your RAM was full and the worker was killed by the OS. Use fewer workers or get more RAM in that case!

GabPete commented 6 months ago

IndexError: list index out of range — this part of the traceback suggested that the channels of the images fed into prediction didn't match what the model expected. I tried prediction on one of the training images and it worked; then I prepared the prediction dataset exactly the same way as the training one (except for the masks) and it worked. A toy reproduction of the failure mode is below.
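This is not nnU-Net's actual code, just a minimal illustration: the plans hold one normalization scheme per training channel, so a model trained on single-channel images blows up when a 3-channel RGB image is looped over at inference.

```python
import numpy as np

# Toy reproduction of the IndexError from the traceback above.
normalization_schemes = ["ZScoreNormalization"]  # one scheme: trained on 1-channel images
data = np.zeros((3, 100, 100))                   # 3-channel (RGB) image fed to prediction

for c in range(data.shape[0]):
    scheme = normalization_schemes[c]  # IndexError: list index out of range once c == 1
```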

Problem solved; it now works easily on 1000x1000x3 images (version 2.3.1). I think an update that tells the user that images for prediction have to match the training images exactly, not only in naming but also in formatting (e.g. number of channels), would be helpful; a rough sketch of such a check follows.
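Something along these lines (a hypothetical sketch, not nnU-Net code; the function name is made up, and nnU-Net would read the expected channel count from its own plans/dataset fingerprint):

```python
import numpy as np

def check_input_channels(image: np.ndarray, expected_channels: int, case_id: str) -> None:
    # Hypothetical pre-flight check: fail early with a readable message
    # instead of an IndexError inside a background worker.
    found = image.shape[0]  # channel-first layout, as nnU-Net uses internally
    if found != expected_channels:
        raise ValueError(
            f"{case_id}: expected {expected_channels} input channel(s), as during "
            f"training, but got {found}. Prediction images must be formatted "
            f"exactly like the training images."
        )
```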