Closed: goodsave closed this issue 12 months ago
Check https://github.com/pytorch/pytorch#docker-image. Your Docker container may have limited shared memory.
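A quick way to confirm this is to look at the size of the `/dev/shm` mount from inside the container. A minimal sketch (Docker's default shared-memory size is only 64 MB unless it is raised):

```python
import shutil

# PyTorch worker processes exchange tensors via /dev/shm; Docker caps it at
# 64 MB by default, which multiprocessing-heavy inference can easily exhaust.
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: total {total / 2**20:.0f} MB, free {free / 2**20:.0f} MB")
```

If the reported total is only 64 MB, restarting the container with a larger value (for example `docker run --shm-size=8g ...` or `--ipc=host`) usually helps.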
Thank you~ I found the cause of my issue (the NFS file system) and have resolved it. Thank you for your help.
Prediction gets stuck when encountering large files
When I run the nnUNet_predict command on the host, a large file may take a bit longer, but the prediction ultimately completes and outputs a NIfTI file.
But when I run nnUNet_predict inside the Docker container, it hangs after printing "separate z: False lowres axis None". I checked, and the Docker container's CPU was essentially idle at that point (screenshot omitted).
When I force the process to terminate, the following is printed:
It appears to be stuck on this line: `[i.get() for i in results]`
The corresponding code should be this block:
```python
bytes_per_voxel = 4
if all_in_gpu:
    bytes_per_voxel = 2  # if all_in_gpu then the return value is half (float16)
if np.prod(softmax.shape) > (2e9 / bytes_per_voxel * 0.85):  # * 0.85 just to be save
    print("This output is too large for python process-process communication. "
          "Saving output temporarily to disk")
    np.save(output_filename[:-7] + ".npy", softmax)
    softmax = output_filename[:-7] + ".npy"
```
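To make the failure mode easier to see, here is a minimal, self-contained sketch of the same pattern: export workers run in a pool, the parent hands oversized arrays over via a temporary `.npy` file instead of the multiprocessing pipe, and then blocks on `.get()`. The names (`export_segmentation`, `MAX_BYTES_FOR_PIPE`, the file names) are made up for illustration and are not nnUNet's actual API:

```python
import numpy as np
from multiprocessing import Pool

# Hypothetical threshold, mirroring the 2e9 / bytes_per_voxel * 0.85 check above.
MAX_BYTES_FOR_PIPE = int(2e9 * 0.85)


def export_segmentation(softmax, output_path):
    """Worker: receive either an array or a path to a temporary .npy dump."""
    if isinstance(softmax, str):
        # The parent could not send the array through the pipe and dumped it to
        # disk instead; load it back here (this read is where a slow or broken
        # NFS mount can stall the worker indefinitely).
        softmax = np.load(softmax)
    np.save(output_path, softmax.argmax(0))  # stand-in for the real NIfTI export
    return output_path


if __name__ == "__main__":
    softmax = np.random.rand(4, 64, 64, 64).astype(np.float32)

    # Same idea as the snippet above: oversized outputs go to disk and only the
    # file name travels through the process-to-process pipe.
    if softmax.nbytes > MAX_BYTES_FOR_PIPE:
        np.save("softmax_tmp.npy", softmax)
        softmax = "softmax_tmp.npy"

    with Pool(2) as pool:
        results = [pool.starmap_async(export_segmentation,
                                      ((softmax, "seg_out.npy"),))]
        # This is the call the traceback points at: the parent blocks here
        # until every export worker has finished.
        _ = [r.get() for r in results]
```

In this setup the parent waits at `r.get()` forever if a worker never finishes, which is consistent with the NFS-related cause reported at the top of the thread: the temporary `.npy` file is written and then read back through the file system, so a misbehaving NFS mount can stall the export workers while the main process shows no CPU activity.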
The complete log is as follows:
```
Please cite the following paper when using nnUNet:
Isensee, F., Jaeger, P.F., Kohl, S.A.A. et al. "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation." Nat Methods (2020). https://doi.org/10.1038/s41592-020-01008-z
If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

using model stored in /home/allen/nnUNet/RESULTS_FOLDER/nnUNet/3d_fullres/Task301_HeadNeckOAR_20210722/nnUNetTrainerV2__nnUNetPlansv2.1
This model expects 1 input modalities for each image
Found 1 unique case ids, here are some examples: ['DATA']
If they don't look right, make sure to double check your filenames. They must end with _0000.nii.gz etc
number of cases: 1
number of cases that still need to be predicted: 1
emptying cuda cache
loading parameters for folds, ['all']
2023-12-04 15:53:28.348381: Using dummy2d data augmentation
using the following model files: ['/home/allen/nnUNet/RESULTS_FOLDER/nnUNet/3d_fullres/Task301_HeadNeckOAR_20210722/nnUNetTrainerV2__nnUNetPlansv2.1/all/model_final_checkpoint.model']
starting preprocessing generator
starting prediction...
preprocessing /home/allen/DATA.nii.gz
using preprocessor GenericPreprocessor
before crop: (1, 195, 512, 512) after crop: (1, 195, 512, 512) spacing: [2.50000095 1.08398402 1.08398402]
no separate z, order 3
no separate z, order 1
before: {'spacing': array([2.50000095, 1.08398402, 1.08398402]), 'spacing_transposed': array([2.50000095, 1.08398402, 1.08398402]), 'data.shape (data is transposed)': (1, 195, 512, 512)}
after: {'spacing': array([3.        , 1.16308594, 1.16308594]), 'data.shape (data is resampled)': (1, 163, 477, 477)}
(1, 163, 477, 477)
This worker has ended successfully, no errors to report
predicting /home/allen/DATA.nii.gz
debug: mirroring True mirror_axes (0, 1, 2)
step_size: 0.5
do mirror: True
data shape: (1, 163, 477, 477)
patch size: [ 48 192 192]
steps (x, y, and z): [[0, 23, 46, 69, 92, 115], [0, 95, 190, 285], [0, 95, 190, 285]]
number of tiles: 96
computing Gaussian
prediction done
This output is too large for python process-process communication. Saving output temporarily to disk
inference done. Now waiting for the segmentation export to finish...
force_separate_z: None interpolation order: 1
separate z: False lowres axis None
no separate z, order 1
^CProcess ForkPoolWorker-4:
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/usr/local/bin/nnUNet_predict", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_predict')())
  File "/code/docker/nnUNet-master/nnunet/inference/predict_simple.py", line 221, in main
    step_size=step_size, checkpoint_name=args.chk)
  File "/code/docker/nnUNet-master/nnunet/inference/predict.py", line 664, in predict_from_folder
    disable_postprocessing=disable_postprocessing)
  File "/code/docker/nnUNet-master/nnunet/inference/predict.py", line 269, in predict_cases
    _ = [i.get() for i in results]
  File "/code/docker/nnUNet-master/nnunet/inference/predict.py", line 269, in <listcomp>
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
    _ = [i.get() for i in results]
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 638, in get
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
KeyboardInterrupt
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
    self.wait(timeout)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 635, in wait
    self._event.wait(timeout)
  File "/usr/lib/python3.6/threading.py", line 551, in wait
    signaled = self._cond.wait(timeout)
  File "/usr/lib/python3.6/threading.py", line 295, in wait
    waiter.acquire()
KeyboardInterrupt
```
Could you please help me solve this problem?