MIC-DKFZ / nnUNet

Apache License 2.0

Pickling error during nnUNet_predict in multiprocessing module (under Windows OS) #589

Closed marcus-wirtz-snkeos closed 3 years ago

marcus-wirtz-snkeos commented 3 years ago

I am running nnUNet_predict with the standard nnUNetTrainerV2 and the 3d_fullres model. All data from the "-i" input flag apparently gets interpreted correctly, but then a weird PicklingError occurs in the multiprocessing module: it fails to pickle the lambda function in nnunet.utilities.nd_softmax:

Traceback (most recent call last):
File "nnunet/inference/predict_simple.py", line 225, in <module>
    main()
File "nnunet/inference/predict_simple.py", line 217, in main
    predict_from_folder(model_folder_name, input_folder, output_folder, folds, save_npz, num_threads_preprocessing,
File "C:\Users\marcus.wirtz\VEnvs\nnUnet\lib\site-packages\nnunet\inference\predict.py", line 631, in predict_from_folder
    return predict_cases(model, list_of_lists[part_id::num_parts], output_files[part_id::num_parts], folds,
File "C:\Users\marcus.wirtz\VEnvs\nnUnet\lib\site-packages\nnunet\inference\predict.py", line 204, in predict_cases
    for preprocessed in preprocessing:
File "C:\Users\marcus.wirtz\VEnvs\nnUnet\lib\site-packages\nnunet\inference\predict.py", line 109, in preprocess_multithreaded
    pr.start()
File "c:\programdata\miniconda3\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
File "c:\programdata\miniconda3\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
File "c:\programdata\miniconda3\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
File "c:\programdata\miniconda3\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
File "c:\programdata\miniconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x000001B2424243A0>: attribute lookup <lambda> on nnunet.utilities.nd_softmax failed
[W ..\torch\csrc\CudaIPCTypes.cpp:21] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

It might well be that this is a Windows-related issue, since I found a similar issue on Stack Overflow: https://stackoverflow.com/questions/64347217/error-pickle-picklingerror-cant-pickle-function-lambda-at-0x0000002f2175b

In this case feel free to close the issue. Nevertheless, I thought maybe I am lucky and you know what is going on here :-)

Cheers, Marcus
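The error above can be reproduced outside nnUNet with plain pickle: lambdas cannot be serialized by name, which is exactly what multiprocessing's spawn start method (the default on Windows) must do when handing a function to a child process. A minimal sketch, with made-up helper names (not nnUNet's actual code):

```python
import pickle

def softmax_helper_named(x):
    # A module-level function pickles fine: it is stored by qualified name
    # and looked up again on unpickling.
    return x

softmax_helper_lambda = lambda x: x  # same behaviour, but a lambda

# The named function round-trips through pickle without trouble.
restored = pickle.loads(pickle.dumps(softmax_helper_named))
assert restored(3) == 3

# The lambda does not: its __qualname__ is "<lambda>", so the attribute
# lookup on its module fails, just like in the traceback above.
try:
    pickle.dumps(softmax_helper_lambda)
except (pickle.PicklingError, AttributeError) as e:
    print("pickling failed:", e)
```

On Linux, the default fork start method sidesteps this because the child inherits the parent's memory and nothing needs to be pickled, which is why the error only shows up with spawn.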

WYUF commented 3 years ago

@MRCWirtz I suggest running the framework on an Ubuntu server or PC.

marcus-wirtz-snkeos commented 3 years ago

@WYUF Thanks for the recommendation, but I am trying to implement full Windows support (see open pull request #553 )

FabianIsensee commented 3 years ago

Hi there, sorry I am super late to the party. Too many things to do. I have not seen this error so far. Ever. So it probably is related to Windows. Pickling for multiprocessing is trickier on Windows than on Linux, if I remember correctly. Best, Fabian

marcus-wirtz-snkeos commented 3 years ago

Hi @FabianIsensee,

OK, thanks for confirming that it indeed never occurred before within nnUNet. I already figured out why it's failing: in contrast to Linux, Windows transfers the data to the child processes via pickling. I have now found a solution that works with pathos (https://github.com/uqfoundation/pathos). Regarding the open pull request for Windows support, the question is whether Windows support is worth an additional dependency to you. Alternatively, one could use pathos only if the operating system is Windows.

Any opinions about that?

I'll update the pull request in the next few days.

Best, Marcus

marcus-wirtz-snkeos commented 3 years ago

In case you are interested, here is some background info from the pathos author (https://stackoverflow.com/questions/31732989/multiprocessing-pathos-multiprocessing-and-windows): "For multiprocessing, windows is different than Linux and Macintosh… because windows doesn't have a proper fork like on linux… linux can share objects across processes, while on windows there is no sharing… it's basically a fully independent new process created… and therefore the serialization has to be better for the object to pass across to the other process -- just as if you would send the object to another computer."

PS: the pathos module should definitely only be used if nnUNet is executed on Windows, since it generally slows down multiprocessing due to the overhead of additional imports (see e.g. https://github.com/uqfoundation/pathos/issues/79).
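Keeping pathos off the import path on Linux could look roughly like this. This is only a sketch of the idea: get_pool_class is a hypothetical helper, not part of nnUNet or the pull request.

```python
import multiprocessing
import platform

def get_pool_class():
    """Return a process-pool implementation suited to the current OS.

    Hypothetical helper: on Windows, prefer pathos' ProcessingPool,
    which serializes with dill and can therefore ship lambdas to child
    processes; elsewhere stick with the stdlib Pool so pathos is never
    even imported and adds no overhead.
    """
    if platform.system() == "Windows":
        try:
            from pathos.multiprocessing import ProcessingPool
            return ProcessingPool
        except ImportError:
            pass  # pathos not installed: fall back to the stdlib
    return multiprocessing.Pool
```

Because the pathos import sits inside the function and behind the OS check, Linux and macOS users pay nothing for the Windows workaround.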

marcus-wirtz-snkeos commented 3 years ago

Hi @FabianIsensee, I updated the pull request #553 which fixes the pickling error issue under Windows 10.

thtranos commented 9 months ago

Greetings! I'm using nnUNet for brain WMH on Mac and I get this error.

I haven't dived deep into the code, but it must be a cross-OS or process-related problem, as discussed earlier in this issue. I would appreciate any recommendation on how I could resolve this problem!

Traceback (most recent call last):
  File "/Users/theodoretranos/anaconda3/envs/brain2/bin/nnUNet_predict", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_predict')())
  File "/Users/theodoretranos/Desktop/DeepWMH/nnUNet_for_DeepWMH-develop/nnunet/inference/predict_simple.py", line 214, in main
    predict_from_folder(model_folder_name, input_folder, output_folder, folds, save_npz, num_threads_preprocessing,
  File "/Users/theodoretranos/Desktop/DeepWMH/nnUNet_for_DeepWMH-develop/nnunet/inference/predict.py", line 412, in predict_from_folder
    return predict_cases(model, list_of_lists[part_id::num_parts], output_files[part_id::num_parts], folds,
  File "/Users/theodoretranos/Desktop/DeepWMH/nnUNet_for_DeepWMH-develop/nnunet/inference/predict.py", line 211, in predict_cases
    for preprocessed in preprocessing:
  File "/Users/theodoretranos/Desktop/DeepWMH/nnUNet_for_DeepWMH-develop/nnunet/inference/predict.py", line 112, in preprocess_multithreaded
    pr.start()
  File "/Users/theodoretranos/anaconda3/envs/brain2/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/theodoretranos/anaconda3/envs/brain2/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/theodoretranos/anaconda3/envs/brain2/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/theodoretranos/anaconda3/envs/brain2/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/theodoretranos/anaconda3/envs/brain2/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/theodoretranos/anaconda3/envs/brain2/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/theodoretranos/anaconda3/envs/brain2/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'Generic_UNet.__init__.<locals>.<lambda>'

Regards,

Theodore Tranos

ancestor-mithril commented 9 months ago

If you are able to modify the source code, just search for the implementation of Generic_UNet and replace any lambda with a Python function that is visible globally.

thtranos commented 9 months ago

> If you are able to modify the source code, just search for the implementation of Generic_UNet and replace any lambda with a python function visible globally.

Thanks for the quick reply!

In generic_UNet.py I searched for lambda, and the only spot that contains one is this:

self.upscale_logits_ops.append(lambda x: x)

So you are suggesting modifying this code to do the same thing without a lambda, or did I misunderstand something?

Theodore Tranos

ancestor-mithril commented 9 months ago

Yes. Replace it with self.upscale_logits_ops.append(identity), where identity is

def identity(x):
    return x

You can also use self.upscale_logits_ops.append(torch.nn.Identity()) as it is already implemented in PyTorch.
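Both replacements work because a module-level function (unlike a lambda defined inside __init__) can be pickled by name. A quick sketch of the round trip, using plain pickle so it runs without torch:

```python
import pickle

def identity(x):
    # Module-level replacement for `lambda x: x`: picklable by name,
    # so multiprocessing's spawn start method can ship it to a child.
    return x

upscale_logits_ops = [identity]

# Round-trips cleanly through pickle, unlike the original lambda.
restored = pickle.loads(pickle.dumps(upscale_logits_ops))
print(restored[0](42))  # -> 42
```

torch.nn.Identity() works the same way: it is a regular picklable nn.Module, so either option removes the unpicklable local lambda.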

thtranos commented 9 months ago

@ancestor-mithril I think that change solved the problem. Thanks a lot!

Theodore Tranos