MIC-DKFZ / basic_unet_example

An example project of how to use a U-Net for segmentation on medical images with PyTorch.
Apache License 2.0

MultiThreadedDataLoader problem on Windows #3

Closed vcvishal closed 5 years ago

vcvishal commented 5 years ago

When I run python3 run_train_pipeline.py, I get this error:

AttributeError: Can't pickle local object 'MultiThreadedDataLoader.get_worker_init_fn.<locals>.init_fn'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\miniconda\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\miniconda\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

elpequeno commented 5 years ago

Hi vcvishal,

Which OS are you working on? That looks like a Windows error to me. Multithreaded data loading on Windows is not supported by the version of batchgenerators used in the example. You can try setting the number of threads to 1 or updating to the latest version of batchgenerators. I think they put a lot of work into making everything smoother on Windows. I will test that and update this repo soon.

Let me know if that helps.

Cheers, André
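For context on the AttributeError above: on Windows, multiprocessing starts workers with the spawn method, which pickles the objects handed to each child process, and a function defined inside another function cannot be pickled. The sketch below reproduces that behaviour with the standard library only. The function names are illustrative stand-ins, not the batchgenerators code, and the functools.partial variant is one common way around the limitation, not necessarily what newer batchgenerators versions do.

```python
import pickle
from functools import partial


def get_worker_init_fn_local(seed):
    """Return a nested function; it only exists inside this call's scope."""
    def init_fn(worker_id):
        return seed + worker_id
    return init_fn


def _init_fn(worker_id, seed):
    """Module-level function: pickle can refer to it by its qualified name."""
    return seed + worker_id


def get_worker_init_fn_picklable(seed):
    # A partial pickles fine as long as the wrapped function is importable.
    return partial(_init_fn, seed=seed)


def is_picklable(obj):
    """Report whether pickle.dumps accepts the object."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False
```

Calling is_picklable on the result of get_worker_init_fn_local shows the failure mode from the traceback, while the partial-based version survives a pickle round trip.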

vcvishal commented 5 years ago

I am using Windows. Which code should I modify to disable multithreaded data loading?

elpequeno commented 5 years ago

Try setting num_processes=0 in datasets/two_dim/NumpyDataLoader line 56. This will probably make training a bit slower, but it should prevent the error mentioned above. For now this is just a hack. I'll try to come up with a better solution soon-ish.
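One way to apply this workaround without editing the value by hand on each machine is to choose the process count from the operating system. This is a minimal sketch: num_processes is the parameter named above, while choose_num_processes and its arguments are hypothetical and not part of the repository or of batchgenerators.

```python
import platform


def choose_num_processes(requested, system=None):
    """Return 0 worker processes on Windows, otherwise the requested count.

    `system` defaults to platform.system(); passing it explicitly makes the
    helper easy to check on any machine.
    """
    if system is None:
        system = platform.system()
    return 0 if system == "Windows" else requested


# Hypothetical usage: pass the result wherever num_processes is set.
num_processes = choose_num_processes(8)
```

The trade-off is the one already stated: with 0 worker processes, loading runs in the main process and training is likely slower.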

vcvishal commented 5 years ago

I did, but this time I get:

Traceback (most recent call last):
  File "run_train_pipeline.py", line 51, in <module>
    exp.run()
  File "D:\miniconda\lib\site-packages\trixi\experiment\experiment.py", line 103, in run
    raise e
  File "D:\miniconda\lib\site-packages\trixi\experiment\experiment.py", line 80, in run
    self.train(epoch=epoch)
  File "C:\Users\vcvis\Desktop\basic_unet_example-master\experiments\UNetExperiment.py", line 112, in train
    loss = self.dice_loss(pred_softmax, target.squeeze()) + self.ce_loss(pred, target.squeeze())
  File "D:\miniconda\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\vcvis\Desktop\basic_unet_example-master\loss_functions\dice_loss.py", line 125, in forward
    y_onehot.scatter_(1, y, 1)
RuntimeError: invalid argument 3: Index tensor must be either empty or have same dimensions as output tensor at c:\a\w\1\s\tmp_conda_3.6_104352\conda\conda-bld\pytorch_1550400396997\work\aten\src\thc\generic/THCTensorScatterGather.cu:314

vcvishal commented 5 years ago

I reduced the batch size to 1 due to insufficient memory.
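The thread does not record what resolved the scatter error. One plausible cause, stated here purely as an assumption, is the combination of batch size 1 and target.squeeze() in the traceback: squeeze() without an argument removes every axis of size 1, so with a single sample the batch axis disappears along with the channel axis. The pure-Python sketch below mimics only the shape rule of squeeze; squeeze_shape and the (batch, channel, height, width) shapes are hypothetical.

```python
def squeeze_shape(shape, dim=None):
    """Shape after a squeeze: drop all size-1 axes, or only axis `dim`.

    `dim` must be a non-negative axis index in this simplified sketch.
    """
    if dim is None:
        return tuple(s for s in shape if s != 1)
    if shape[dim] != 1:
        return tuple(shape)  # nothing to remove on that axis
    return tuple(s for i, s in enumerate(shape) if i != dim)


# Hypothetical target shapes: (batch, channel, height, width)
batch_of_8 = (8, 1, 64, 64)
batch_of_1 = (1, 1, 64, 64)

shape_a = squeeze_shape(batch_of_8)         # batch axis survives
shape_b = squeeze_shape(batch_of_1)         # batch axis is removed too
shape_c = squeeze_shape(batch_of_1, dim=1)  # only the channel axis is removed
```

If this is the cause, squeezing only the channel axis, for example target.squeeze(1) in PyTorch, keeps the number of dimensions independent of the batch size.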

vcvishal commented 5 years ago

OK, after a lot of struggle it runs, but with a negative loss. Why?

Epoch: 0 Loss: -0.7811
INFO:default-2Srg7QKqj6:Epoch: 0 Loss: -0.7811
Epoch: 0 Loss: -0.7853
INFO:default-2Srg7QKqj6:Epoch: 0 Loss: -0.7853
Epoch: 0 Loss: -0.6754
INFO:default-2Srg7QKqj6:Epoch: 0 Loss: -0.6754
Epoch: 0 Loss: -0.7570
INFO:default-2Srg7QKqj6:Epoch: 0 Loss: -0.7570
Epoch: 0 Loss: -0.7717
INFO:default-2Srg7QKqj6:Epoch: 0 Loss: -0.7717
Epoch: 0 Loss: -0.6727
INFO:default-2Srg7QKqj6:Epoch: 0 Loss: -0.6727
Epoch: 0 Loss: -0.7477
INFO:default-2Srg7QKqj6:Epoch: 0 Loss: -0.7477
Epoch: 0 Loss: -0.6968
INFO:default-2Srg7QKqj6:Epoch: 0 Loss: -0.6968
Epoch: 0 Loss: -0.7585
INFO:default-2Srg7QKqj6:Epoch: 0 Loss: -0.7585

JunMa11 commented 5 years ago

Hi @vcvishal, you can find the explanation by @FabianIsensee here.
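The linked explanation is not reproduced in this thread. A common reason for negative values in a combined loss such as the dice_loss + ce_loss sum seen in the traceback is that the Dice term is implemented as the negative Dice coefficient, so it ranges from -1 to 0 and can outweigh a small cross-entropy term. The sketch below assumes exactly that; the function names and toy numbers are illustrative and not taken from the repository.

```python
import math


def soft_dice(pred, target, smooth=1e-5):
    """Soft Dice coefficient between predicted probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2 * intersection + smooth) / (sum(pred) + sum(target) + smooth)


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(pred)


def combined_loss(pred, target):
    # Assumption: the Dice term enters as the NEGATIVE Dice coefficient.
    return -soft_dice(pred, target) + binary_cross_entropy(pred, target)


# Toy example: a fairly good prediction on four pixels.
pred = [0.9, 0.8, 0.1, 0.2]
target = [1, 1, 0, 0]
loss = combined_loss(pred, target)
```

Under this assumption a loss near -1 means a near-perfect overlap, so values such as those logged above would indicate a reasonable fit, not a bug.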

vcvishal commented 5 years ago

Thank you.

elpequeno commented 5 years ago

Thank you @JunMa11 for providing that link.

@vcvishal: Would you consider your issue "solved" now, or are there any open questions? If there are none, I will close this issue.

vcvishal commented 5 years ago

Thank you, problem solved.