MIC-DKFZ / nnUNet

AttributeError: 'NoneType' object has no attribute 'is_alive' #320

Closed leetesua closed 4 years ago

leetesua commented 4 years ago

I want to train on my own data, the RibFrac dataset, and got: AttributeError: 'NoneType' object has no attribute 'is_alive'

First I ran pip install --upgrade nnunet and reran plan_and_preprocess_task.py. Then I ran OMP_NUM_THREADS=1 python run/run_training.py 3d_fullres ...... and got this error: TypeError: expected str, bytes or os.PathLike object, not NoneType

It looks like Python is running the code in python3.7/site-packages/nnunet instead of the copy I git cloned.
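
One way to confirm which copy of a package Python would import is to ask the import system for its location. The sketch below is only a minimal illustration; the helper name `module_origin` is hypothetical and not part of nnU-Net.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # "nnunet" is the package name used in this thread.
    print(module_origin("nnunet"))
```

If the printed path points into site-packages rather than the cloned repository, the pip-installed copy is the one being used.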

Then I copied the code into site-packages and reran python run/run_training.py 3d_fullres ......, and got this error: AttributeError: 'NoneType' object has no attribute 'is_alive'

I also got another error: AssertionError: The NVIDIA driver on your system is too old (found version 10000).

Strangely, I was training on the LiTS dataset a few days ago and it worked fine. Why does it not work today?

scratching my head now....

FabianIsensee commented 4 years ago

Hi, I have done an update recently that requires pytorch 1.6.0. It could be that by upgrading you also have upgraded pytorch and that the automatically installed pytorch version is incompatible with your driver because it was built with a newer version of CUDA. Please upgrade your graphics driver or downgrade nnU-Net. When posting error messages, please be sure to post the entire message, not just the end. The actual error is most often way up. Ideally you send the entire stdout from start to error ;-) Best, Fabian
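
To see whether an upgrade changed the installed PyTorch, it can help to print the PyTorch version, the CUDA version it was built with, and whether a GPU is usable. This is a minimal sketch; the function name `torch_cuda_report` is hypothetical, and it falls back gracefully if torch is not installed.

```python
def torch_cuda_report():
    """Collect the PyTorch version, the CUDA version it was built with, and GPU visibility."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "built_with_cuda": None, "cuda_available": False}
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

Including this output alongside the full log makes a driver/CUDA mismatch easier to spot.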

leetesua commented 4 years ago

First I ran pip install --upgrade nnunet and reran training, which gave this:

(py37) lidexuan@SF-BS-13:/data3/lidexuan/nnUNet/nnunet$ OMP_NUM_THREADS=1 python run/run_training.py 3d_fullres nnUNetTrainer ribfrac 4 --ndet

Please cite the following paper when using nnUNet: Fabian Isensee, Paul F. Jäger, Simon A. A. Kohl, Jens Petersen, Klaus H. Maier-Hein "Automated Design of Deep Learning Methods for Biomedical Image Segmentation" arXiv preprint arXiv:1904.08128 (2020). If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

nnUNet_raw_data_base is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read nnunet/paths.md for information on how to set this up properly.

nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read nnunet/pathy.md for information on how to set this up.

RESULTS_FOLDER is not defined and nnU-Net cannot be used for training or inference. If this is not intended behavior, please read nnunet/paths.md for information on how to set this up

Traceback (most recent call last):
  File "run/run_training.py", line 83, in <module>
    trainer_class = get_default_configuration(network, task, network_trainer, plans_identifier)
  File "/home/lidexuan/.local/lib/python3.7/site-packages/nnunet/run/default_configuration.py", line 40, in get_default_configuration
    dataset_directory = join(preprocessing_output_dir, task)
  File "/root/anaconda3/envs/py37/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
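
The TypeError at the end of this traceback appears to follow from the warnings above it: with the path variables unset, `preprocessing_output_dir` is None and `join(None, task)` fails. A minimal sketch for checking the three variables named in the warnings before launching training (the helper `missing_nnunet_vars` is hypothetical, not an nnU-Net function):

```python
import os

# Variable names taken from the warnings printed in the log above.
REQUIRED_VARS = ("nnUNet_raw_data_base", "nnUNet_preprocessed", "RESULTS_FOLDER")


def missing_nnunet_vars(environ=None):
    """Return the required variables that are unset or empty in the given mapping."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_nnunet_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All nnU-Net path variables are set.")
```

Note that variables exported in one shell are not visible in another, so the check should run in the same shell that starts training.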

Then I downgraded nnunet by running pip install -e .

Then I reran training and got this:

(py37) lidexuan@SF-BS-13:/data3/lidexuan/nnUNet/nnunet$ OMP_NUM_THREADS=1 python run/run_training.py 3d_fullres nnUNetTrainer ribfrac 4 --ndet

Please cite the following paper when using nnUNet:

Isensee, Fabian, et al. "nnU-Net: Breaking the Spell on Successful Medical Image Segmentation." arXiv preprint arXiv:1904.08128 (2019).

If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

###############################################
I am running the following nnUNet: 3d_fullres
My trainer class is: <class 'nnunet.training.network_training.nnUNetTrainer.nnUNetTrainer'>
For that I will be using the following configuration:
num_classes: 2
modalities: {0: 'CT'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region OrderedDict([((2,), False), ((1,), True), ((2, 1), False)])
min_region_size_per_class OrderedDict([(1, 30.55300220489502), (2, 39.52484177819552)])
min_size_per_class OrderedDict([(1, 30.55300220489502), (2, 39.52484177819552)])
normalization_schemes OrderedDict([(0, 'CT')])
stages...

stage: 0 {'batch_size': 2, 'num_pool_per_axis': [4, 5, 5], 'patch_size': array([ 96, 160, 128]), 'median_patient_size_in_voxels': array([148, 231, 231]), 'current_spacing': array([2.77089402, 1.65604239, 1.65604239]), 'original_spacing': array([1.25 , 0.74707043, 0.74707043]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[1, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

stage: 1 {'batch_size': 2, 'num_pool_per_axis': [4, 5, 5], 'patch_size': array([ 96, 160, 128]), 'median_patient_size_in_voxels': array([329, 512, 512]), 'current_spacing': array([1.25 , 0.74707043, 0.74707043]), 'original_spacing': array([1.25 , 0.74707043, 0.74707043]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[1, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

I am using stage 1 from these plans
I am using batch dice + CE loss

I am using data from this folder: /data3/lidexuan/nnUNet/nnuet/preprocessed_data/ribfrac/nnUNet
###############################################
2020-09-11 10:41:47.192188: unpacking dataset
2020-09-11 10:41:47.335607: done
Traceback (most recent call last):
  File "run/run_training.py", line 99, in <module>
    trainer.initialize(not validation_only)
  File "/data3/lidexuan/nnUNet/nnunet/training/network_training/nnUNetTrainer.py", line 203, in initialize
    self.initialize_network_optimizer_and_scheduler()
  File "/data3/lidexuan/nnUNet/nnunet/training/network_training/nnUNetTrainer.py", line 240, in initialize_network_optimizer_and_scheduler
    self.network.cuda()
  File "/home/lidexuan/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 458, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/lidexuan/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/home/lidexuan/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/home/lidexuan/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/home/lidexuan/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/home/lidexuan/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 458, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/lidexuan/.local/lib/python3.7/site-packages/torch/cuda/__init__.py", line 186, in _lazy_init
    _check_driver()
  File "/home/lidexuan/.local/lib/python3.7/site-packages/torch/cuda/__init__.py", line 77, in _check_driver
    of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
AssertionError: The NVIDIA driver on your system is too old (found version 10000).
Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
Exception ignored in: <function MultiThreadedAugmenter.__del__ at 0x7fcdff638830>
Traceback (most recent call last):
  File "/home/lidexuan/.local/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 287, in __del__
  File "/home/lidexuan/.local/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 262, in _finish
AttributeError: 'NoneType' object has no attribute 'is_alive'
Exception ignored in: <function MultiThreadedAugmenter.__del__ at 0x7fcdff638830>
Traceback (most recent call last):
  File "/home/lidexuan/.local/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 287, in __del__
  File "/home/lidexuan/.local/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 262, in _finish
AttributeError: 'NoneType' object has no attribute 'is_alive'

FabianIsensee commented 4 years ago

Hi, You can ignore the is_alive errors. That is just the data loader dying. As I said in my previous post, please pip uninstall torch and then reinstall an older version of pytorch that is supported with your driver Best Fabian
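
The number in the assertion message (found version 10000) appears to use CUDA's integer version encoding, 1000 × major + 10 × minor, so 10000 would correspond to CUDA 10.0. The sketch below decodes that number and applies a simplified compatibility rule (driver CUDA version at least the build's CUDA version). Both function names are hypothetical, and the "10.2" build version is only an example value, not something stated in this thread.

```python
def decode_cuda_version(encoded):
    """Split an integer such as 10000 into (major, minor), assuming 1000*major + 10*minor."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_supports(driver_encoded, build_cuda):
    """True if the driver's CUDA version is at least the version a build was compiled with."""
    major, minor = (int(part) for part in build_cuda.split(".")[:2])
    return decode_cuda_version(driver_encoded) >= (major, minor)


if __name__ == "__main__":
    print(decode_cuda_version(10000))       # (10, 0)
    print(driver_supports(10000, "10.0"))   # True
    print(driver_supports(10000, "10.2"))   # False; "10.2" is an example value
```

Under this rule, a driver reporting 10000 would need a PyTorch build compiled for CUDA 10.0 or older, which matches the advice to reinstall an older PyTorch or upgrade the driver.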

leetesua commented 4 years ago

Any idea about this? I downgraded torch but still get the error below. My torch version is 1.2.0 with CUDA 10.0. Thank you in advance!

/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [3868,0,0], thread: [101,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [3868,0,0], thread: [102,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [3868,0,0], thread: [103,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [3868,0,0], thread: [104,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [3868,0,0], thread: [105,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=26 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "run/run_training.py", line 107, in <module>
    trainer.run_training()
  File "/data3/nnUNet/nnunet/training/network_training/nnUNetTrainer.py", line 275, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/data3/nnUNet/nnunet/training/network_training/network_trainer.py", line 352, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/data3/nnUNet/nnunet/training/network_training/network_trainer.py", line 544, in run_iteration
    l.backward()
  File "/root/anaconda3/lib/python3.5/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/anaconda3/lib/python3.5/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:26
Exception ignored in: <bound method MultiThreadedAugmenter.__del__ of <batchgenerators.dataloading.multi_threaded_augmenter.MultiThreadedAugmenter object at 0x7f79084b3a20>>
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.5/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 287, in __del__
  File "/root/anaconda3/lib/python3.5/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 262, in _finish
AttributeError: 'NoneType' object has no attribute 'is_alive'

leetesua commented 4 years ago

Sometimes I get this instead (with a different version of torch):

Traceback (most recent call last):
  File "run/run_training.py", line 107, in <module>
    trainer.run_training()
  File "/data3/lidexuan/nnUNet/nnunet/training/network_training/nnUNetTrainer.py", line 275, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/data3/lidexuan/nnUNet/nnunet/training/network_training/network_trainer.py", line 352, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/data3/lidexuan/nnUNet/nnunet/training/network_training/network_trainer.py", line 535, in run_iteration
    l = self.loss(output, target)
  File "/root/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data3/lidexuan/nnUNet/nnunet/training/loss_functions/dice_loss.py", line 122, in forward
    dc_loss = self.dc(net_output, target)
  File "/root/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data3/lidexuan/nnUNet/nnunet/training/loss_functions/dice_loss.py", line 100, in forward
    tp, fp, fn = get_tp_fp_fn(x, y, axes, loss_mask, self.square)
  File "/data3/lidexuan/nnUNet/nnunet/training/loss_functions/dice_loss.py", line 55, in get_tp_fp_fn
    fp = net_output * (1 - y_onehot)
RuntimeError: CUDA error: device-side assert triggered
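
The scatter assertion in the earlier log (indexValue >= 0 && indexValue < tensor.sizes[dim]) indicates that an index passed to a scatter operation fell outside the size of the target dimension. One possible cause, which this thread does not confirm, is a segmentation containing label values outside the range the network expects. A minimal sketch for checking that, with a hypothetical helper `out_of_range_labels` and a hypothetical `num_channels` parameter (background plus foreground classes):

```python
def out_of_range_labels(labels, num_channels):
    """Return the sorted distinct label values that fall outside [0, num_channels)."""
    return sorted({int(v) for v in labels if not 0 <= int(v) < num_channels})


if __name__ == "__main__":
    # Hypothetical flattened segmentation with a stray value 255.
    example = [0, 1, 2, 2, 255, 1, 0]
    print(out_of_range_labels(example, num_channels=3))  # [255]
```

For a NumPy segmentation array `seg`, passing `np.unique(seg).tolist()` keeps the check cheap; an empty result means every label is in range.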

FabianIsensee commented 4 years ago

Your comments are marked as resolved. Can I close this issue?

leetesua commented 4 years ago

Yeah yeah, problem solved. Thank you!

BelieferQAQ commented 2 years ago

Hi, what metric do you use when training nnU-Net on the RibFrac dataset?