JunMa11 closed this issue 5 years ago
The following is the full log:
Please cite the following paper when using nnUNet:
Isensee, Fabian, et al. "nnU-Net: Breaking the Spell on Successful Medical Image Segmentation." arXiv preprint arXiv:1904.08128 (2019).
If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet
###############################################
I am running the following nnUNet: 3d_cascade_fullres
My trainer class is: <class 'nnunet.training.network_training.nnUNetTrainerCascadeFullRes.nnUNetTrainerCascadeFullRes'>
For that I will be using the following configuration:
num_classes: 2
modalities: {0: 'MR'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region OrderedDict([((1, 2), False), ((2,), False), ((1,), False)])
min_region_size_per_class OrderedDict([(1, 0.3988037109375), (2, 78.30931661574891)])
min_size_per_class OrderedDict([(1, 404322.9441427806), (2, 160.876152621963)])
normalization_schemes OrderedDict([(0, 'nonCT')])
stages...
stage: 0
{'batch_size': 2, 'num_pool_per_axis': [3, 5, 5], 'patch_size': array([ 48, 192, 192]), 'median_patient_size_in_voxels': array([ 81, 297, 324]), 'current_spacing': array([2.5 , 1.11122066, 1.11122066]), 'original_spacing': array([2.5 , 0.70310003, 0.70310003]), 'do_dummy_2D_data_aug': True, 'pool_op_kernel_sizes': [[1, 2, 2], [1, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[1, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}
stage: 1
{'batch_size': 2, 'num_pool_per_axis': [3, 5, 5], 'patch_size': array([ 32, 224, 224]), 'median_patient_size_in_voxels': array([ 81, 469, 512]), 'current_spacing': array([2.5 , 0.70310003, 0.70310003]), 'original_spacing': array([2.5 , 0.70310003, 0.70310003]), 'do_dummy_2D_data_aug': True, 'pool_op_kernel_sizes': [[1, 2, 2], [1, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[1, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}
I am using stage 1 from these plans
I am using batch dice + CE loss
I am using data from this folder: /home/jma/scratch/data/pre_data/TaskXX_MY_DATASET/nnUNet
###############################################
Traceback (most recent call last):
File "run/run_training.py", line 90, in <module>
batch_dice=batch_dice, stage=stage, unpack_data=unpack, deterministic=deterministic)
File "/home/jma/Code/nnUNet/nnunet/training/network_training/nnUNetTrainerCascadeFullRes.py", line 31, in __init__
"Cannot run final stage of cascade. Run corresponding 3d_lowres first and predict the "
RuntimeError: Cannot run final stage of cascade. Run corresponding 3d_lowres first and predict the segmentations for the next stage
Hi Jun,
if network == '3d_lowres':
    trainer.load_best_checkpoint(False)
    print("predicting segmentations for the next stage of the cascade")
    predict_next_stage(trainer, join(dataset_directory, trainer.plans['data_identifier'] + "_stage%d" % 1))
this is an excerpt of run_training.py. It appears at the very bottom of the script. As you can see, predict_next_stage is called if your network is '3d_lowres'. So the segmentations should have been created. The predictions of the validation set are however not what is used for the next stage of the cascade. There should be another folder "segs_from_prev_stage" (or similar) in the folder where the "fold_X" subfolders are. I don't know why it is missing. Could you please run
python run/run_training.py 3d_lowres nnUNetTrainer TaskXX_MY_DATASET 0 --ndet -val
and tell me what the output is? Also please check whether the missing folder gets created.
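For reference, here is a quick sketch to check whether that folder was written. It assumes the trainer and dataset_directory from the excerpt above; the exact folder name may differ between versions:

import os
from batchgenerators.utilities.file_and_folder_operations import join

# The folder predict_next_stage writes to is derived from the plans file,
# exactly as in the excerpt above
stage_folder = join(dataset_directory, trainer.plans['data_identifier'] + "_stage1")
print(stage_folder, "exists:", os.path.isdir(stage_folder))

Best, Fabian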
Hi Fabian, Thanks for your quick reply.
I ran 3 different folds, but all of them fail with the following error:
train_704
separate z: True lowres axis [0]
separate z
train_704 (2, 84, 315, 315)
debug: mirroring True mirror_axes (0, 1, 2)
train_828
separate z: False lowres axis None
train_828 (2, 83, 324, 324)
debug: mirroring True mirror_axes (0, 1, 2)
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/jma/anaconda3/envs/torch10/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/jma/anaconda3/envs/torch10/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/PytorchCode/nnUNet0501/nnunet/inference/segmentation_export.py", line 100, in save_segmentation_nifti_from_softmax
bbox[c][1] = np.min((bbox[c][0] + seg_old_spacing.shape[c], shape_original_before_cropping[c]))
TypeError: 'int' object is not subscriptable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "run/run_training.py", line 111, in <module>
trainer.validate(save_softmax=args.npz, validation_folder_name=val_folder)
File "/jaylabs/amartel_data2/liver_MRI/GadSurgical/LiverTumorSeg/PytorchCode/nnUNet0501/nnunet/training/network_training/nnUNetTrainer.py", line 497, in validate
_ = [i.get() for i in results]
File "/PytorchCode/nnUNet0501/nnunet/training/network_training/nnUNetTrainer.py", line 497, in <listcomp>
_ = [i.get() for i in results]
File "/home/jma/anaconda3/envs/torch10/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
TypeError: 'int' object is not subscriptable
Seems like the .pkl that accompanies your preprocessed data is corrupted. I am not in the office right now, so it would help me a lot if you could send me the file. It should be located in the folder with your preprocessed data (train_828.pkl or something similar). (f.isensee at dkfz.de) You can also try to run the preprocessing again. Best, Fabian
Hi @FabianIsensee ,
Thanks for your help.
I sent you an email with train_828.pkl.
Meanwhile, I re-ran plan_and_preprocess_task.py and then
python run/run_training.py 3d_lowres nnUNetTrainer TaskXX_MY_DATASET 0 -val --ndet
but the same error occurred again. train_649.pkl can be downloaded here.
train_629
separate z: False lowres axis None
train_629 (2, 88, 360, 360)
debug: mirroring True mirror_axes (0, 1, 2)
train_649
separate z: True lowres axis [0]
separate z
train_649 (2, 103, 396, 396)
debug: mirroring True mirror_axes (0, 1, 2)
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/jma/anaconda3/envs/torch10/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/jma/anaconda3/envs/torch10/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/nnunet/inference/segmentation_export.py", line 100, in save_segmentation_nifti_from_softmax
bbox[c][1] = np.min((bbox[c][0] + seg_old_spacing.shape[c], shape_original_before_cropping[c]))
TypeError: 'int' object is not subscriptable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "run/run_training.py", line 111, in <module>
trainer.validate(save_softmax=args.npz, validation_folder_name=val_folder)
File "/nnunet/training/network_training/nnUNetTrainer.py", line 497, in validate
_ = [i.get() for i in results]
File "/nnunet/training/network_training/nnUNetTrainer.py", line 497, in <listcomp>
_ = [i.get() for i in results]
File "/home/jma/anaconda3/envs/torch10/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
TypeError: 'int' object is not subscriptable
I also checked the original nii.gz files in /imagesTr and /labelsTr with the following code:
import os
import nibabel as nb

path = ''  # folder containing the .nii.gz files
names = os.listdir(path)
names.sort()
for name in names:
    data = nb.load(os.path.join(path, name)).get_data()
    print(name, data.shape)
All the files are ok.
Hi there, I apologize, I may not have chosen the right word. By "corrupted" I meant that the pkl file seems wrong, and I can now confirm that based on the files you sent me:
from batchgenerators.utilities.file_and_folder_operations import *
a = load_pickle('train_643.pkl')
print(a['crop_bbox'])
(1, 59, 512, 512)
The output is supposed to look different. I deleted my files from the Liver task and reran the cropping and this is what it looks like:
from batchgenerators.utilities.file_and_folder_operations import *
a = load_pickle('liver_22.pkl')
print(a['crop_bbox'])
[[0, 247], [0, 512], [0, 512]]
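A malformed crop_bbox like yours is exactly what triggers the TypeError above: bbox[c] becomes an int, and indexing into it fails. A minimal sketch that reproduces it (the shapes are made up for illustration):

import numpy as np

bbox = (1, 59, 512, 512)        # corrupted format: a shape tuple
seg_shape = (59, 512, 512)      # made-up shape of the resampled segmentation
orig_shape = (59, 512, 512)     # made-up shape before cropping
c = 0
try:
    bbox[c][1] = np.min((bbox[c][0] + seg_shape[c], orig_shape[c]))
except TypeError as e:
    print(e)                    # 'int' object is not subscriptable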
My previous comment about rerunning the preprocessing was incorrect. You should instead rerun the data cropping: delete the folder that belongs to your task in nnUNet_raw_cropped and rerun the preprocessing. You can then check with the code snippet above whether it worked. If it still does not work, have a look at your Task03_Liver data. Does it have the same error? If not, what is different in your data? Is there maybe a 2D dataset in there? Or is train_643 4D (which it should not be!)? (You should check for 3D/4D in the nnUNet_raw_splitted folder, as this is where the cropping takes its data from.)
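To check all cases at once, here is a quick sketch (it assumes your cropped data lives in nnUNet_raw_cropped/TaskXX_MY_DATASET; adjust the path to your setup):

from batchgenerators.utilities.file_and_folder_operations import *

# Flag every case whose crop_bbox is not a list of [lower, upper] pairs
folder = 'nnUNet_raw_cropped/TaskXX_MY_DATASET'  # adjust to your setup
for f in subfiles(folder, suffix='.pkl'):
    bbox = load_pickle(f).get('crop_bbox')
    if not (isinstance(bbox, list) and all(len(b) == 2 for b in bbox)):
        print('suspicious crop_bbox in', f, ':', bbox)

Best, Fabian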
Just as a side note: on Task03_Liver the cascade did not do so well; I am not quite sure why. What I show here is the average foreground Dice (the mean Dice of liver and tumor) from the 5-fold cross-validation. You may not have to run the cascade to get the best results:
2d                  0.7345
3d_cascade_fullres  0.74
3d_fullres          0.7686
3d_lowres           0.7314
Hi @FabianIsensee ,
Thanks for your help.
I reran the preprocessing, and it works now. A new folder named pred_next_stage was generated. I have no idea why the previous train_643.pkl had a wrong 'crop_bbox'. Anyway, it works well now.
Regarding the lower performance of the cascade on the LiTS task: in your Decathlon challenge paper, nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation, what is the motivation for using the outputs of the lowres U-Net as additional input channels for the second U-Net?
Intuitively, we could generate an ROI image based on the first U-Net's segmentation (ignoring the region outside the segmentation bounding box), and then let the second U-Net directly segment the ROI image (see the sketch below). For LiTS, we could even segment the tumor only within the liver mask. That way we would not only reduce the computational burden but also exclude distractions outside the liver.
I guess nnUNet is designed for general segmentation tasks, so it doesn't do this for LiTS.
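For illustration, a rough sketch of the ROI idea (a hypothetical helper, not part of nnUNet):

import numpy as np

def crop_to_roi(image, seg, margin=10):
    # Crop `image` to the bounding box of the first-stage segmentation,
    # padded by `margin` voxels, so the 2nd U-Net only sees the ROI
    coords = np.argwhere(seg > 0)
    lower = np.maximum(coords.min(axis=0) - margin, 0)
    upper = np.minimum(coords.max(axis=0) + 1 + margin, seg.shape)
    slices = tuple(slice(l, u) for l, u in zip(lower, upper))
    return image[slices], slices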
Best, Jun
A quick question on learning rate settings in a transfer learning scenario. My task is also liver tumor segmentation (MR), but I only have a small dataset, so I want to take the model trained on the LiTS dataset and fine-tune it on my small dataset. Do I need to reduce initial_lr in nnUNetTrainer.py (e.g. to 3e-5)? Or will nnU-Net automatically adjust the learning rate to the current training?
What is the motivation of using the outputs of lowres-UNet as additional input channels for the second U-Net?
The motivation is that the patch size for 3d_fullres may be too small to capture sufficient contextual information for the U-Net to properly segment the target structure. By using 3d_lowres we guarantee that enough contextual information is captured, at the cost of reduced spatial resolution. The second stage of the cascade is intended to refine these segmentations.
Intuitively, we can generate ROI image based on the 1st U-Net segmentation (ignoring the region outside the segmentation bounding box), then the 2nd U-Net directly segments the ROI image.
This is a very sensible thing to do and in fact something we could/should have done. Indeed, I am thinking about implementing a mix of the two. Doing solely what you suggested may not be ideal if the target structures are distributed all across the images (and not confined to a specific target organ, for example).
For LiTS, we can even only segment the tumor in liver mask. In this way, it can not only reduce computation burden, but also exclude interruptions outside the liver.
If there is a hierarchy to the labels then this is definitely worth doing, see my BraTS 2018 paper. But as you said, nnU-Net is intended to be general purpose and we don't know about label hierarchies.
Do I need to reduce the initial_lr in nnUNetTrainer.py (eg. reduce to 3e-5)? Or nnU-Net will automatically adjust lr to accord with current training.
I have no experience with fine-tuning, so you will need to figure that out yourself, sorry. nnU-Net will, however, decrease the learning rate automatically if it does not detect an improvement within recent epochs, so the training may be shorter. But fine-tuning really involves a lot more than just the learning rate, I think. Some people like warm starts, some decrease the learning rate, and there is a variety of learning rate schedules for that. Honestly, I don't know.
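If you do want to experiment, one purely hypothetical starting point might look like the following sketch (it assumes the attribute names of nnUNetTrainer in this repo and that the LiTS checkpoint sits where the trainer expects it; this is not an official nnU-Net fine-tuning API):

# Hypothetical fine-tuning setup: build the trainer for the new task,
# load the weights trained on LiTS, lower the starting LR, then train
trainer.initialize(training=True)
trainer.load_best_checkpoint(train=True)  # assumes the LiTS checkpoint is in the trainer's output folder
trainer.initial_lr = 3e-5                 # the reduced LR you suggested
trainer.run_training()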
Hope this helps, Best, Fabian
Hi Fabian,
Got it. Thanks for your answer very much.
Best, Jun
Hi Jun,
You can look into Models Genesis, developed by our lab. We provide pre-trained weights for the nnUNet framework (transfer learning): https://github.com/MrGiovanni/ModelsGenesis/tree/master/competition
Best, Shivam
Dear Fabian,
Thanks for the great repo.
I want to use the 3D U-Net Cascade. First, I ran the 3d_lowres training, and nnUNet generated predictions for the validation dataset in each fold. Then I ran the 3d_cascade_fullres training, but I got the error shown in the full log above.
I'm confused about this error, because the segmentations were generated automatically during the 3d_lowres step (in the validation folder). Could you give some insights on this error?
Looking forward to your reply. Best, Jun