MIC-DKFZ / nnUNet

Apache License 2.0

Prediction with the AutoPET pretrained networks fails #2126

Closed komiloserdov closed 3 months ago

komiloserdov commented 6 months ago

Hi!

I was trying to follow the instructions here to run inference with the pretrained weights for the AutoPET challenge. So far everything seemed to work; however, while trying to initialize the residual encoder network I get a couple of warnings:

UserWarning: Detected old nnU-Net plans format. Attempting to reconstruct network architecture parameters. If this fails, rerun nnUNetv2_plan_experiment for your dataset. If you use a custom architecture, please downgrade nnU-Net to the version you implemented this or update your implementation + plans.
  warnings.warn("Detected old nnU-Net plans format. Attempting to reconstruct network architecture "
FOUND IT: <class 'dynamic_network_architectures.architectures.unet.ResidualEncoderUNet'>

and unfortunately an error:

TypeError: ResidualEncoderUNet.__init__() got an unexpected keyword argument 'n_conv_per_stage'
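
For readers debugging a similar TypeError, the following is a minimal sketch of how to list the keyword arguments a constructor would reject before calling it. The class `StandInResidualEncoder` and the helper `unexpected_kwargs` are hypothetical and only illustrate the idea; the stand-in does not reproduce the real ResidualEncoderUNet signature, and the kwargs values are made up.

```python
import inspect


class StandInResidualEncoder:
    """Hypothetical stand-in; NOT the real ResidualEncoderUNet signature."""

    def __init__(self, n_stages, n_blocks_per_stage, n_conv_per_stage_decoder):
        self.n_stages = n_stages
        self.n_blocks_per_stage = n_blocks_per_stage
        self.n_conv_per_stage_decoder = n_conv_per_stage_decoder


def unexpected_kwargs(cls, kwargs):
    """Return the keys in kwargs that cls.__init__ would reject."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(k for k in kwargs if k not in params)


# Illustrative kwargs in the style of a reconstructed old plans file.
legacy_kwargs = {
    "n_stages": 6,
    "n_conv_per_stage": [1, 3, 4, 6, 6, 6],
    "n_conv_per_stage_decoder": [1, 1, 1, 1, 1],
}
print(unexpected_kwargs(StandInResidualEncoder, legacy_kwargs))
```

Passing the real network class and the kwargs reconstructed from the plans file in place of the stand-ins should point at the offending key in the same way.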

Any idea how to fix this?

The code I am trying to run is the following:

import torch
from nnunetv2.inference.predict_from_raw_data import nnUNetPredictor

predictor = nnUNetPredictor(
    tile_step_size=0.5,
    use_gaussian=True,
    use_mirroring=True,
    perform_everything_on_device=True,
    device=torch.device('cuda', 0),
    verbose=False,
    verbose_preprocessing=False,
    allow_tqdm=True
)
# Loading the trained model folder is the call that raises the TypeError above.
predictor.initialize_from_trained_model_folder(
    '/path/to/nnUnet_data_dir/nnUNet_results/Dataset221_AutoPETII_2023/nnUNetTrainer__nnUNetPlans__3d_fullres_resenc_bs80',
    use_folds=(0, 1, 2, 3, 4),
    checkpoint_name='checkpoint_final.pth',
)

In case the full traceback helps:

File /path/to/.venv/lib64/python3.10/site-packages/nnunetv2/inference/predict_from_raw_data.py:100, in nnUNetPredictor.initialize_from_trained_model_folder(self, model_training_output_dir, use_folds, checkpoint_name)
     96 num_input_channels = determine_num_input_channels(plans_manager, configuration_manager, dataset_json)
     97 trainer_class = recursive_find_python_class(join(nnunetv2.__path__[0], "training", "nnUNetTrainer"),
     98                                             trainer_name, 'nnunetv2.training.nnUNetTrainer')
--> 100 network = trainer_class.build_network_architecture(
    101     configuration_manager.network_arch_class_name,
    102     configuration_manager.network_arch_init_kwargs,
    103     configuration_manager.network_arch_init_kwargs_req_import,
    104     num_input_channels,
...
     38 )
     40 if hasattr(network, 'initialize') and allow_init:
     41     network.apply(network.initialize)

ykirchhoff commented 6 months ago

Hi @komiloserdov,

there was recently a change in the format of the plans nnUNet produces, and backward compatibility currently doesn't work for the ResidualEncoderUNet. We will fix that ASAP and I will let you know once it is done.

Best, Yannick

komiloserdov commented 5 months ago

Hi,

Is there by chance any news on this? It's probably not the highest priority, but do you have an estimate of when the backward compatibility problem will be fixed?

ykirchhoff commented 5 months ago

Hi @komiloserdov,

thanks for the reminder and sorry for the late reply! I am currently quite busy with paper submissions and rebuttals and haven't found the time to work on the issues. Here is a quick fix until it is properly fixed in nnUNet. The problem lies in the handling of old plans files that use the ResidualEncoderUNet instead of the default PlainConvUNet: the reconstructed arguments pass the encoder depth as n_conv_per_stage, whereas the ResidualEncoderUNet expects n_blocks_per_stage. To solve this, you need to fix the arch_dict definition here. You can do that by simply replacing it with this:

if unet_class_name == "PlainConvUNet":
    arch_dict = {
        'network_class_name': network_class_name,
        'arch_kwargs': {
            "n_stages": n_stages,
            "features_per_stage": [min(self.configuration["UNet_base_num_features"] * 2 ** i,
                                    self.configuration["unet_max_num_features"])
                                for i in range(n_stages)],
            "conv_op": conv_op.__module__ + '.' + conv_op.__name__,
            "kernel_sizes": deepcopy(self.configuration["conv_kernel_sizes"]),
            "strides": deepcopy(self.configuration["pool_op_kernel_sizes"]),
            "n_conv_per_stage": deepcopy(self.configuration["n_conv_per_stage_encoder"]),
            "n_conv_per_stage_decoder": deepcopy(self.configuration["n_conv_per_stage_decoder"]),
            "conv_bias": True,
            "norm_op": instnorm.__module__ + '.' + instnorm.__name__,
            "norm_op_kwargs": {
                "eps": 1e-05,
                "affine": True
            },
            "dropout_op": None,
            "dropout_op_kwargs": None,
            "nonlin": "torch.nn.LeakyReLU",
            "nonlin_kwargs": {
                "inplace": True
            }
        },
        # these need to be imported with locate in order to use them:
        # `conv_op = pydoc.locate(architecture_kwargs['conv_op'])`
        "_kw_requires_import": [
            "conv_op",
            "norm_op",
            "dropout_op",
            "nonlin"
        ]
    }
elif unet_class_name == 'ResidualEncoderUNet':
    arch_dict = {
        'network_class_name': network_class_name,
        'arch_kwargs': {
            "n_stages": n_stages,
            "features_per_stage": [min(self.configuration["UNet_base_num_features"] * 2 ** i,
                                    self.configuration["unet_max_num_features"])
                                for i in range(n_stages)],
            "conv_op": conv_op.__module__ + '.' + conv_op.__name__,
            "kernel_sizes": deepcopy(self.configuration["conv_kernel_sizes"]),
            "strides": deepcopy(self.configuration["pool_op_kernel_sizes"]),
            "n_blocks_per_stage": deepcopy(self.configuration["n_conv_per_stage_encoder"]),
            "n_conv_per_stage_decoder": deepcopy(self.configuration["n_conv_per_stage_decoder"]),
            "conv_bias": True,
            "norm_op": instnorm.__module__ + '.' + instnorm.__name__,
            "norm_op_kwargs": {
                "eps": 1e-05,
                "affine": True
            },
            "dropout_op": None,
            "dropout_op_kwargs": None,
            "nonlin": "torch.nn.LeakyReLU",
            "nonlin_kwargs": {
                "inplace": True
            }
        },
        # these need to be imported with locate in order to use them:
        # `conv_op = pydoc.locate(architecture_kwargs['conv_op'])`
        "_kw_requires_import": [
            "conv_op",
            "norm_op",
            "dropout_op",
            "nonlin"
        ]
    }
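
The two branches above differ only in the name of the encoder-depth keyword. As a minimal sketch of that one difference (not the code that ended up in nnU-Net), the helper below picks the keyword by class name. `ENCODER_DEPTH_KWARG`, `encoder_depth_kwargs`, and the sample configuration values are hypothetical and used for illustration only.

```python
from copy import deepcopy

# Encoder-depth keyword per class, as used in the two branches above.
ENCODER_DEPTH_KWARG = {
    "PlainConvUNet": "n_conv_per_stage",
    "ResidualEncoderUNet": "n_blocks_per_stage",
}


def encoder_depth_kwargs(unet_class_name, configuration):
    """Hypothetical helper: build only the class-dependent part of arch_kwargs."""
    try:
        key = ENCODER_DEPTH_KWARG[unet_class_name]
    except KeyError:
        raise ValueError(f"Unknown UNet class: {unet_class_name!r}") from None
    return {
        key: deepcopy(configuration["n_conv_per_stage_encoder"]),
        "n_conv_per_stage_decoder": deepcopy(configuration["n_conv_per_stage_decoder"]),
    }


# Illustrative values only; not taken from a real plans file.
old_configuration = {
    "n_conv_per_stage_encoder": [1, 3, 4, 6, 6, 6],
    "n_conv_per_stage_decoder": [1, 1, 1, 1, 1],
}
print(encoder_depth_kwargs("ResidualEncoderUNet", old_configuration))
```

All other entries of arch_kwargs are identical for both classes, so a single dictionary with this one key swapped would express the same fix with less duplication.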

I hope this solves the issue for now; we will still fix it officially soon.

Best, Yannick

komiloserdov commented 5 months ago

Hi Yannick,

no worries and thanks for the quick fix, it does fix the issue!

Best, Konstantin

ykirchhoff commented 5 months ago

Hi Konstantin,

great to hear that it worked! I made an internal merge request, so it should also be fixed soonish.

Best, Yannick

komiloserdov commented 5 months ago

Hi Yannick,

unfortunately this only works with the 3d_fullres_resenc_bs80 configuration. There seems to be a problem with the 3d_fullres_resenc_192x192x192_b24 one:

For the command nnUNetv2_predict -i /path/to/imagesFor222/ -o /outdir/ -d 221 -c 3d_fullres_resenc_192x192x192_b24 -f 0 1 2 3 4 --save_probabilities

I get the following error logs:

Predicting 00CE74739D2D4DF878AC538D03646913-20150709:
perform_everything_on_device: True
Traceback (most recent call last):
  File "/opt/conda/bin/nnUNetv2_predict", line 8, in <module>
    sys.exit(predict_entry_point())
  File "/opt/conda/lib/python3.10/site-packages/nnunetv2/inference/predict_from_raw_data.py", line 866, in predict_entry_point
    predictor.predict_from_files(args.i, args.o, save_probabilities=args.save_probabilities,
  File "/opt/conda/lib/python3.10/site-packages/nnunetv2/inference/predict_from_raw_data.py", line 258, in predict_from_files
    return self.predict_from_data_iterator(data_iterator, save_probabilities, num_processes_segmentation_export)
  File "/opt/conda/lib/python3.10/site-packages/nnunetv2/inference/predict_from_raw_data.py", line 375, in predict_from_data_iterator
    prediction = self.predict_logits_from_preprocessed_data(data).cpu()
  File "/opt/conda/lib/python3.10/site-packages/nnunetv2/inference/predict_from_raw_data.py", line 484, in predict_logits_from_preprocessed_data
    self.network.load_state_dict(params)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ResidualEncoderUNet:
    Missing key(s) in state_dict: "encoder.stages.0.blocks.1.conv1.conv.weight", "encoder.stages.0.blocks.1.conv1.conv.bias", "encoder.stages.0.blocks.1.conv1.norm.weight", "encoder.stages.0.blocks.1.conv1.norm.bias", "encoder.stages.0.blocks.1.conv1.all_modules.0.weight", "encoder.stages.0.blocks.1.conv1.all_mo>
    Unexpected key(s) in state_dict: "encoder.stages.1.blocks.2.conv1.conv.weight", "encoder.stages.1.blocks.2.conv1.conv.bias", "encoder.stages.1.blocks.2.conv1.norm.weight", "encoder.stages.1.blocks.2.conv1.norm.bias", "encoder.stages.1.blocks.2.conv1.all_modules.0.weight", "encoder.stages.1.blocks.2.conv1.all>

So the model instantiation seems to work; however, there seems to be a mismatch between the pretrained architecture and the loaded architecture.
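
A mismatch like this can be narrowed down by counting the residual blocks per encoder stage on both sides. The following is a minimal sketch that works on plain lists of key names, assuming the `encoder.stages.<i>.blocks.<j>.` naming visible in the error above. The helpers `blocks_per_stage` and `compare_keys` are hypothetical, and the key lists are tiny made-up examples, not the real checkpoint.

```python
import re
from collections import defaultdict

# Matches keys such as "encoder.stages.1.blocks.2.conv1.conv.weight".
_BLOCK_KEY = re.compile(r"^encoder\.stages\.(\d+)\.blocks\.(\d+)\.")


def blocks_per_stage(keys):
    """Count distinct block indices per encoder stage from state_dict keys."""
    blocks = defaultdict(set)
    for key in keys:
        match = _BLOCK_KEY.match(key)
        if match:
            blocks[int(match.group(1))].add(int(match.group(2)))
    return {stage: len(ids) for stage, ids in sorted(blocks.items())}


def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) in the sense of the load_state_dict error."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model expects, checkpoint lacks
    unexpected = sorted(checkpoint_keys - model_keys)   # checkpoint has, model lacks
    return missing, unexpected


# Made-up keys that mimic the pattern of the error above.
model_keys = [
    "encoder.stages.0.blocks.0.conv1.conv.weight",
    "encoder.stages.0.blocks.1.conv1.conv.weight",
    "encoder.stages.1.blocks.0.conv1.conv.weight",
    "encoder.stages.1.blocks.1.conv1.conv.weight",
]
checkpoint_keys = [
    "encoder.stages.0.blocks.0.conv1.conv.weight",
    "encoder.stages.1.blocks.0.conv1.conv.weight",
    "encoder.stages.1.blocks.1.conv1.conv.weight",
    "encoder.stages.1.blocks.2.conv1.conv.weight",
]
print("model:     ", blocks_per_stage(model_keys))
print("checkpoint:", blocks_per_stage(checkpoint_keys))
print(compare_keys(model_keys, checkpoint_keys))
```

With real objects, the keys of `network.state_dict()` and of the loaded weight dictionary can be passed in instead; differing block counts per stage would indicate that the reconstructed blocks-per-stage setting does not match the one used for training.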

Best, Konstantin

ykirchhoff commented 5 months ago

Hi Konstantin,

that is strange, I would have assumed that the underlying architecture definition should be the same, but it seems to differ in the encoder. I will need to check what the difference actually is and how to handle it. Will get back to you afterwards!

Best, Yannick

ykirchhoff commented 4 months ago

Hi Konstantin,

sorry that it took a bit longer. I tried it on my machine with the newest version of nnUNet and it works fine for me. Could you update your version and check again?
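
To confirm which versions are actually installed in the environment before and after updating, a small check along these lines may help. It is only a sketch: the helper `installed_version` is hypothetical, and the distribution names listed are assumptions about how the packages are published.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them to match your environment.
for name in ("nnunetv2", "dynamic-network-architectures", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```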

Best, Yannick

ykirchhoff commented 4 months ago

Hi Konstantin,

did you find the time yet to check if it works for you?

Best, Yannick

komiloserdov commented 3 months ago

Hi Yannick,

I'm really sorry for the delayed response. I was busy with other things and this issue slipped to the back of my mind!

I tried it with a clean install & setup and it worked with both pretrained models.

Again, thanks a lot for the quick fix!

All the best, Konstantin

ykirchhoff commented 3 months ago

Hi Konstantin,

no problem at all, great to hear it works fine now!

Best, Yannick