DIAGNijmegen / AbdomenMRUS-prostate-segmentation

Grand Challenge wrapper for whole-gland prostate segmentation with nnUNet
Apache License 2.0

nnUNet_plan_and_preprocess #2

Closed — jakubMitura14 closed this 1 year ago

jakubMitura14 commented 2 years ago

First, thank you for publishing your work!

I am trying to run your algorithm on a private dataset. I have MHA files and converted them to the nnU-Net Raw Data Archive format using these settings:

mha2nnunet_settings = {
    "dataset_json": {
        "task": "Task2202_prostate_segmentation",
        "description": "bpMRI scans from PI-CAI dataset to train nnUNet baseline",
        "tensorImageSize": "4D",
        "reference": "",
        "licence": "",
        "release": "1.0",
        "modality": {
            "0": "T2W",
            "1": "CT",
            "2": "HBV"
        },
        "labels": {
            "0": "background",
            "1": "lesion"
        }
    },
    "preprocessing": {
        "spacing": [
            3.0,
            0.5,
            0.5
        ],
    },
    "archive": arr
}

where arr is a list of entries, each supplying metadata and a file list with relative paths to the input folder:

{"patient_id": tupl["patient_id"],
                "study_id":  tupl["study_id"],
                "scan_paths": newFileList, 
                "annotation_path": tupl["patient_id"]+'.nii.gz'
                }
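
For reference, this is roughly how I assemble that list (a simplified sketch; the directory layout matches the preprocessing log further below):

from pathlib import Path

archive_dir = Path("/workspaces/baselineUnet/mhaArchive")

arr = []
for patient_dir in sorted(archive_dir.iterdir()):
    for study_dir in sorted(patient_dir.iterdir()):
        patient_id, study_id = patient_dir.name, study_dir.name
        arr.append({
            "patient_id": patient_id,
            "study_id": study_id,
            # relative paths to the input folder, one scan per modality
            "scan_paths": [
                f"{patient_id}/{study_id}/{patient_id}_{study_id}_{mod}.mha"
                for mod in ("t2w", "adc", "hbv")
            ],
            "annotation_path": patient_id + ".nii.gz",
        })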

After preprocessing, I have files like:

[screenshot of the converted files]

Then I try to run the Docker container with:

docker run --gpus='"device=0"' \
    -v /workspaces/baselineUnet/newOutput/Task2202_prostate_segmentation:/workdir/nnUNet_raw_data \
    -v /workspaces/baselineUnet/results:/workdir/results \
    joeranbosma/picai_nnunet:latest nnunet plan_train \
    Task2202_prostate_segmentation /workdir/ \
    --trainer nnUNetTrainerV2_Loss_FL_and_CE_checkpoints --fold 0

and I get this error:

Traceback (most recent call last):
  File "/opt/conda/bin/nnUNet_plan_and_preprocess", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_plan_and_preprocess')())
  File "/home/user/nnunet/nnunet/experiment_planning/nnUNet_plan_and_preprocess.py", line 103, in main
    task_name = convert_id_to_task_name(i)
  File "/home/user/nnunet/nnunet/utilities/task_name_id_conversion.py", line 51, in convert_id_to_task_name
    raise RuntimeError("Could not find a task with the ID %d. Make sure the requested task ID exists and that "
RuntimeError: Could not find a task with the ID 2202. Make sure the requested task ID exists and that nnU-Net knows where raw and preprocessed data are located (see Documentation - Installation). Here are your currently defined folders:
nnUNet_preprocessed=/home/user/data
RESULTS_FOLDER=/workdir/results
nnUNet_raw_data_base=/workdir

[#] Creating plans and preprocessing data
Traceback (most recent call last):
  File "/usr/local/bin/nnunet", line 568, in <module>
    action(argv)
  File "/usr/local/bin/nnunet", line 250, in plan_train
    subprocess.check_call(cmd)
  File "/opt/conda/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['nnUNet_plan_and_preprocess', '-t', '2202', '-tl', '1', '-tf', '1', '--verify_dataset_integrity', '--planner2d', 'None']' returned non-zero exit status 1.

The files, as far as I have manually inspected a portion of them using Slicer, are not corrupted.

Part of the preprocessing log:

CASE 017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099
    PATIENT ID  017
    STUDY ID    1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099

Importing 3 scanss
    + (1) /workspaces/baselineUnet/mhaArchive/017/1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_t2w.mha
    + (2) /workspaces/baselineUnet/mhaArchive/017/1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_adc.mha
    + (3) /workspaces/baselineUnet/mhaArchive/017/1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_hbv.mha
Importing annotation
    + /workspaces/baselineUnet/labels/017.nii.gz
Writing 3 scans including annotation
Wrote image to /workspaces/baselineUnet/newOutput/taskA/imagesTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_0000.nii.gz
Wrote image to /workspaces/baselineUnet/newOutput/taskA/imagesTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_0001.nii.gz
Wrote image to /workspaces/baselineUnet/newOutput/taskA/imagesTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_0002.nii.gz
Wrote annotation to /workspaces/baselineUnet/newOutput/taskA/labelsTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099.nii.gz
Importing 3 scanss
    + (4) /workspaces/baselineUnet/mhaArchive/017/1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_t2w.mha
    + (5) /workspaces/baselineUnet/mhaArchive/017/1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_adc.mha
    + (6) /workspaces/baselineUnet/mhaArchive/017/1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_hbv.mha
Importing annotation
    + /workspaces/baselineUnet/labels/017.nii.gz
Writing 6 scans including annotation
Wrote image to /workspaces/baselineUnet/newOutput/taskA/imagesTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_0000.nii.gz
Wrote image to /workspaces/baselineUnet/newOutput/taskA/imagesTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_0001.nii.gz
Wrote image to /workspaces/baselineUnet/newOutput/taskA/imagesTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_0002.nii.gz
Wrote image to /workspaces/baselineUnet/newOutput/taskA/imagesTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_0003.nii.gz
Wrote image to /workspaces/baselineUnet/newOutput/taskA/imagesTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_0004.nii.gz
Wrote image to /workspaces/baselineUnet/newOutput/taskA/imagesTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099_0005.nii.gz
Wrote annotation to /workspaces/baselineUnet/newOutput/taskA/labelsTr/017_1.3.12.2.1107.5.8.15.100960.30000022021714463775000001099.nii.gz
========================================================================================================================
CASE 234_1.3.12.2.1107.5.2.41.69644.30000020011405272273800000001
    PATIENT ID  234
    STUDY ID    1.3.12.2.1107.5.2.41.69644.30000020011405272273800000001
joeranbosma commented 2 years ago

Hi @jakubMitura14,

Thanks for including the commands and logs, that is very helpful! To make sure I understand correctly: what exactly is your goal? Are you trying to train a new model from scratch, using the same nnU-Net setup as ours, with your institutional training dataset? Or are you trying to perform inference with the trained model, to obtain prostate segmentations for your institutional dataset?

Given the logs, it seems you have annotations, so I am assuming the former.

Preprocessing

Your preprocessing appears to have completed successfully; at least, the intended directories are correct. Could you confirm that you now have a dataset.json file? This file should be written to the folder Task2202_prostate_segmentation, which also contains the folders imagesTr and labelsTr.

If any case encountered an error, the dataset.json file will not be written, and you should check the logfile to see which error occurred.
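
For example, a quick check from Python (adjust the path to match your output folder):

from pathlib import Path

task_dir = Path("/workspaces/baselineUnet/newOutput/Task2202_prostate_segmentation")
print("dataset.json present:", (task_dir / "dataset.json").is_file())
print("contents:", sorted(p.name for p in task_dir.iterdir()))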

Training

When training with the nnU-Net framework, a specific folder structure is expected. We (in the joeranbosma/picai_nnunet:latest Docker container) largely follow the default folder structure described in the nnU-Net documentation, and in dataset_conversion.md specifically. Additionally, we expect the workdir to be the parent folder of both the nnUNet_raw_data and results folders. This does not seem to be the case for you, and I think that is what went wrong.

The folder structure should (after preprocessing) be:

/workdir
├── nnUNet_raw_data
│   └── Task2201_picai_baseline
│       ├── imagesTr
│       └── labelsTr
└── results

While you do have the Task2201_picai_baseline folder with subdirectories, it should be located under /workdir with the folder nnUNet_raw_data in between.
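
As a sanity check: inside the container, the task should then resolve like this (a minimal sketch; /workdir corresponds to nnUNet_raw_data_base in your error log):

from pathlib import Path

base = Path("/workdir")  # nnUNet_raw_data_base inside the container
task = base / "nnUNet_raw_data" / "Task2202_prostate_segmentation"
for path in (task, task / "imagesTr", task / "labelsTr", task / "dataset.json"):
    print(path, "->", "OK" if path.exists() else "MISSING")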

Hope this helps, let me know if you run into any other issues.

Kind regards, Joeran

jakubMitura14 commented 2 years ago

Hello, thank you so much for the detailed response. Yes, for now at least, I intend to train using my own dataset. I do not have a dataset.json, and after changing the directory structure as you suggested, the same error occurs. Here is the full picai_prep log: https://drive.google.com/file/d/10eV1iCsMJ2ACseKe7myI0SyRKuhWlCf-/view?usp=sharing

[screenshot of the directory structure]

docker run --gpus='"device=0"' \
    -v /workspaces/baselineUnet/newOutput/workdir/nnUNet_raw_data:/workdir/nnUNet_raw_data \
    -v /workspaces/baselineUnet/newOutput/workdir/results:/workdir/results \
    joeranbosma/picai_nnunet:latest nnunet plan_train \
    Task2202_prostate_segmentation /workdir/ \
    --trainer nnUNetTrainerV2_Loss_FL_and_CE_checkpoints --fold 0
joeranbosma commented 2 years ago

Hi Jakub,

Without a dataset.json you cannot train the nnU-Net model, so we need to tackle that issue first. Could you try upgrading to the most recent release of picai_prep (pip install picai_prep==2.0.1)? We improved the logging there, so hopefully that helps with debugging the issue.

With the updated picai_prep, could you run the preprocessing command again? You can keep the already-converted cases to skip them during conversion and save time.
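
To double-check that the upgrade took effect in the environment where you run the conversion, for example:

from importlib.metadata import version

print(version("picai_prep"))  # should report 2.0.1 (or newer)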

Also, I would recommend mounting the workdir itself, because then the cropped and preprocessed scans will be written to your workspace, and you won't have to convert them every time you train. Your command then becomes:

docker run --gpus='"device=0"' \
    -v /workspaces/baselineUnet/newOutput/workdir:/workdir/ \
    joeranbosma/picai_nnunet:latest nnunet plan_train \
    Task2202_prostate_segmentation /workdir/ \
    --trainer nnUNetTrainerV2_Loss_FL_and_CE_checkpoints --fold 0

This will result in both /workdir/nnUNet_raw_data and /workdir/results being mounted in the correct spot.

jakubMitura14 commented 2 years ago

Thank you for sharing your expertise! Below is a link to the logfile created with picai_prep==2.0.1: https://drive.google.com/file/d/1c6Lx5YTdVWYi3UPkqJO6rmv4x-igElgG/view?usp=sharing

One detail that may or may not be important: I am doing all of this in a Docker-in-Docker setup. However, Docker and NVIDIA Docker were tested and work inside it.

joeranbosma commented 2 years ago

Hi Jakub,

Your logs look completely fine, and the conversion ends gracefully as well. I think the issue lies within our conversion process. I'll investigate and update you once I know more. Thanks for raising the issue!

jakubMitura14 commented 2 years ago

Thank you !!

joeranbosma commented 2 years ago

Hi Jakub,

Apparently, the dataset.json is not generated automatically when converting an MHA Archive to the nnU-Net Raw Data Format via the command line. This was not intentional, so I have made a PR for picai_prep to update this behaviour. The update should be merged soon (probably tomorrow).

In the meantime, you can use a script to generate this dataset.json properly using the Python interface. Within this script, you need to call the archive.create_dataset_json() function. You can take a look at the preprocessing script that was used here, which I have now updated to use picai_prep >= 2.0.
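
A minimal sketch of such a script, assuming picai_prep >= 2.0 (the paths are placeholders for your setup):

from picai_prep import MHA2nnUNetConverter

archive = MHA2nnUNetConverter(
    scans_dir="/input/images",          # placeholder: folder with the MHA scans
    annotations_dir="/input/labels",    # placeholder: folder with the annotations
    output_dir="/workdir/nnUNet_raw_data",
    mha2nnunet_settings="/input/mha2nnunet_settings.json",
)
archive.convert()
archive.create_dataset_json()  # writes dataset.json into the task folder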

In your case, when using a Docker container, you would need to mount and run this script:

docker run --rm \
    -v /workspaces/baselineUnet/newOutput/workdir:/workdir/ \
    -v /path/to/AbdomenMRUS-prostate-segmentation/training:/scripts/ \
    joeranbosma/picai_nnunet:latest python /scripts/prepare_data.py

After this, you should have the dataset.json inside the /workdir/nnUNet_raw_data/Task2201_picai_baseline folder.

I have checked in my setup, and after these steps the network should start training. These were the logs I got:

(seg_tmp) pidgey@pidgey:~/joeran/repos$ docker run --rm \
>     -v /media/pelvis/projects/joeran/picai/workdir:/workdir/ \
>     joeranbosma/picai_nnunet:latest nnunet plan_train \
>     Task2202_prostate_segmentation /workdir \
>     --trainer nnUNetTrainerV2_Loss_FL_and_CE_checkpoints --fold 0

=============
== PyTorch ==
=============

NVIDIA Release 20.12 (build 17950526)
PyTorch Version 1.8.0a0+1606899

Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Copyright (c) 2014-2020 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --ipc=host ...

Please cite the following paper when using nnUNet:

Isensee, F., Jaeger, P.F., Kohl, S.A.A. et al. "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation." Nat Methods (2020). https://doi.org/10.1038/s41592-020-01008-z

If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

Verifying training set
checking case 020_020
checking case 021_021
checking case 022_022
checking case 023_023
checking case 024_024

(you can ignore the GPU-related warnings; this is because I tested it without attaching a GPU)

Hope this helps, Joeran

jakubMitura14 commented 2 years ago

Fantastic! Currently I am on the train, heading to a conference. However, as soon as I am able to, I will try it out and report back. Thank you for your time!

jakubMitura14 commented 2 years ago

Regretfully, I still cannot make it work, although the dataset.json is now being generated. Below are the preprocessing log, the dataset.json, the directory structure, the Docker invocation, and the error.
dataset.json: https://drive.google.com/file/d/1unWemuCz7GGv0D0Iqs5tOrA7zkpWIo_L/view?usp=sharing
log: https://drive.google.com/file/d/1fpvi2LouMgwr11xNJSScLe7GYdZ4XB7b/view?usp=sharing

I call Docker like this:

docker run --gpus='"device=0"' \
    -v /workspaces/baselineUnet/newOutput/workdir:/workdir/ \
    joeranbosma/picai_nnunet:latest nnunet plan_train \
    Task2202_prostate_segmentation /workdir/ \
    --trainer nnUNetTrainerV2_Loss_FL_and_CE_checkpoints --fold 0

The preprocessing part of the Python script:

archiveNext = MHA2nnUNetConverter(
    scans_dir="/workspaces/baselineUnet/mhaArchive",
    annotations_dir="/workspaces/baselineUnet/labels",  # defaults to input_path
    output_dir="/workspaces/baselineUnet/newOutput/workdir/nnUNet_raw_data",
    mha2nnunet_settings=outJsonDir,  # path to the mha2nnunet settings JSON, linked below
)
archiveNext.convert()
archiveNext.create_dataset_json()

with the JSON specification of the mha2nnunet settings: https://drive.google.com/file/d/1Mr3vMrukoE0eIt3bY_sIxuFfcCfAkNYM/view?usp=sharing

The error:

=============
== PyTorch ==
=============

NVIDIA Release 20.12 (build 17950526)
PyTorch Version 1.8.0a0+1606899

Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Copyright (c) 2014-2020 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --ipc=host ...

Please cite the following paper when using nnUNet:

Isensee, F., Jaeger, P.F., Kohl, S.A.A. et al. "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation." Nat Methods (2020). https://doi.org/10.1038/s41592-020-01008-z

If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

Traceback (most recent call last):
  File "/opt/conda/bin/nnUNet_plan_and_preprocess", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_plan_and_preprocess')())
  File "/home/user/nnunet/nnunet/experiment_planning/nnUNet_plan_and_preprocess.py", line 103, in main
    task_name = convert_id_to_task_name(i)
  File "/home/user/nnunet/nnunet/utilities/task_name_id_conversion.py", line 51, in convert_id_to_task_name
    raise RuntimeError("Could not find a task with the ID %d. Make sure the requested task ID exists and that "
RuntimeError: Could not find a task with the ID 2202. Make sure the requested task ID exists and that nnU-Net knows where raw and preprocessed data are located (see Documentation - Installation). Here are your currently defined folders:
nnUNet_preprocessed=/home/user/data
RESULTS_FOLDER=/workdir/results
nnUNet_raw_data_base=/workdir
If something is not right, adapt your environemnt variables.
If you have questions or suggestions, feel free to open an issue at https://github.com/DIAGNijmegen/picai_prep

Please cite the following paper when using nnUNet:

Isensee, F., Jaeger, P.F., Kohl, S.A.A. et al. "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation." Nat Methods (2020). https://doi.org/10.1038/s41592-020-01008-z

If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

[#] Creating plans and preprocessing data
Traceback (most recent call last):
  File "/usr/local/bin/nnunet", line 568, in <module>
    action(argv)
  File "/usr/local/bin/nnunet", line 250, in plan_train
    subprocess.check_call(cmd)
  File "/opt/conda/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['nnUNet_plan_and_preprocess', '-t', '2202', '-tl', '1', '-tf', '1', '--verify_dataset_integrity', '--planner2d', 'None']' returned non-zero exit status 1.
joeranbosma commented 2 years ago

Hi Jakub,

Something odd is going on here... everything looks perfectly fine. I have tried to reproduce the issue by running these steps myself, but I cannot -- for me, training runs fine.

I think your best way forward is to open an issue in the nnU-Net framework repository. The issue occurs during the nnUNet_plan_and_preprocess step of their framework, so maybe they know better where to look for the root cause. If you find out what the issue was, I would really appreciate hearing back from you!

From my side there are two final things you can try, but I don't expect them to solve the task-not-found issue.

  1. The Docker call did not properly specify the necessary compute resources, which I have now updated in the training documentation. The training Docker run command should start with: docker run --cpus=8 --memory=32gb --shm-size=32gb --gpus='"device=0"' -it --rm -v .... (adjust the numbers to your compute setup).
  2. You can read through the nnU-Net training steps of the PI-CAI baseline, which is essentially the same as you are doing here, but with different data. This tutorial can be found here: https://github.com/DIAGNijmegen/picai_baseline.

Hope this is of some help. Kind regards, Joeran

jakubMitura14 commented 2 years ago

Thanks! I will analyze it. If I am able to find out the problem, I will post it here.