MHubAI / models

Stores the MHub models dockerfiles and scripts.
MIT License
8 stars 16 forks source link

MHub / IDC - Implementing the nnUNet Task005 Prostate MR (ADC, T2) Model #67

Closed LennyN95 closed 7 months ago

LennyN95 commented 11 months ago

I updated my implementation of the nnUNet Prostate MR (Task005) model.

The model takes two input series:

The current default config uses a FileStructureImporter that accepts the following input structure (note that study is optional):

├── Patient1/
│   ├── study1
│   │   └── ADC 
│   │   │   └── dicom 
│   │   │       └── *.dcm
│   │   └── T2
│   │   │   └── dicom 
│   │   │       └── *.dcm
│   └── study2 
├── Patient2/
│   └── ADC 
│   │   └── dicom 
│   │       └── *.dcm
│   └── T2 
│   │   └── dicom 
│   │       └── *.dcm

To use the patient/strudy structure only, change to:

FileStructureImporter
    structures:
    - $patientID/$studyID@instance/$part@/dicom@dicom

To use the patient/ structure only use:

FileStructureImporter
    import_id: patientID
    structures:
    - $patientID@instance/ADC$part@/dicom@dicom
    - $patientID/T2$part@/dicom@dicom

To omit the dicom folders and instead organize all dicom files directly under the ADC and T2 folders, the following import structure can be used:

FileStructureImporter:
  outsource_instances: True 
  import_id: patientID/studyID
  structures:
    - $patientID/$studyID@instance/$part@dicom
    - $patientID@instance:studyID=none/ADC$part@dicom
    - $patientID@instance:studyID=none/T2$part@dicom
ccosmin97 commented 11 months ago

@LennyN95 here is the meta.json file : nnunet_task05_meta.json

Regarding the segdb codes, I did not see anything for the prostate zonal regions -- output labels for this models. You could use these codes : PROSTATE_PERIPHERAL_ZONE | Peripheral zone of prostate | C_BODY_STRUCTURE | T_PROSTATIC_PZ_STRUCTURE |   |
where T_PROSTATIC_PZ_STRUCTURE = 279706003

PROSTATE_TRANSITION_ZONE | Transition zone of prostate | C_BODY_STRUCTURE | T_PROSTATIC_TZ_STRUCTURE |   | where T_PROSTATIC_TZ_STRUCTURE = 399384005

Here is my dcmqi json file for converting these SEGS to DICOM for reference : task05_nnunet_dcmqi.json

For licensing purposes, heres the nnUNet task05 dataset.json, being a bit more explicit about their licensing scheme. dataset_task05.json

LennyN95 commented 10 months ago

Thx @ccosmin97,

Following your suggestion, I added the two new substructures to our SegDB with commit 9f922e1f.

Regarding the dicom conversion, which MR scan did you use as reference, I guess T2?

ccosmin97 commented 10 months ago

Regarding the dicom conversion, which MR scan did you use as reference, I guess T2?

I am a bit confused here, do you mean the resampling scheme to get identical geometry between the T2 and ADC volumes? In that case, I used the T2 as a fixed grid to resample the ADC volume.

If you are referring to the dicomseg conversion of potential manual annotations, in my use case (ProstateX, QIN-Prostate-Repeatability collections) they always refer to the T2 volume.

ccosmin97 commented 10 months ago

I think it is worth highlighting that this model was trained on co-registered T2 and ADC volumes, so any resampling scheme on ADC and T2 sequences that are not co-registered is sub-optimal. I did do resampling of ADC to T2 during my analysis because registration of ADC and T2 volumes was out of scope for the collections analyzed.

I did put some notes regarding this in the meta.json

LennyN95 commented 10 months ago

I think it is worth highlighting that this model was trained on co-registered T2 and ADC volumes, so any resampling scheme on ADC and T2 sequences that are not co-registered is sub-optimal

Yes @ccosmin97, I therefore build a Resampler Module that will resample ADC to T2's spacing. Is that what you mean?

ccosmin97 commented 10 months ago

I think it is worth highlighting that this model was trained on co-registered T2 and ADC volumes, so any resampling scheme on ADC and T2 sequences that are not co-registered is sub-optimal

Yes @ccosmin97, I therefore build a Resampler Module that will resample ADC to T2's spacing. Is that what you mean?

@LennyN95 That's what I did for my analysis, perfect. For these multi-modal models that assume co-registered inputs, maybe it is a good idea to add a more generic resampling scheme in the future, where the user can choose the fixed resampling grid. In this case, it is the T2 image, but it also could be the ADC one or a fixed arbitrary grid. Motivation is these choices are purely experimental, trying to conform to same-size inputs. I do not think there is a perfect way to design this resampling scheme, it is case-specific. In my case, resampling to the higher resolution, i.e. T2, makes more sense, also because the T2 image has better prostate tissue contrast compared to ADC, so we do not want to degrade/resample the T2 image. I believe in your Resample module, it all boils down to the 'fixed' plastimatch argument, where here you specify 'path_to_t2.nii.gz' but it could also be a fixed grid-like '250,250,30' -- x,y,z voxel size

ccosmin97 commented 10 months ago

@LennyN95 A quick follow-up on the model testing :

Here is the workflow I wanted to test:

  1. Build the base mhub image
  2. build the model image -- nnunet_prostate_zonal_task05
  3. Try on dicom test data following the structure you mentioned

Even though step 1 and 2 completed 'successfully', i.e images were built, it seems that some paths were not set up properly. Here are some logs about the building of the model image: image

When I try to run inference, it looks like no models ID were found inside the model container. I will investigate more, the issue is probably on my side.

image

LennyN95 commented 10 months ago

Yes, the resampling could be generalized. However, it's a static, model-specific task (e.g., various models might utilize various sampling techniques, but whatever sampling technique is used, it will be fixed for inference), right? Importing model definition from Mub models repository.

Regarding the error, this is expected; the model has not yet been accepted. Therefore, you need to specify the repository and branch name during the build:

docker build -t dev/nnunet_prostate_zonal_task05 \
   --build-arg MHUB_MODELS_REPO="https://github.com/MHubAI/models::m-nnunet-prostate" .
ccosmin97 commented 10 months ago

Yes, the resampling could be generalized. However, it's a static, model-specific task (e.g., various models might utilize various sampling techniques, but whatever sampling technique is used, it will be fixed for inference), right?

It is model specific indeed. However, in this model case, no sampling technique was recommend in particular, since they assumed that the ADC and T2 images are co-registered. Any resampling decisions are outside of the model scope, but necessary in case of non co-registered data, such as my case. That's why I mentioned a more generic sampling scheme(T2 to ADC, ADC to T2, fixed grid applied for both), up to the user. As a start, resampling ADC to T2 image is ok. Overall I agree with a fixed sampling strategy for inference, as long as it was recommended by the author of the model, which is not the case here, this resampling strategy is purely motivated by my specific use-case.

LennyN95 commented 10 months ago

Okay in that case, for Mhub we shouldn't apply any resampling in the default case.

If we propose some specific sampling strategy these should always be motivated and evaluated by some analysis. Do you have any statistics of model performance that we could rely on?

I don't like the idea to leave this to the user without guidance, getting this wrong is quite easy and getting it right requires extensive testiness and evaluation. So if we allow for flexibility here, then it should be well documented in the Model Card and outsourced into alternative workflows. What do you think?

ccosmin97 commented 10 months ago

@LennyN95 I tested the current PR with one sample from IDC which can be downloaded using s5mcd using the following command: ~/Downloads/s5cmd_latest/s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/cf61e842-f0f6-4a3a-9a35-671acf207b91/*' . It contains 2 sequences, one T2 and one ADC, from QIN-Prostate-Repeatability, patientID equal to PCAMPMRI-0002.

I modified the FileStructureImporter.structure from :

FileStructureImporter
    import_id: patientID
    structures:
    - $patientID@instance/ADC$part@/dicom@dicom
    - $patientID/T2$part@/dicom@dicom

to

FileStructureImporter
    import_id: patientID
    structures:
    - $patientID@instance/ADC$part@dicom/dicom@dicom
    - $patientID/T2$part@dicom/dicom@dicom

Here is the complete default.yml -- renamed to .json for uploading.

Without this change, the individual SOPUID.dcm objects were not copied from the source folder to the expected temporary folders(_global, imported_instances).

With this change, the rest of the converters and runners worked as expected, producing a segmentation mask with the expected geometry and orientation.

Regarding the actual segmentation result, I did not have time to check if it is similar to my previous analysis results, it might be slightly different since I did used --tta False during inference (default is True -- like in the default workflow).

LennyN95 commented 10 months ago

@ccosmin97 I updated my implementation and added a new customizable parameter disable_tta, default value false.

You can disable TTA by modifying the default.yml (or try with a custom config):

ProstateRunner:
   disable_tta: true

Or you can set the attribute from the cli:

docker run --rm -t --gpus all \
-v /abs/dicom/in:/app/data/input_data:ro \
-v /abs/out/:/app/data/output_data \
dev/nnunet_prostate_zonal_task05 \
--config:modules.ProstateRunner.disable_tta=false

Can you please try both versions (so we can set the appropriate default value)?

ccosmin97 commented 10 months ago

@ccosmin97 I updated my implementation and added a new customizable parameter disable_tta, default value false.

You can disable TTA by modifying the default.yml (or try with a custom config):

ProstateRunner:
   disable_tta: true

Or you can set the attribute from the cli:

docker run --rm -t --gpus all \
-v /abs/dicom/in:/app/data/input_data:ro \
-v /abs/out/:/app/data/output_data \
dev/nnunet_prostate_zonal_task05 \
--config:modules.ProstateRunner.disable_tta=false

Can you please try both versions (so we can set the appropriate default value)?

Thank you @LennyN95 ! It would also be valuable to have the model type as an option: -- model 3d_fullres/2d/...

LennyN95 commented 10 months ago

Updated @ccosmin97. Note that I changed the TTA configurable from disable_ttato use_tta (default false) to use the same parameter names as in our NNUnetRunner implementation.

LennyN95 commented 9 months ago

NOTE: The model is tested & can be pushed.

@ccosmin97 Any updates or objections from your side before we move ahead?

ccosmin97 commented 9 months ago

@LennyN95 Thank you for working on this! I will in the following few days -- this week:

LennyN95 commented 9 months ago

/review

ccosmin97 commented 8 months ago

@LennyN95 I added evaluation results and the used resampling scheme justification to the meta.json. Details are in my fork, in this commit. I could create another PR or something else if that helps.

Testing is next.

ccosmin97 commented 8 months ago

@LennyN95 I was testing out the task05 branch and got this error log for NiftiConverter.log :

image

'Read-only file system' --> Did I mess something in the docker configuration? Did this happen before?

By replacing the structure in FileStructureImporter in the default.yml from :

  - $patientID/$studyID@instance/$part@bundle@dicom
  - $patientID@instance:studyID=none/ADC$part@bundle@dicom
  - $patientID@instance:studyID=none/T2$part@bundle@dicom

to :

    - $patientID/$studyID@instance/$part@dicom@dicom
    - $patientID@instance:studyID=none/ADC$part@dicom@dicom
    - $patientID@instance:studyID=none/T2$part@dicom@dicom

I do get the the desired output segmentations without errors with the FileStructureImporter 'fix' mentioned above.

LennyN95 commented 8 months ago

You need the newest base image for the @bundle to work, can you check if you have the latest :)? Your fix works too (just one @dicom would be enough) but that wouldn't hold for some edge cases.

ccosmin97 commented 8 months ago

You need the newest base image for the @Bundle to work, can you check if you have the latest :)? Your fix works too (just one @dicom would be enough) but that wouldn't hold for some edge cases.

Thanks, that did it! I built the base image locally from your branch, which had a commit to the base image 2 months ago.

It run successfully, I will finalize testing for this model and the 2 others tomorrow/Thursday -- based on your template for submitting model testing.

ccosmin97 commented 8 months ago

@LennyN95

testing one one sample was successfull, results look like what is expected, and DICOM SEG metadata also looks right. Here is the template: Unless you have further comments/additions, the only thing left is to incorporate these changes into the m-nnunet-prostate meta.json file. I can submit a code review if that helps.

/test

sample:
  idcv: Data Release 17.0 December 04, 2023
  data:
    - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.3671.4754.165941479363091869002475812419
        aws: s3://idc-open-data/0c341066-954b-4076-aa9d-aef460b6ab12
        path: test_data/PCAMPMRI-00003/T2/
    - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.3671.4754.179470327513803829437549550387
        aws: s3://idc-open-data/6bbd50a9-e7fa-44fd-947c-13fa1d2eb4f8/
        path: test_data/PCAMPMRI-00003/ADC
reference:
    url_output: https://www.dropbox.com/scl/fi/vsgt82id8m712e5wbsby6/output.zip?rlkey=jy7vz011uboyauo0q9e3ds0b0&dl=0
    url_input: https://www.dropbox.com/scl/fi/z4yaunck470pwal70yrck/input.zip?rlkey=rzixexi0ckrk3mkkd7ysr9d9c&dl=0
notes:
This model takes a multi-modality input, namely T2 and ADC sequence. The model input also needs to be sorted by patientID, at the very least. Steps below are included to show how to create the input model case folder : 
mkdir -p test_data/PCAMPMRI-00009/
mkdir -p test_data/PCAMPMRI-00009/T2/
s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/0c341066-954b-4076-aa9d-aef460b6ab12/*' test_data/PCAMPMRI-00003/T2/
mkdir -p test_data/PCAMPMRI-00009/ADC/
s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/6bbd50a9-e7fa-44fd-947c-13fa1d2eb4f8/*' test_data/PCAMPMRI-00003/ADC/

mkdir -p out_data

docker run -v /home/exouser/Documents/mhub_task05_exp/test_template/test_data/:/app/data/input_data:ro \
-v /home/exouser/Documents/mhub_task05_exp/test_template/out_data/:/app/data/output_data \
dev/nnunet_prostate_zonal_task05:latest --workflow default

image image

LennyN95 commented 8 months ago

Thx Cosmin! I'll integrate the meta changes and run our new test pipeline. The template needs to be exactly as in the documentation for our automation to work, I'll post and try below.

LennyN95 commented 8 months ago

/test

sample:
  idc_version: "Data Release 17.0 December 04, 2023"
  data:
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.3671.4754.165941479363091869002475812419
    aws_url: s3://idc-open-data/0c341066-954b-4076-aa9d-aef460b6ab12/*
    path: PCAMPMRI-00003/T2
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.3671.4754.179470327513803829437549550387
    aws_url: s3://idc-open-data/6bbd50a9-e7fa-44fd-947c-13fa1d2eb4f8/*
    path: PCAMPMRI-00003/ADC
reference:
  url: https://www.dropbox.com/scl/fi/vsgt82id8m712e5wbsby6/output.zip?rlkey=jy7vz011uboyauo0q9e3ds0b0&dl=0

Test Results (24.01.29_13.36.48_9caE8eLF4g) ```yaml checked_files: - path: /app/test/src/PCAMPMRI-00003/none/nnunet_prostate_zonal_task05.seg.dcm checker: DicomsegContentCheck summary: files_missing: 0 files_extra: 0 checks: DicomsegContentCheck: files: 1 conclusion: true ```

Test Results (24.02.28_12.56.22_MnXlLNinpg) ```yaml id: 7779c218-57cf-459c-a988-1c9b398135f5 date: '2024-02-28 13:09:46' checked_files: - file: nnunet_prostate_zonal_task05.seg.dcm path: /app/test/src/PCAMPMRI-00003/none/nnunet_prostate_zonal_task05.seg.dcm checks: - checker: DicomsegContentCheck notes: - label: Segment Count description: The number of segments identified in the inspected dicomseg file. info: 2 summary: files_missing: 0 files_extra: 0 checks: DicomsegContentCheck: files: 1 conclusion: true ```