Closed LennyN95 closed 9 months ago
@LennyN95 here is the meta.json file : nnunet_task05_meta.json
Regarding the segdb codes, I did not see anything for the prostate zonal regions -- output labels for this models.
You could use these codes :
PROSTATE_PERIPHERAL_ZONE | Peripheral zone of prostate | C_BODY_STRUCTURE | T_PROSTATIC_PZ_STRUCTURE | |
where T_PROSTATIC_PZ_STRUCTURE = 279706003
PROSTATE_TRANSITION_ZONE | Transition zone of prostate | C_BODY_STRUCTURE | T_PROSTATIC_TZ_STRUCTURE | | where T_PROSTATIC_TZ_STRUCTURE = 399384005
Here is my dcmqi json file for converting these SEGS to DICOM for reference : task05_nnunet_dcmqi.json
For licensing purposes, heres the nnUNet task05 dataset.json, being a bit more explicit about their licensing scheme. dataset_task05.json
Thx @ccosmin97,
Following your suggestion, I added the two new substructures to our SegDB with commit 9f922e1f.
Regarding the dicom conversion, which MR scan did you use as reference, I guess T2?
Regarding the dicom conversion, which MR scan did you use as reference, I guess T2?
I am a bit confused here, do you mean the resampling scheme to get identical geometry between the T2 and ADC volumes? In that case, I used the T2 as a fixed grid to resample the ADC volume.
If you are referring to the dicomseg conversion of potential manual annotations, in my use case (ProstateX, QIN-Prostate-Repeatability collections) they always refer to the T2 volume.
I think it is worth highlighting that this model was trained on co-registered T2 and ADC volumes, so any resampling scheme on ADC and T2 sequences that are not co-registered is sub-optimal. I did do resampling of ADC to T2 during my analysis because registration of ADC and T2 volumes was out of scope for the collections analyzed.
I did put some notes regarding this in the meta.json
I think it is worth highlighting that this model was trained on co-registered T2 and ADC volumes, so any resampling scheme on ADC and T2 sequences that are not co-registered is sub-optimal
Yes @ccosmin97, I therefore build a Resampler Module that will resample ADC to T2's spacing. Is that what you mean?
I think it is worth highlighting that this model was trained on co-registered T2 and ADC volumes, so any resampling scheme on ADC and T2 sequences that are not co-registered is sub-optimal
Yes @ccosmin97, I therefore build a Resampler Module that will resample ADC to T2's spacing. Is that what you mean?
@LennyN95 That's what I did for my analysis, perfect. For these multi-modal models that assume co-registered inputs, maybe it is a good idea to add a more generic resampling scheme in the future, where the user can choose the fixed resampling grid. In this case, it is the T2 image, but it also could be the ADC one or a fixed arbitrary grid. Motivation is these choices are purely experimental, trying to conform to same-size inputs. I do not think there is a perfect way to design this resampling scheme, it is case-specific. In my case, resampling to the higher resolution, i.e. T2, makes more sense, also because the T2 image has better prostate tissue contrast compared to ADC, so we do not want to degrade/resample the T2 image. I believe in your Resample module, it all boils down to the 'fixed' plastimatch argument, where here you specify 'path_to_t2.nii.gz' but it could also be a fixed grid-like '250,250,30' -- x,y,z voxel size
@LennyN95 A quick follow-up on the model testing :
Here is the workflow I wanted to test:
Even though step 1 and 2 completed 'successfully', i.e images were built, it seems that some paths were not set up properly. Here are some logs about the building of the model image:
When I try to run inference, it looks like no models ID were found inside the model container. I will investigate more, the issue is probably on my side.
Yes, the resampling could be generalized. However, it's a static, model-specific task (e.g., various models might utilize various sampling techniques, but whatever sampling technique is used, it will be fixed for inference), right? Importing model definition from Mub models repository.
Regarding the error, this is expected; the model has not yet been accepted. Therefore, you need to specify the repository and branch name during the build:
docker build -t dev/nnunet_prostate_zonal_task05 \
--build-arg MHUB_MODELS_REPO="https://github.com/MHubAI/models::m-nnunet-prostate" .
Yes, the resampling could be generalized. However, it's a static, model-specific task (e.g., various models might utilize various sampling techniques, but whatever sampling technique is used, it will be fixed for inference), right?
It is model specific indeed. However, in this model case, no sampling technique was recommend in particular, since they assumed that the ADC and T2 images are co-registered. Any resampling decisions are outside of the model scope, but necessary in case of non co-registered data, such as my case. That's why I mentioned a more generic sampling scheme(T2 to ADC, ADC to T2, fixed grid applied for both), up to the user. As a start, resampling ADC to T2 image is ok. Overall I agree with a fixed sampling strategy for inference, as long as it was recommended by the author of the model, which is not the case here, this resampling strategy is purely motivated by my specific use-case.
Okay in that case, for Mhub we shouldn't apply any resampling in the default case.
If we propose some specific sampling strategy these should always be motivated and evaluated by some analysis. Do you have any statistics of model performance that we could rely on?
I don't like the idea to leave this to the user without guidance, getting this wrong is quite easy and getting it right requires extensive testiness and evaluation. So if we allow for flexibility here, then it should be well documented in the Model Card and outsourced into alternative workflows. What do you think?
@LennyN95
I tested the current PR with one sample from IDC which can be downloaded using s5mcd using the following command:
~/Downloads/s5cmd_latest/s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/cf61e842-f0f6-4a3a-9a35-671acf207b91/*' .
It contains 2 sequences, one T2 and one ADC, from QIN-Prostate-Repeatability, patientID equal to PCAMPMRI-0002.
I modified the FileStructureImporter.structure from :
FileStructureImporter
import_id: patientID
structures:
- $patientID@instance/ADC$part@/dicom@dicom
- $patientID/T2$part@/dicom@dicom
to
FileStructureImporter
import_id: patientID
structures:
- $patientID@instance/ADC$part@dicom/dicom@dicom
- $patientID/T2$part@dicom/dicom@dicom
Here is the complete default.yml -- renamed to .json for uploading.
Without this change, the individual SOPUID.dcm objects were not copied from the source folder to the expected temporary folders(_global, imported_instances).
With this change, the rest of the converters and runners worked as expected, producing a segmentation mask with the expected geometry and orientation.
Regarding the actual segmentation result, I did not have time to check if it is similar to my previous analysis results, it might be slightly different since I did used --tta False during inference (default is True -- like in the default workflow).
@ccosmin97 I updated my implementation and added a new customizable parameter disable_tta
, default value false
.
You can disable TTA by modifying the default.yml (or try with a custom config):
ProstateRunner:
disable_tta: true
Or you can set the attribute from the cli:
docker run --rm -t --gpus all \
-v /abs/dicom/in:/app/data/input_data:ro \
-v /abs/out/:/app/data/output_data \
dev/nnunet_prostate_zonal_task05 \
--config:modules.ProstateRunner.disable_tta=false
Can you please try both versions (so we can set the appropriate default value)?
@ccosmin97 I updated my implementation and added a new customizable parameter
disable_tta
, default valuefalse
.You can disable TTA by modifying the default.yml (or try with a custom config):
ProstateRunner: disable_tta: true
Or you can set the attribute from the cli:
docker run --rm -t --gpus all \ -v /abs/dicom/in:/app/data/input_data:ro \ -v /abs/out/:/app/data/output_data \ dev/nnunet_prostate_zonal_task05 \ --config:modules.ProstateRunner.disable_tta=false
Can you please try both versions (so we can set the appropriate default value)?
Thank you @LennyN95 ! It would also be valuable to have the model type as an option: -- model 3d_fullres/2d/...
Updated @ccosmin97. Note that I changed the TTA configurable from disable_tta
to use_tta
(default false) to use the same parameter names as in our NNUnetRunner implementation.
NOTE: The model is tested & can be pushed.
@ccosmin97 Any updates or objections from your side before we move ahead?
@LennyN95 Thank you for working on this! I will in the following few days -- this week:
/review
@LennyN95 I added evaluation results and the used resampling scheme justification to the meta.json. Details are in my fork, in this commit. I could create another PR or something else if that helps.
Testing is next.
@LennyN95 I was testing out the task05 branch and got this error log for NiftiConverter.log :
'Read-only file system' --> Did I mess something in the docker configuration? Did this happen before?
By replacing the structure in FileStructureImporter in the default.yml from :
- $patientID/$studyID@instance/$part@bundle@dicom
- $patientID@instance:studyID=none/ADC$part@bundle@dicom
- $patientID@instance:studyID=none/T2$part@bundle@dicom
to :
- $patientID/$studyID@instance/$part@dicom@dicom
- $patientID@instance:studyID=none/ADC$part@dicom@dicom
- $patientID@instance:studyID=none/T2$part@dicom@dicom
I do get the the desired output segmentations without errors with the FileStructureImporter 'fix' mentioned above.
You need the newest base image for the @bundle to work, can you check if you have the latest :)? Your fix works too (just one @dicom would be enough) but that wouldn't hold for some edge cases.
You need the newest base image for the @Bundle to work, can you check if you have the latest :)? Your fix works too (just one @dicom would be enough) but that wouldn't hold for some edge cases.
Thanks, that did it! I built the base image locally from your branch, which had a commit to the base image 2 months ago.
It run successfully, I will finalize testing for this model and the 2 others tomorrow/Thursday -- based on your template for submitting model testing.
@LennyN95
testing one one sample was successfull, results look like what is expected, and DICOM SEG metadata also looks right. Here is the template: Unless you have further comments/additions, the only thing left is to incorporate these changes into the m-nnunet-prostate meta.json file. I can submit a code review if that helps.
/test
sample:
idcv: Data Release 17.0 December 04, 2023
data:
- SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.3671.4754.165941479363091869002475812419
aws: s3://idc-open-data/0c341066-954b-4076-aa9d-aef460b6ab12
path: test_data/PCAMPMRI-00003/T2/
- SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.3671.4754.179470327513803829437549550387
aws: s3://idc-open-data/6bbd50a9-e7fa-44fd-947c-13fa1d2eb4f8/
path: test_data/PCAMPMRI-00003/ADC
reference:
url_output: https://www.dropbox.com/scl/fi/vsgt82id8m712e5wbsby6/output.zip?rlkey=jy7vz011uboyauo0q9e3ds0b0&dl=0
url_input: https://www.dropbox.com/scl/fi/z4yaunck470pwal70yrck/input.zip?rlkey=rzixexi0ckrk3mkkd7ysr9d9c&dl=0
notes:
This model takes a multi-modality input, namely T2 and ADC sequence. The model input also needs to be sorted by patientID, at the very least. Steps below are included to show how to create the input model case folder :
mkdir -p test_data/PCAMPMRI-00009/
mkdir -p test_data/PCAMPMRI-00009/T2/
s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/0c341066-954b-4076-aa9d-aef460b6ab12/*' test_data/PCAMPMRI-00003/T2/
mkdir -p test_data/PCAMPMRI-00009/ADC/
s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/6bbd50a9-e7fa-44fd-947c-13fa1d2eb4f8/*' test_data/PCAMPMRI-00003/ADC/
mkdir -p out_data
docker run -v /home/exouser/Documents/mhub_task05_exp/test_template/test_data/:/app/data/input_data:ro \
-v /home/exouser/Documents/mhub_task05_exp/test_template/out_data/:/app/data/output_data \
dev/nnunet_prostate_zonal_task05:latest --workflow default
Thx Cosmin! I'll integrate the meta changes and run our new test pipeline. The template needs to be exactly as in the documentation for our automation to work, I'll post and try below.
/test
sample:
idc_version: "Data Release 17.0 December 04, 2023"
data:
- SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.3671.4754.165941479363091869002475812419
aws_url: s3://idc-open-data/0c341066-954b-4076-aa9d-aef460b6ab12/*
path: PCAMPMRI-00003/T2
- SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.3671.4754.179470327513803829437549550387
aws_url: s3://idc-open-data/6bbd50a9-e7fa-44fd-947c-13fa1d2eb4f8/*
path: PCAMPMRI-00003/ADC
reference:
url: https://www.dropbox.com/scl/fi/vsgt82id8m712e5wbsby6/output.zip?rlkey=jy7vz011uboyauo0q9e3ds0b0&dl=0
I updated my implementation of the nnUNet Prostate MR (Task005) model.
The model takes two input series:
The current default config uses a FileStructureImporter that accepts the following input structure (note that study is optional):
To use the patient/strudy structure only, change to:
To use the patient/ structure only use:
To omit the dicom folders and instead organize all dicom files directly under the ADC and T2 folders, the following import structure can be used: