ImagingDataCommons / CloudSegmentator

Medical imaging segmentation workflows for FireCloud (Terra) and Seven Bridges Cancer Genomics Cloud
Apache License 2.0

Experiment with grouping segments into separate SEG instances to bring per-frame FG size under 1MB #52

Closed: fedorov closed this issue 7 months ago

fedorov commented 7 months ago

@vkt1414 here's the idea. Can you find the largest (by the number of slices) CT series, confirm its per-frame FG size is ~3.9M, and then experiment with splitting segments into separate SEG instances? We could, for example, follow roughly the grouping used in the TS front page https://github.com/wasserth/TotalSegmentator?tab=readme-ov-file#totalsegmentator, which already defines 5 groups. It might be best to start with just separating the "skeleton" group, since it might be the one resulting in the largest number of frames. If that group alone results in FG size above 1MB, then we could split ribs from vertebrae, for example.
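A back-of-the-envelope sketch of why grouping should help. Only the ~3.9 MB figure comes from this thread; the slice count, segment count, and per-item byte size below are hypothetical assumptions, and the model ignores that a real SEG only emits frames for slices where a segment is non-empty (so it overestimates).

```python
# Rough model: one per-frame FG item is emitted per frame, i.e. per
# (slice, segment) pair. All numbers except ~3.9 MB are assumptions.

def perframe_fg_size_bytes(n_slices: int, n_segments: int, bytes_per_item: int) -> int:
    """Estimated size of the Per-Frame Functional Groups Sequence."""
    return n_slices * n_segments * bytes_per_item

# Suppose the largest CT series has ~900 slices and TS v1 yields ~104
# segments; solve for the implied per-item size from the ~3.9 MB figure.
total = int(3.9 * 1024 * 1024)
n_slices, n_segments = 900, 104
bytes_per_item = total // (n_slices * n_segments)  # roughly 43 bytes/item

# Splitting segments into 5 roughly equal groups shrinks each file's
# per-frame FG sequence proportionally to that group's segment count.
per_group = perframe_fg_size_bytes(n_slices, n_segments // 5, bytes_per_item)
print(per_group < 1024 * 1024)  # → True under these assumptions
```

Under these (hypothetical) numbers each group would land well under 1 MB, but a lopsided group such as "skeleton" could still exceed it, which is why it may need a further ribs/vertebrae split.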

cc: @dclunie

fedorov commented 7 months ago

Series identified by @vkt1414 for confirming we are within limits while grouping:

fedorov commented 7 months ago

@vkt1414 here's the script that breaks segment frames into 4 groups - can you update it to save into groups based on the TS group assignment? https://github.com/ImagingDataCommons/CloudSegmentator/pull/53

vkt1414 commented 7 months ago

> @vkt1414 here's the script that breaks segment frames into 4 groups - can you update it to save into groups based on the TS group assignment? #53

Thank you!

I could not fully understand the script. The CSV does not have label IDs, and since segment numbers are not necessarily the same as label IDs, I'm not sure how we can use the segment numbers in the DICOM SEG to find which class they belong to.

vkt1414 commented 7 months ago

I guess it's doable, but we will also need the SEG NIfTI file. Please correct me if there's a better way than this:

  1. Map the CSV file with label IDs from TS v1.
  2. Extract label IDs from NIfTI, sort them in ascending order, and then map them to DICOM SEG segment numbers.
  3. Based on the label IDs, determine the class they belong to, divide the PerFrame group into up to five DICOM files with your code, and then calculate the size of the resulting DICOM files.
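The first two steps above could be sketched roughly as follows. The label IDs, names, and group assignments in this snippet are placeholders for illustration, not the real TotalSegmentator v1 mapping.

```python
import numpy as np

# Hypothetical subset of the TS v1 labelID -> name mapping (placeholders).
TS_LABELS = {1: "spleen", 2: "kidney_right", 18: "vertebrae_L5", 58: "rib_left_1"}
# Hypothetical name -> group assignment, following the 5 groups on the TS README.
TS_GROUPS = {"spleen": "organs", "kidney_right": "organs",
             "vertebrae_L5": "skeleton", "rib_left_1": "skeleton"}

def segment_number_to_group(nifti_labels):
    """Map DICOM SEG segment numbers to TS label names and groups.

    Segment numbers are assigned 1..N in ascending order of the label IDs
    actually present in the SEG NIfTI (step 2 above).
    """
    present = np.unique(nifti_labels)
    present = present[present != 0]  # 0 is background
    mapping = {}
    for seg_number, label_id in enumerate(sorted(int(l) for l in present), start=1):
        name = TS_LABELS[label_id]
        mapping[seg_number] = (name, TS_GROUPS[name])
    return mapping

# Simulated SEG NIfTI label volume containing labels 2, 18 and 58.
volume = np.array([0, 2, 18, 58, 18, 0])
print(segment_number_to_group(volume))
# → {1: ('kidney_right', 'organs'), 2: ('vertebrae_L5', 'skeleton'),
#    3: ('rib_left_1', 'skeleton')}
```

Once each segment number is resolved to a group, the existing frame-splitting code can bucket the per-frame items by group (step 3).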
fedorov commented 7 months ago

Vamsi, the only purpose of that script is to confirm that, if we split segments into groups by the categories defined in TotalSegmentator, the total size of the Per-Frame Functional Groups Sequence stays below 1 MB.

It is not the purpose of this script to split the SEG as part of the overall workflow.

fedorov commented 7 months ago

> The csv does not have labelIDs and since segment numbers are not necessarily same as labelIDs

Independently of this specific issue, this is an important point. I think it may make sense to encode the TotalSegmentator label name in SegmentDescription. It may be useful to users. What do you think?
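A minimal sketch of what that could look like, using plain dictionaries to stand in for the per-segment metadata that the SEG conversion tooling would ultimately write. The label names and the description format string are hypothetical.

```python
# Sketch: carry the TotalSegmentator label name into each segment's
# SegmentDescription, so users can recover the TS class from the DICOM
# SEG alone without consulting external label tables.

def build_segment_metadata(segment_labels):
    """segment_labels: dict of SegmentNumber -> TotalSegmentator label name."""
    segments = []
    for seg_number, ts_name in sorted(segment_labels.items()):
        segments.append({
            "SegmentNumber": seg_number,
            "SegmentLabel": ts_name,
            # Encode the TS label name verbatim (format is an assumption).
            "SegmentDescription": f"TotalSegmentator: {ts_name}",
        })
    return segments

meta = build_segment_metadata({1: "kidney_right", 2: "vertebrae_L5"})
print(meta[1]["SegmentDescription"])  # → TotalSegmentator: vertebrae_L5
```

This would make the SEG self-describing with respect to TS classes, at the cost of relying on a free-text attribute rather than coded content.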

vkt1414 commented 7 months ago

Here are the notebook and sample files (created by running the notebook) that I think extend your code:

https://colab.research.google.com/drive/1R4VyzgrVxoRyg9ngpU4J9LjEOb3g-z-C?usp=sharing
https://drive.google.com/file/d/19j-ilkjpR3zclDXfRqOLjochDm2Agw-6/view?usp=sharing

fedorov commented 7 months ago

@vkt1414 communication in discord:

> even for 800 slices, except for the gastrointestinal tract group, all others were slightly or significantly over 1 MB.

https://github.com/vkt1414/CloudSegmentator/releases/download/test/800-slices-perframe.zip
https://github.com/vkt1414/CloudSegmentator/releases/download/test/800_slices_seg_nifti_dseg.zip

fedorov commented 7 months ago

Based on the experiments, discussions and reflections, I decided we should not attempt this optimization for the current experiment for the following reasons:

We may revisit this at a later time at a subsequent iteration of processing, if such iteration takes place.