Use label id as segment numbers

michaelonken commented 7 months ago

This pull request contains two parts:

New functionality:

In ITK to DICOM conversion, use Label IDs from input files as Segment Numbers if possible. From the related API documentation:

 * @param sortByLabelID A boolean indicating whether to sort the segments by label ID.
 *        In this case, the label IDs are used as DICOM segment numbers. This only works
 *        if the label IDs start at 1 and are numbered monotonically without gaps. The processing
 *        order of label IDs is not relevant, i.e. they can occur in any order in the input.
 *        If n labels are not assigned uniquely to label IDs 1..n in the input, the
 *        conversion will fail.
 *        If this is set to false (default), the segment numbers are assigned in the order of the
 *        labels that are being converted, i.e. the first label will receive the Segment
 *        Number 1, the second label will receive the Segment Number 2, etc.

Also, a new test group is added that uses some (more than before) sophisticated segment setup with overlapping segments. A first test produces a DICOM file from three NRRD inputs, which are merged together in one DICOM file. A second test splits them again into 3 NRRD files (where 2 NRRD files will contain more than one segment) and compares them to the original inputs from test 1. A third test compares metadata input used in test 1 to metadata output produced by test 2.

Various small enhancements in ITK to DICOM conversion context:

Replaced some deprecated JSON writer code
Enhanced documentation
Made logging output a bit more consistent
Save some memory (don't copy segmentation dataset on return)

michaelonken commented 7 months ago

Thank you for the thorough review @fedorov! I will check soon and provide feedback.

-- Sent from mobile

14.12.2023 22:35:48 Andrey Fedorov @.***>:

@.**** requested changes on this pull request.

I spent quite some time with this PR, and I have a lot of questions. I am not sure it will work.

Also, note that MultiFrameDimensionModule utilizes ReferencedSegmentNumber in the per-frame FGs, see: https://github.com/QIICR/dcmqi/blob/master/libsrc/Itk2DicomConverter.cpp#L58-L62.

I would think that all of the frames would need to be reordered in the general case if the segments are renumbered.

Let me know if it is easier to have a screen share to go over this!

In apps/seg/itkimage2segimage.xml[https://github.com/QIICR/dcmqi/pull/485#discussion_r1425688622]:

@@ -64,6 +64,15 @@
Skip empty slices while encoding segmentation image. By default, empty slices will not be encoded, resulting in a smaller output file size.

@michaelonken[https://github.com/michaelonken] do we really need this flag? I do not see any downsides in always sorting label IDs and always reusing those if possible in SEG. Since you already implemented this option, why not keep it but set the default to true?

In libsrc/Itk2DicomConverter.cpp[https://github.com/QIICR/dcmqi/pull/485#discussion_r1425699134]:

@@ -367,16 +372,16 @@ namespace dcmqi { CHECK_COND(fgder->addDerivationImageItem(CodeSequenceMacro(code_seg.CodeValue,code_seg.CodingSchemeDesignator, code_seg.CodeMeaning),"",derimgItem));

//cout << "Total of " << siVector.size() << " source image items will be added" << endl;

// cout << "Total of " << siVector.size() << " source image items will be added" << endl;

Can you either remove this and the following commented out line, or swap for a logging command? I know this is most likely my code, not yours ....

In libsrc/Itk2DicomConverter.cpp[https://github.com/QIICR/dcmqi/pull/485#discussion_r1427294867]:

@@ -285,6 +291,7 @@ namespace dcmqi {
     Uint16 segmentNumber;
     CHECK_COND(segdoc->addSegment(segment, segmentNumber /* returns logical segment number */));
segNum2Label.insert(make_pair(segmentNumber, label));

Here, different values of segmentNumber can map to the same value of label, which may not be unique across input segmentation files. Is this ok?

In libsrc/Itk2DicomConverter.cpp[https://github.com/QIICR/dcmqi/pull/485#discussion_r1427307686]:

@@ -483,10 +494,117 @@ namespace dcmqi { segmentsOverlap = "NO"; else segmentsOverlap = "UNDEFINED";

CHECK_COND(segdocDataset.putAndInsertString(DCM_SegmentsOverlap, segmentsOverlap.c_str()));

CHECK_COND(segdocDataset->putAndInsertString(DCM_SegmentsOverlap, segmentsOverlap.c_str()));

}

if (sortByLabelID)

{

sortByLabel(segdocDataset.get(), segNum2Label);

I do not see how this could possibly work. I do not see any checks for non-consecutive segment numbering, so if there are gaps in label numbers, the result will violate the conventions for having consecutive values for SegmentNumber. Did I miss something?

In libsrc/Itk2DicomConverter.cpp[https://github.com/QIICR/dcmqi/pull/485#discussion_r1427305367]:

@@ -285,6 +291,7 @@ namespace dcmqi {
     Uint16 segmentNumber;
     CHECK_COND(segdoc->addSegment(segment, segmentNumber /* returns logical segment number */));
segNum2Label.insert(make_pair(segmentNumber, label));

Yes, looking further, this will definitely be a problem, since this map is later used for renumbering segments!

— Reply to this email directly, view it on GitHub[https://github.com/QIICR/dcmqi/pull/485#pullrequestreview-1780176326], or unsubscribe[https://github.com/notifications/unsubscribe-auth/ABL56T6F2P24KZ5DNYYRDPLYJNWLHAVCNFSM6AAAAABAMMVHZCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTOOBQGE3TMMZSGY]. You are receiving this because you were mentioned. [Verfolgungsbild][https://github.com/notifications/beacon/ABL56T7GUQIWXK3DIMLBYG3YJNWLHA5CNFSM6AAAAABAMMVHZCWGG33NNVSW45C7OR4XAZNRKB2WY3CSMVYXKZLTORJGK5TJMV32UY3PNVWWK3TUL5UWJTTKDNK4M.gif]

michaelonken commented 7 months ago

Also, note that MultiFrameDimensionModule utilizes ReferencedSegmentNumber in the per-frame FGs, see: https://github.com/QIICR/dcmqi/blob/master/libsrc/Itk2DicomConverter.cpp#L58-L62.

Good catch, i missed the dimension assignment. The MultiFrameDimensionModule can stay like it is (first dimension Referenced Segment Number, second Image Position Patient) but the dimension assignment in the FrameContentMacro has to be updated as well.

After thinking about it, I assume it would be enough to also update the (Referenced) Segment Number used as 1st dimension value in Dimension Index Value in FrameContentMacro. The selection of frames and their order (denoted by second dimension) for the display do not need to change if we just change the Segment Numbers.

Comment 2023-12-18: I gave it some more thought and we could even keep it as is. The number of frames for display must not change at all if we re-number the segments (only). The display would not start any more with the old (unmapped) Segment 1 but with whatever Segment this has been mapped to, but this is not invalid or wrong. Of course one can adapt the order to start with new (mapped) 1 by applying the mapping to the Dimension Index Values as well in a second step.

fedorov commented 7 months ago

I assumed that labels are unique in the input. If this assumption does not hold, then indeed the sorting does not make sense (since the idea to use the original labels as Segment Numbers cannot work then if two labels use the same number)

Aha! That's the key. Yes, I do not think we can assume anything about the label uniqueness. So what do we do with this PR?

michaelonken commented 7 months ago

I assumed that labels are unique in the input. If this assumption does not hold, then indeed the sorting does not make sense (since the idea to use the original labels as Segment Numbers cannot work then if two labels use the same number)

Aha! That's the key. Yes, I do not think we can assume anything about the label uniqueness. So what do we do with this PR?

As discussed (offline), I am pretty sure I know how to fix this. But we will make the flag not the default behavior for now.

QIICR / dcmqi

Use label id as segment numbers #485