Add support for saving segments in a single file

fedorov commented 1 year ago

WIP towards addressing #462

fedorov commented 1 year ago

Failures of the round-trip tests are expected, since output is not what is expected.

Issues identified that need to be fixed:

copy of the unpacked frame in https://github.com/QIICR/dcmqi/blob/master/libsrc/ImageSEGConverter.cpp#L757-L772 must be very inefficient - use something like https://github.com/InsightSoftwareConsortium/ITK/blob/633f84548311600845d54ab2463d3412194690a8/Examples/Iterators/ImageRegionIterator.cxx#L201-L212 instead? (also https://examples.itk.org/src/core/common/importpixelbufferintoanimage/documentation)

fedorov commented 1 year ago

:fireworks:

fedorov commented 1 year ago

Now I need to fix DICOMSegmentationPlugin.py which unfortunately makes the assumption that there will always be one segment per nrrd file:

https://github.com/QIICR/QuantitativeReporting/blob/71f01876d95b6190218b4e23b2dd30d9d5e82488/DICOMPlugins/DICOMSegmentationPlugin.py#L142-L150

fedorov commented 8 months ago

@michaelonken I think I know why it fails. The NRRD outputs you are saving start with prefix 0, while they used to start from 1. E.g., 23x38x3-0.nrrd should have been named 23x38x3-1.nrrd. It should be a trivial fix, since I just think you need to bump the key returned in this map to start from 1: https://github.com/fedorov/dcmqi/blob/merged-segments/apps/seg/segimage2itkimage.cxx#L43. I think it would be better it is fixed at the time that map is created rather than in the loop that writes the files, for consistency of the API, so I think it will be more efficient if you do it, because I can't tell for sure what this may affect.

michaelonken commented 8 months ago

Makes sense,thanks! I'll try this on Monday.

fedorov commented 8 months ago

@michaelonken here are commands to download the image and segmentation for an oblique acquisition:

seg: s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/c6d7ad8a-835c-478f-956e-a09171698f51/*' .
image: s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/89ac2bc6-893e-4fc7-9dcc-4731617fd716/*' .

You will need to have s5cmd installed first: https://github.com/peak/s5cmd.

For the sake of completeness, the above commands will download corresponding DICOM series from the public AWS buckets maintained by NCI Imaging Data Commons (IDC). The samples are from the QIN-Prostate-Repeatablity collection, which is hosted on IDC.

michaelonken commented 8 months ago

Current status of this pull request:

Overall state: The DICOM to ITK conversion is now in a quite stable state -- from my point of view :-). More tests would be good though since I used limited amount of test data so far.
Refactoring: I refactored the SegImageConverter code into two separate classes (Itk2DicomConverter and Dicom2ItkConverter) since both use cases did not have much doe in common. Also I (earlier) refactored the DIcom2Itk code quite a lot to make it more readable and reusable.
Tests: Tests are working now (again) on all 3 supported platforms, including one test to verify the NRRD and JSON files produced for the --mergeSegment case.

Weaknesses:

The slice sorting is done by identifying the relevant coordinate for sorting (by evaluation cross product of image orientation vectors). The resulting frame (positions) are grouped into a "logical frame" if they are not further away than 1% of slice thickness. I am not sure whether this is a valid approach or not.
Right now the overlap checking compares 8 pixels at once through a binary AND operation. This works well for images where rows cols MOD 8 is 0, i.e. dividable by 8 since then it is easy to access each frame separately. When this is not the case, frames need to be unpacked from 1 bit binary DICOM into 8 bit pixel cells (one byte per bit) which takes significant time. It should be evaluated whether the byte-wise comparison method also works for rows cols % 8!=0 cases, or, at least with some extra code.

Other remaining work:

Bonus: NRRD export w/o ITK: Looking at the simplicity of the NRRD files, I could imagine a direct export from DICOM to NRRD without the ITK detour would also be significantly faster. But maybe we can see whether performance and memory is already good enough right now. For instance, writing the BWH Total Segmentator example to a merged ITK image only(?) takes around 5 seconds right now. This could also go to DCMTK (as the OverlapUtil class probably will) as a little extra tool.

fedorov commented 8 months ago

@michaelonken I built the branch, and it appears that the output files that are created by segimage2itkimage are still numbered starting from 0. Is there a reason for this? I think there is benefit to maintaining numbering conventions for compatibility with other tools. Specifically, I ran across this because Slicer plugin expected 0-based numbering. I can fix that, but I wonder if there are other tools/workflows that can get broken because of this change.

michaelonken commented 8 months ago

@michaelonken I built the branch, and it appears that the output files that are created by segimage2itkimage are still numbered starting from 0.

Hm, I could not confirm this on my system, e.g. see the following log:

/dcmqi-build/bin/segimage2itkimage --inputDICOM ~/data/dcm/SEG/bwh_NLST_207114_TotalSegmentator/seg/segmentation.dcm --outputDirectory /tmp

dcmqi repository URL: git@github.com:QIICR/dcmqi.git revision: c1f09ef tag: latest-89-gc1f09ef
Row direction: 1 0 0
Col direction: 0 -1 0
Z direction: 0 0 -1
Total frames: 9654
Total frames with unique IPP: 455
Total overlapping frames: 455
Origin: [-179.1, 185.217, -31.755]
Slice extent: 329.15
Slice spacing: 0.725002
Will not merge segments: Splitting segments into 77 groups
Writing itk image to /tmp/1.nrrd ... done
Writing itk image to /tmp/2.nrrd ... done

Also using --mergedSegements starts exports with: Writing itk image to /tmp/1.nrrd ... done Listing /tmp also shows that there is no file /tmp/0.nrrd and the lowest number found is /tmp/1.nrrd.

Maybe I am overlooking something, or have you maybe used an old version that has been in your PATH or so?

fedorov commented 8 months ago

@michaelonken I am very sorry - I messed it up. Indeed, I was on a stale branch. I thought I updated, but I was multitasking, and apparently I did something wrong. Output file numbering works as expected! I am fixing few items in the Slicer QuantitativeReporting extension in this PR that I discovered in the process of testing.

fedorov commented 8 months ago

@michaelonken I did testing with various samples from IDC, and it works great!

I am very tempted to just merge it and get this out into Slicer. We can address refinements in subsequent work. What do you think?

It should be evaluated whether the byte-wise comparison method also works for rows * cols % 8!=0 cases, or, at least with some extra code.

Indeed, it will require some extra code, since AND is applied to frames before unpacking. Such frame sizes are not common. For the record, I used the query below to search IDC for such segmentations:

SELECT
  collection_id,
  PatientID,
  StudyInstanceUID,
  SeriesInstanceUID,
  `Rows`,
  `Columns`,
  ARRAY_LENGTH(SegmentSequence) AS numberOfSegments
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  Modality="SEG"
  AND MOD(`Rows`*`Columns`, 8) <> 0
ORDER BY
  Collection_id desc

and only three collections contain such samples, all of them containing just single segment. So I do not think it makes sense to spend too much time on this, or delay integration of this PR.

michaelonken commented 8 months ago

Thanks for testing @fedorov ! Nice SQL query :-)

For the MOD 8 issue: I agree, let's skip it for now. I will look into it if there is some time left. I thought about it and I think the bit-wise comparison will also work for the other frame sizes as long as the "margin" bytes at the beginning and at the end of the frame, which may contain overlapping bits from other frames, are masked out correctly first. This is a bit intricate so I don't want to implement this quickly without proper unit tests.

The one remaining thing that we might to guard against are oblique slices. If I understand correctly, in those Series the position (Image Position Patient) does not only vary in one (e.g. z) coordinate between slices, but also in the other two coordinates (x,y). The overlap detection code right now superimposes frames that are on the same z coordinate, without "aligning" them on their x,y coordinate. This way pixels are compared that might have totally different positions in 3D space, eventually leading to broken merges or non-merges of segments. The right way would be to accommodate for the x,y shift before doing the comparison. Probably it is just some "simple" matrix operation(s) but nothing I could shake off my sleeve. As a workaround, one might try to detect oblique slicing and then fallback to the non-merging operation.

What do you think?

fedorov commented 8 months ago

@michaelonken I spent some time looking at the code in question, which as I understand is around here: https://github.com/fedorov/dcmqi/blob/merged-segments/libsrc/OverlapUtil.cpp#L736-L761.

I think this logic could be significantly simplified. Instead of trying to detect relevantCoordinate, we can use the existing FrameSorter class to order frames by their position along the ImageOrientationPatient vector, we already have the function for this: https://github.com/fedorov/dcmqi/blob/merged-segments/include/dcmqi/framesorter.h#L230. Given that ordering, I think we should use the difference between the values of dist here https://github.com/fedorov/dcmqi/blob/merged-segments/include/dcmqi/framesorter.h#L268 to detect overlapping frames, applying the same threshold logic. I am pretty sure this should be robust to any orientation.

michaelonken commented 8 months ago

@fedorov I can check this in detail Tuesday next week. From just taking a quick look I am not sure whether the frame sorting code you referenced takes into account the x,y coordinate shifting that I mentioned. It's clear, also for oblique cases, how to identify the relevant coordinate (using Image Orientation Patient as the framesorter code and the OverlapUtil code does).

The problem I see is in the shifting of the other two coordinates, so that if two frames are on the same "plane" (e.g. z coordinate, "relevant coordinate"), we cannot simply compare value for "pixel" 1,1 from frame A to the 1,1 from frame B since Frame B might start 2 cm more to the right (e.g. x direction) and 1 cm more in y-direction. Here is a very sophisticated picture to demonstrate my bad explanation :-)

grafik

fedorov commented 8 months ago

Michael, I do not think this would be the case for any volumetric acquisition. In oblique scans, the frames are not shifted when observed along the scan direction. What you show would be the case if one projected oblique acquisition slices to a plane.

michaelonken commented 8 months ago

Ah, thank you. Sorry for the noise then. And we can always assume that we don't have such kind of shifts in any acquisitions we handle?

I think then we are safe to go with the current code. We could simplify by using the existing framesorter code (that I just didn't know) but maybe this can be a second step a bit later if you want to merge.

P.S: Note that Frame A and Frame B are usually parts of two different segments, not of a real "acquisition" of a modality, so a segmentation creator could in theory at least place frames in space where he wants to; this is what I am afraid of.

fedorov commented 8 months ago

P.S: Note that Frame A and Frame B are usually parts of two different segments, not of a real "acquisition" of a modality, so a segmentation creator could in theory at least place frames in space where he wants to; this is what I am afraid of.

This may be a valid scenario in theory, but it would be crazy, and in any case it is probably not handled in dcmqi in other places. I am not sure. I have never seen such cases. Would be nice to add an error check for this or something like this!

I will go ahead merging this, to make this functionality easier accessible to the users, and make it easier to test and identify deficiencies!

QIICR / dcmqi

Add support for saving segments in a single file #464