Revisit calculation of the number of slices for SEG conversion

QIICR / dcmqi

dcmqi (DICOM for Quantitative Imaging) is a free, open source C++ library for conversion between imaging research formats and the standard DICOM representation for image analysis results

https://qiicr.gitbook.io/dcmqi-guide/

BSD 3-Clause "New" or "Revised" License

238 stars 62 forks source link

Revisit calculation of the number of slices for SEG conversion #275

Closed fedorov closed 2 years ago

fedorov commented 7 years ago

Since frames of SEG can overlap or be non-contiguous, the number of slices needs to be computed, which is done here: https://github.com/QIICR/dcmqi/blob/master/libsrc/ImageSEGConverter.cpp#L508-L509.

The approach to calculate the slices is to sort all frames in geometric order, take the difference between the origins of the first and last frames, divide by SpacingBetweenSlices (which must be declared for meaningful interpretation of the SEG geometry, as amended in CP-1427), and add 1.

Current approach applies ceil() to get the number of slices after dividing the distance. This can potentially lead to a situation where the resulting volume has 1 extra slice, when the distance is slightly above (SpacingBetweenSlices may not be exactly consistent, or this can happen due to machine precision issues). We should probably replace ceil() with round().

fedorov commented 7 years ago

supersedes #229

lorteddie commented 3 years ago

I got a question about dimensions and spacings, especially in multi-segment segmentation objects. It is quite awkward that a seg object can include segments referencing multi series and even studies. Considering that, it is quite hard to understand why a seg object has a specific x/y dimension for all frames, since multiple series are likely to have different dimensions. We work around this by grouping measurements which are supposed to be in one object by x/y dimensions possibly creating multiple objects. How is this supposed to work? Rescaling seems very wrong. Then we tried multiple single frame segments in one object and had several gb of ram usage for 13 frames, this seems to be because dcmqi computes the volume extent of the whole object instead of computing it per segment. I couldn't understand why and looking at the previous statement that the object can include segments from several series/studies this seems odd, especially because FoR isn't considered. A similar issue arises with the image spacings, since pixel measure could be listed per frame (which is madness, should be segment but I guess the multiframe design does not allow that). dcmqi currently expects all segments to have the same spacings and I wonder why. Also in ConverterBase::computeVolumeExtent there is a comment stating that all segments must have the same number of frames. I couldn't find any reference of such a requirement in the standard or elsewhere, is this really a thing? We currently crop all segmentations before writing them to seg objects, so it is highly likely that the segments will have different numbers of frames. We do this to save space and also to reduce the likelihood that the segmentation will get rejected because it falls in between sub-volumes when the series is split into several volumes because it is not contiguous or slices are missing or has overlaps or the other software sorts the instances a little different, e.g. strict IN vs IPP. Is the spacing per segment just not implemented or is there another reason?

fedorov commented 3 years ago

I think this ticket should have been closed.

But to your question, I have never encountered or had a need to create SEG objects that reference multiple series. Note that when an instance is referenced in the per-frame FG for a segment, it does not mean that the segment should have the same resolution or dimensions as the referenced frame. It might just be, for example, that the operator was evaluating both series at the same time when segmenting something.

Then we tried multiple single frame segments in one object and had several gb of ram usage for 13 frames, this seems to be because dcmqi computes the volume extent of the whole object instead of computing it per segment.

This was probably because by default dcmqi encodes empty frames of a segment, and not just those that have non-zero pixels. If you run it with the -skip flag, it will be much faster, and the resulting object will be much smaller.

Also in ConverterBase::computeVolumeExtent there is a comment stating that all segments must have the same number of frames.

I admit I do not recall what is the origin of this comment from Christian in https://github.com/QIICR/dcmqi/blame/master/include/dcmqi/ConverterBase.h#L101-L106. You are correct, there is no such requirement.

Why would you want to have varying spacing for individual segments? Why not save segmentations that have geometry corresponding to different series as separate SEG objects?

I do not know what is the capability of the standard, but my perspective is that this needs to be balanced with trying not to overload the complexity of the resulting object, since this will result in problems with interoperability. I would prefer to keep things simple rather than trying to put as much as possible into the minimum number of series.

lorteddie commented 3 years ago

The seg object in question only listed a single frame per segment, so no empty frames. ImageSEGConverter::dcmSegmentation2itkimage still made 13 itk images the size of 512/512/1435 which is the distance between the slices of segment 1 and segment 13. I think the volume extent should be calculated per segment and not for all segments at once.

I do agree, simpler objects make everything easier. Problem is mostly that the standard allows the weirdest combinations of segments and instances. And since we also built an importer for seg objects we kind of have to expect everything there might be in the world (we have quite some experience with stupid incorrect dicom data ...). It would be easier to tell people 'this object is not dicom conform' but since practically everything is we need a better excuse or a ridiculous implementation.

fedorov commented 3 years ago

The seg object in question only listed a single frame per segment, so no empty frames. ImageSEGConverter::dcmSegmentation2itkimage still made 13 itk images the size of 512/512/1435 which is the distance between the slices of segment 1 and segment 13.

I misunderstood your earlier comment. I thought you were talking about creating SEG.

There is nothing in the standard that prescribes how to convert SEG into alternative segmentation representations. Yes, it is a choice of dcmqi implementation that ITK volumes resulting from converting a multi-segment SEG will have the same geometry. It never came up in the past, but it would be completely fine to save individual segments as completely independent, non-overlapping volumes, as a conversion option.

I have to say I am struggling to follow, because you said "Considering that, it is quite hard to understand why a seg object has a specific x/y dimension for all frames, since multiple series are likely to have different dimensions. We work around this by grouping measurements which are supposed to be in one object by x/y dimensions possibly creating multiple objects. How is this supposed to work? Rescaling seems very wrong."_ That seems to imply you, not someone else, make the choice to group segmentations across different series into a single SEG.

Yes, I completely agree that it is possible to come up with ridiculous combinations, and it is difficult to interpret the standard. But I thought in your project you are dealing with more or less well-defined process for generating segmentations. I would start with addressing well-defined configurations, and deal with unusual ones as they appear, but I understand this may not be an option in commercial tools. Can we maybe come up with a list of items that are unclear, and try to clarify in the standard with @dclunie help?

lorteddie commented 3 years ago

Yes, we wanted to group segmentations semantically, e.g. anatomical vs pathological, since this is the use case in the current project. But this will not work with SEG because we cannot control on which series these segmentations are done. Chances are quite high people segment their stuff on different series with different spacings and even dimensions, at worst with different FoR.

For now my solution would be to reject all multi series and multi spacing SEG objects. Maybe later I can have a look at how dcmqi converts segments into ITK images. I really think an image per segment is the way to go. Maybe it needs to be configurable via parameter. I guess the current implementation is related to the NOTE in http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.20.2.3.html basically saying you can create multi-class images from SEG objects.

rfloca commented 2 years ago

Current approach applies ceil() to get the number of slices after dividing the distance. This can potentially lead to a situation where the resulting volume has 1 extra slice, when the distance is slightly above (SpacingBetweenSlices may not be exactly consistent, or this can happen due to machine precision issues). We should probably replace ceil() with round().

@fedorov Hi Andrey. I would like to bump the priority of your thoughts in the task description. We currently run in this problem quite often where the loading of a segmentation generates an extra empty slice at the end (which breaks several workflows where we check for equal geometry :( ). In our case the reason always boils down to the following:

When writing the seg with DCMQI, the image position of each plane position is stored with rounding and a precision dependent on the number of digits. E.G. one slice has-249.511719,-420.511719,-54.3000**488**, another has -249.511719,-420.511719,-1024.3000**5**. Important are the last digits of the z position. Because the later position need more digits to encode the no fractional part, it rounds "more" then the first position.
When reading the data back in with DCMQI, the extend of the volume is calculated as X.0000012 instead of just X, due to the different rounding.
Due to the ceil, you already have mentioned, the slice count of 209,00000024 is bloated to 210, which then leads to an extra slice.

I personally would argue to at least use as round instead of a ceil. 1) This should sort out als rounding error noise. 2) it is not to far from current behavior. If (for whatever reason) the rest of the extend is larger than half a slice thickness, it earns his extra slice ;).

But I am open for other ideas. What are your thoughts?

It would be really great if we could fix this upstream.

Thanks.

fedorov commented 2 years ago

@rfloca thanks for bumping the priority. I admit I am not doing any active development on dcmqi - it "just works" at this point, and I just don't have the cycles to do anything major with it. I would say the first priority will be to upgrade the outdated version of ITK that is used in the CI platforms, which no longer compiles on a linux ... :-(

I do not have concerns about your suggestions. My overall attitude is that the recipient of the conversion results should be robust to the differences in geometry. When empty slices are not encoded, which is a completely valid situation, geometry will be different from that of the image, and it would be appropriate to resample.

Let me make a PR replacing ceil with round.

rfloca commented 2 years ago

@fedorov Thank you for taking care and even making the pull request directly. didn't mean to push you to do that! Would have done it myself, I just wanted to reassure that the solution still sense for you.

My overall attitude is that the recipient of the conversion results should be robust to the differences in geometry. When empty slices are not encoded, which is a completely valid situation, geometry will be different from that of the image, and it would be appropriate to resample.

I mostly agree. But in this case it is making up a slice that was never there and for that reason e.g. breaks some validation logic on our side as segmentation volumes become bigger after reloading. As there is a a sound technical solution to improve the fidelity of DCMQI, I would prefer this over "always resample/crop".

fedorov commented 2 years ago

Fixing CircleCI build that was failing since python 3.5 it was using reached "end of life" turned out to be more difficult, but I think I figured it out. Tests are passing on linux and windows, so I will merge, and see if I can fix TravisCI later.

fedorov commented 2 years ago

@rfloca I now fixed the TrvisCI integration as well, tests are passing on all platforms.

Hopefully this addresses your needs!