Proposal: Defining and Prototyping "Labelmap" Segmentations in DICOM Format

NA-MIC / ProjectWeek

Website for NA-MIC Project Weeks

https://projectweek.na-mic.org

84 stars 284 forks source link

Proposal: Defining and Prototyping "Labelmap" Segmentations in DICOM Format #643

Closed CPBridge closed 1 year ago

CPBridge commented 1 year ago

Project Description

The DICOM Segmentation format is used to store image segmentations in DICOM format. Using DICOM Segmentations, which use the DICOM information model and can be communicated over DICOM interfaces, has many advantages when it comes to deploying automated segmentation algorithms in practice. However, DICOM Segmentations are criticized for being inefficient, both in terms of their storage utilization and in terms of the speed at which they can be read and written. This is in comparison to other widely-used segmentation formats within the medical imaging community such as NifTi and NRRD.

While improvements in tooling may alleviate this to some extent, there appears to be an emerging consensus that changes to the standard are also necessary to allow DICOM Segmentations to compete with other formats. One of the major reasons for poor performance is that in segmentation images containing multiple segments (sometimes referred to as "classes"), each segment must be stored as an independent set of binary frames. This is in contrast to formats like NifTi and NRRD that store "labelmap" style arrays in which a pixel's value represents its segment membership and thus many (non-overlapping) segments can be stored in the same array. While the DICOM Segmentation has the advantage that it allows for overlapping segments, in my experience the overwhelming majority of segmentations consists of non-overlapping segments, and thus this representation is very inefficient when there are a large number of segments.

The goal of this project is to gather a team of relevant experts to formulate changes to the standard to address some issues with DICOM Segmentation. I propose to focus primarily on "labelmap" style segmentations, but I am open to other suggestions for focus.

The specific goals would be to complete or make significant progress on the following:

Formulate changes to the standard to allow for labelmap segmentations (@dclunie)
Complete prototype implementations within the highdicom library (of which I am a maintainer), dcmjs (@pieper) and possibly dcmqi (@fedorov )
Create example datasets for dissemination to others wishing to implement the changes
Begin the process of reaching out to others in the open source community to accelerate other implementations, particularly viewers such as slicer (@pieper ) and OHIF

Open questions:

Should we implement a new IOD or a new SegmentationType within the existing Segmentation IOD?
Should we implement "instance" segmentations, in which each segment is assumed to be a different instances of the same type, and thus need not be described separately, in addition to label-map style semantic segmentations?
Should we also allow 16 bit pixels to allow for more segments? How does this interact with the choice of new IOD vs new SegmentationType?

Other possible (alternative) topics:

Single bit compression to allow for more space-efficient storage
Omitting the per-frame functional group (like TILED FULL) for other types of segmentation image.
The inefficiency of pydicom in parsing long sequences, such as the per-frames functional groups sequence in segmentations, is a key bottleneck in Python. We could think through how to overcome this

Relevant team members: @fedorov @dclunie @pieper (@hackermd ) please give your feedback to help shape this project!

wayfarer3130 commented 1 year ago

It would be interesting to see this displayed in a viewer such as OHIF - the loading on this shouldn't be too different from the existing SEG loader, and would give a useful comparison for performance purposes.

CPBridge commented 1 year ago

It would be interesting to see this displayed in a viewer such as OHIF - the loading on this shouldn't be too different from the existing SEG loader, and would give a useful comparison for performance purposes.

It would be fantastic to have someone from OHIF involved!

wayfarer3130 commented 1 year ago

Having a transparent conversion between old and new SEG objects would be really nice, but some of the interesting aspects might require a custom IOD to allow for more tag values. Still, there is precedent for how to implement such a conversion in the enhanced legacy multiframes - although I don't know anyone who uses that.

If the representation was a multiframe object, with one or more color LUT tables, and also a pixel value to set of labels, then it becomes possible to define an overlapping segmentation - the algorithm being to just assign the next instance number whenever a new combination of labels is assigned to a pixel. That algorithm also allows defining two labels for a given region - the "edge" label and the center label, which can be used to nicely show the outline.

wayfarer3130 commented 1 year ago

It would be interesting to compare HTJ2K, JPEG-LS, RLE and compressed TSUID's for this representation at various numbers of label maps/overlaps. David Clunie at the last compression WG-04 meeting stated that the compressed TSUID performed better than JPEG 2000 in terms of size for single bit segmentations, which isn't surprising given how sparse they are. I would hope that the representation would be quite a bit better for this as it would be much less sparse. The efficiency gains in not needing to handle so many images might be significant as well (comparing to single bit), but the straight implementation of being able to just overlay an image with transparencies without needing to look at pixel values is enormous - that is, most image display systems can just be told to draw a given image with a given LUT, and will do that efficiently, so with the representation above, that directly includes color LUT table(s). To do that, the color LUT should include a transparency channel.

wayfarer3130 commented 1 year ago

As a co-chair for the DICOMweb working group, I'd really like to see the new proposal include a well defined representation for how to fetch the rendered images with segmentation from the DICOMweb /rendered endpoint. This really requires making sure the segmentation has some sort of default viewable representation, and preferably one that is easy for developers to implement without thinking too much (that helps make implementations consistent). My suggestion is to define two mechanisms for it:

Add a segmentation reference, basically the same as how the GSPS representation. That allows fetching a single image, or a particular series with a given segmentation applied.
Fetch the rendered representation of the segmentation. This would render the referenced images with the segmentation applied, as a multiframe image of some sort.

Eventually, adding this to the proposed DICOMweb 3d would allow for returning rendered 3d representations.

sedghi commented 1 year ago

Great to see the discussion is happening! putting my benchmarks here https://docs.google.com/document/d/14tgwQKfjbpxnaXXeH1AEzunWRnYeMkcBNNS1efCf0Jo/edit?usp=sharing

For the questions

16 bit is inevitable with full body SEG tools that we are seeing emerging everyday
some flavor of RLE would be very much suited based on my experience

pieper commented 1 year ago

Thanks everyone for participating in this effort. Efficiency of SEG is essential. It's been good to use today's use cases for benchmarking (such as brain or body segmentations with around a 100 segments), but given the way segmentation tools are improving, it's clear to me that soon we will be looking to encode thousands of segments, if not more, and any work we put in now should be able to handle these use cases. Good compression and efficient representation of metadata are going to be essential.

I'm happy to work on this in both Slicer and dcmjs.

fedorov commented 1 year ago

Is refining support of the existing SEG, as defined in the standard now, in scope for this project? I am very supportive of the efforts to develop new representations, but we should not forget about the existing datasets and implementations of the current standard in the existing tools. Also, it will take time to develop the proposal, get it into the standard and gain acceptance. Together with @igoroctaviano I have been looking at the benchmarking of the OHIF v2 and v3 implementations, and will have those available along with samples in case this can help.

Regarding prototyping of the implementation, dcmqi is leveraging IOD implementation in DCMTK, and I don't think I will be able to prototype that.

Pinging Michael Onken @michaelonken for awareness.

CPBridge commented 1 year ago

Thanks everyone for the feedback and participation!

Having a transparent conversion between old and new SEG objects would be really nice

This is a really good point

If the representation was a multiframe object, with one or more color LUT tables, and also a pixel value to set of labels, then it becomes possible to define an overlapping segmentation - the algorithm being to just assign the next instance number whenever a new combination of labels is assigned to a pixel. That algorithm also allows defining two labels for a given region - the "edge" label and the center label, which can be used to nicely show the outline.

I'm not really following this. Perhaps you could clarify. The current segmentation IOD does allow overlapping segments. I have been thinking along the lines that if people want to use overalapping segments they would continue to use the existing segmentation IOD, and we would simply define a new "special case" to make the case of non-overlapping segments more efficient.

Compression is definitely important, though I personally put this second to having a labelmap style encoding.

Thanks @sedghi for those benchmarks, very useful!

but given the way segmentation tools are improving, it's clear to me that soon we will be looking to encode thousands of segments, if not more, and any work we put in now should be able to handle these use cases

I agree!

Is refining support of the existing SEG, as defined in the standard now, in scope for this project? I am very supportive of the efforts to develop new representations, but we should not forget about the existing datasets and implementations of the current standard in the existing tools.

I completely agree that improving tooling for the existing segmentations is important. I have spent quite a bit of time recently improving the efficiency of both encoding and decoding in highdicom, and plan to do more. However, I feel that that may be something best left to individual developers to do in their own time, and a better use of the limited time we have together at project week would be to work on the piece that we need to collaborate on, which is drafting an improved version of the segmentations (as it seems to me that we have reached a consensus is necessary). What do you think @fedorov ?

CPBridge commented 1 year ago

As a co-chair for the DICOMweb working group, I'd really like to see the new proposal include a well defined representation for how to fetch the rendered images with segmentation from the DICOMweb /rendered endpoint.

This sounds like potentially a good idea, but not one that I am best placed to execute on. Are there particular considerations for the design of the actual IOD that will make this easier, that we should bear in mind? It seems to me that this could be a separate proposal without interdepencies with the other things that we are discussing here, but maybe I am wrong.

Generally speaking I am of the opinion that the Segmentation object should simply encode the segmentations and their semantics, and viewers are free to choose how to render them, perhaps with reference to a presentation state if desired. But then again, I don't write any viewers :)

I definitely want to make sure we don't do anything that makes viewers' lives harder

fedorov commented 1 year ago

However, I feel that that may be something best left to individual developers to do in their own time, and a better use of the limited time we have together at project week would be to work on the piece that we need to collaborate on, which is drafting an improved version of the segmentations (as it seems to me that we have reached a consensus is necessary).

That's a fair point - makes sense to focus this project on the proposal development.

lassoan commented 1 year ago

Overlapping segments

In Slicer we have gone through a number of different representations of overlapping labels. The best solution, clearly, by far, is multiple 3D labelmaps (we stored them in a 4D array but multiple 3D arrays are fine, too). For non-overlapping labels (typical for atlases and AI segmentation results with hundreds of segments) it is as fast, simple, and memory-efficient as simple labelmaps. Segments typically overlap in small groups, for example, tumor or vasculature can be specified over solid organs - without overlap within the group (e.g., all vessel labels can be stored in a single labelmap). This is confirmed to work really well, for over several years now, over a very large number of Slicer-based projects, so I'm confident that this representation can fulfill all voxel-based segmentation storage needs.

DICOM standardization

The current DICOM standardization practice is that:

Each vendor first implement their solutions independently using private DICOM tags (often in a way that third-parties cannot decypher them)
Fight it out in long, relatively infrequent, formal meetings of DICOM working groups what the common standard will be. When the standard is finalized often there is no working implementation and definitely not a significant amount of experience with how the new data object works in practice.
Vendors implement the standard. During implementation it may turn out that it is too complicated, too slow, ambigous, etc.
Amendements, modifications are attempted, but since the standard is already out, in use, the possibilities are limited and any change is very expensive.

I would propose to change this process by replacing this with an open, iterative, code-first (code is the specification) approach:

Developers from multiple medical application development groups (companies, research groups, commercial and open-source developers) agree in a DICOM data structure at a high level.
Implement a library for reading/writing the data structure.
All groups use this library in their application to get real-life experience with it.
Keep iteratively improving the library based on the feedback.
After 6-12 months of real-world usage, start the formal DICOM standardization process as it is done today.

We can follow the usual github development process, discussing things in issues and proposing changes through pull request, etc. The project week could be a good candidate to try one iteration of this new approach!

lassoan commented 1 year ago

better use of the limited time we have together at project week would be to work on the piece that we need to collaborate on, which is drafting an improved version of the segmentations

I agree, adding that I would suggeset the "drafting" to be done by writing code, not documentation.

We already have a lot of code that can be reused, so we probably don't need to implement a lot from scratch.

fedorov commented 1 year ago

The current DICOM standardization practice is that:

@lassoan I agree with you in general, but it is also, at least sometimes, in my experience, the case than given the opportunity, repeated reminders and invitations to participate in the standard development, vendors (for a variety of reasons, I am sure) are not committing resources to test the proposals and provide feedback. And by the time they express interest, it is too late to change the standard. I think often there are no incentives for vendors to commit resources to develop the standard. It takes huge effort to recruit vendor participation.

But I completely agree that we should follow the approach you are proposing in this project. Would be great to have developers of commercial tools that at least touched DICOM SEG in the past to participate, but I am not very optimistic this will be feasible. Here are the companies/groups that have, or had products, that support/supported/attempted to support DICOM SEG (that was in 2018): https://dicom4qi.readthedocs.io/en/latest/results/seg/. Add to this Sectra, Kaapana, there may be more.

There is also the balance between agility and inclusivity of the process. Since the more voices you have, the more difficult will be to reach consensus. Maybe after trying rallying various groups around this activity, it will be easier to empathize with the challenges of developing DICOM and shepherding DICOM working groups.

CPBridge commented 1 year ago

I agree, adding that I would suggeset the "drafting" to be done by writing code, not documentation.

Not sure that I'd go quite this far, but certainly would want to have prototype implementations from an early stage, as many problems are only discovered by implementing them.

We already have a lot of code that can be reused, so we probably don't need to implement a lot from scratch.

Would this be slicer code? In core slicer or elsewhere?

lassoan commented 1 year ago

@fedorov I agree, you raise many good questions. The new approach would not solve all problems, but would have significant advantages.

@CPBridge

I agree, adding that I would suggeset the "drafting" to be done by writing code, not documentation.

Not sure that I'd go quite this far, but certainly would want to have prototype implementations from an early stage, as many problems are only discovered by implementing them.

It makes sense to spend some time with discussion and documentation, because maybe we don't have 100% common understanding of how things should work. However, we should try to keep that at the minimum, because we should already know enough to be able to implement in a few days a DICOM-based solution that is almost as capable but much simpler and several magnitudes faster than the current standard. This implementation would be a much better basis of discussion and further developments than any document.

Would this be slicer code? In core slicer or elsewhere?

Yes, in Slicer we have implemention for storing segmentation as 4D labelmap, fractional labelmap, closed surface, planar contour, ribbon, representations, including metadata that is needed for creating the DICOM representation; storage in NRRD and a few other research file formats; and conversions of these representations to/from current DICOM objects (DICOM Segmentation object, RT structure set, fake CT), mostly in C++, with some plumbing in Python. It should not be hard to reorganize this code to store segmentation in an existing DICOM information object with some private fields.

I'm sure you have some code to build on, too.

It would make sense to start with a reference implementation, probably in Python (pydicom with ITK or with just numpy) during the project week. Later on we could add C++ and JS implementations.

CPBridge commented 1 year ago

I intend to write a reference implementation by building on the current implementation of the Segmentation IOD in highdicom, since I know that code very well and prototyping should be pretty fast. Other developers associated with other projects are very welcome to join in. It sounds like @pieper at least will be doing something similar in dcmjs, and perhaps slicer.

@lassoan I am not sure I really understand what you mean by a "4D labelmap", could you clarify? To me, a segmentation array would be either 4D (3 spatial dimensions + 1 segment dimension) or a 3D "labelmap" (3 spatial dimensions with pixel value encoding segment membership), but not both at the same time. Is this because you have multiple groups of non-overlapping segments, with each group stacked down the 4th dimension and within each group, segment membership encoded by pixel value. If so, I see that this may be more space efficient, but worry that it would do little to help the criticism that SEGs are complex and hard to parse.

pieper commented 1 year ago

It sounds like @pieper at least will be doing something similar in dcmjs, and perhaps slicer.

Yes, I'm willing to look at both dcmjs (and how it's used in OHIF) and Slicer. For Slicer I think we can explore using highdicom directly since from what @fedorov said it may be hard to extend the current implementation with dcmqi / dcmtk.

fedorov commented 1 year ago

For Slicer I think we can explore using highdicom directly

Yes, I discussed this with Chris when we talked about this project yesterday. It makes a lot of sense to have a highdicom-based plugin to read SEG.

lassoan commented 1 year ago

I am not sure I really understand what you mean by a "4D labelmap", could you clarify?

The dimensions are I, J, K, layer. Each "layer" is a 3D labelmap that stores a number of non-overlapping segments.

worry that it would do little to help the criticism that SEGs are complex and hard to parse

The beauty of this scheme is that it is very simple, yet it fulfills the requirements of many use cases. If segments do not overlap then there is only one layer, so we are fully backward compatible with all the usual 3D segmentation files. If segments overlap then we store the data in a few 3D arrays ("layers") instead of one 3D array. We very rarely need more than a few of these layers, so the rendering performance is good and it is independent from the number of segments. You can extract voxels of a segment using simple numpy indexing.

It makes a lot of sense to have a highdicom-based plugin to read SEG.

Agreed. We can create a high DICOM-based DICOM Segmentation object importer exporter in Slicer during the project week.

CPBridge commented 1 year ago

Would it be so bad if you simply stored each non-overlapping set ("layer") of segments as its own segmentation instance? My instinct is to try and make the new proposal as simple as possible

lassoan commented 1 year ago

Storing a segmentation in a single series but storing each layer in a separate instance (file) could slightly simplify the DICOM specification and implementation of DICOM toolkits. However, it would significantly complicate implementation at the application level:

Metadata and image geometry (origin, spacing, axis directions, extents) could be inconsistent between instances. The application would need check for inconsistencies (increasing complexity and processing time) and resolve any detected inconsistencies. Applications could simply reject loading an entire series if any inconsistencies are found, but that would make everything very rigid and fragile (applications that are not prepared for dealing with time sequences or multi-resolution could not read segmentations at all. Applications could resample the labelmaps into a common coordinate frame, but to which, and how much data loss would be acceptable during the process? Should the user be warned? But what if there is no user interface? Should such series be completely rejected or partially loaded? Different applications would choose different solutions and so behavior would be inconsistent and unpredictable. It would be a mess.
It would be much harder to determine how many instances make up the segmentation and ensure that the data is completely read/written/transferred. Similar problems are there for image instances as well, but images are just acquired once, while segmentations can be added anytime.
We already need the sub-series grouping for storing segmentations at multiple time points and multiple resolutions. It would be nice to avoid the extra level of grouping based on layers. Overlap between segments can change between time steps and resolutions, which could make cross-references between time points very complex (you would not be able to use isntance UID to store correspondence between segmentations across time points and resolutions, but you would need to introduce some new sub-series level UIDs).
Existing DICOM representations (SEG, RTSTRUCT) store multiple overlapping segments in a single instance, too, and changing this could hinder adoption of the new representation.

On the other hand, allowing storage of a 4D array instead of just 3D would barely make any difference in the complexity of the standard or DICOM toolkits. We would need to store the number of layers (one extra field) and for each segment store not just the label value but also the layer index (one extra field per segmentation).

pieper commented 1 year ago

I just saw the response from @lassoan which came in while I was writing the message below. He has many good points that are somewhat different from mine, but we both agree.

simply stored each non-overlapping set ("layer") of segments as its own segmentation instance?

@dclunie suggested the same idea of having one instance per layer, but I'd prefer to push "one file per segmentation" to be really be on par with nrrd or nifti in the minds of users. Since the concept of frames is central to SEG, I don't see why we can't put multiple layers of labelmaps into the same multiframe instance. Is there a technical problem you foresee @CPBridge?

My main argument would be that the legacy of needing multiple files as part of the same conceptual "unit" has been a valid objection to many dicom scenarios (think how much people hate one file per slice of a volume, or how confusing it is for tiff users to have one file per layer of a WSI pyramid). I know multiframes have been slow to catch on, but I understand the new Siemens MR will start generating them by default and I'm guessing people will like them once they are used to them.

On a similar note, I think we should adopt a strict convention of using .seg.dcm to name any files of SEG instances (not part of the standard, but convention across our tools). If we can make the argument that .seg.dcm is a drop in replacement for, but better than, .nii.gz or .nrrd we will have a much better chance of adoption.

CPBridge commented 1 year ago

I don't see there is a technical reason why this couldn't be done. But my initial reaction is reluctance for a few broad reasons:

I have a sense that adoption of the current seg is poor at least in part because it is perceived as over-engineered and overly flexible and therefore difficult to implement, and generally hope that whatever we propose simplifies it rather than complicates it. At the same time we'd ideally make a few surgical tweaks to simplify things and improve efficiency while leaving as much unchanged as we can such that implementations that need to deal with both new and old segs are as simple as possible. Therefore I have a baseline high threshold for importance of further features to include. The proposed "labelmap layers" approach would presumably involve at least one more dimension index (when the frame indexing is already the most confusing part of segmentations), and some sort of sequence describing what each layer represents. Then we would have to think through what to do with the SegmentSequence, which describes the semantics and descriptive metadata associated with each segment, with the (1 based) index of the segment in this sequence matching its segment number. Without layers, we could leave this completely unchanged and just use the segment number in the pixel array to indicate the presence of that segment, a very minimal change. If we do introduce layers, we have to take a different approach. Either segments are numbered within a layer, which presumably entails significant changes to the SegmentSequence, or segments are still numbered "globally" (at the level of a segmentation instance), but now we have some rather unpleasant-sounding mapping of within-layer pixel value to global segment number. If we lose "global" segment numbers, then there are knock-on effect outside the Segmentation IOD, since all the places that a segment may be referenced (such as the ReferencedSegment in structured reports such as TID1410 and TID1411) need to be changed and we have to think through all of those too! I definitely want to avoid this. If we introduce labelmaps but not layers, I think we have none of these problems.
After working in the field for a while (though certainly not as long as others on this thread :) ) on a broad variety of applications, I have never worked on one application where this would have helped. It is worth emphasising that it is and will remain possible to store overlapping segments within one segmentation using the existing segmentation IOD. So the only situation in which the "labelmap layers" will help is where there are a large number of overlapping segments that can be split into a small of number of sub-groups of segments where there is no overlap between segments within each subgroup. In this situation, you could save some frames by storing as a layered labelmap vs distinct frames for each segments. This to me feels quite niche... But perhaps there are things I am not aware of. As an aside, can .nii do multiple layers of labelmaps within one file? I was unaware that it could, but I am not an expert on nifti.
While I am sensitive to the concerns about the number of checks that a receiving application would need to perform when dealing with two segmentation instances (@lassoan 's point), I do not think that this is as big a difference as it would first appear, because:
- Even within a single segmentation, the indexing of frames is so flexible that you still as a receiving application need to do various checks on plane position etc. It is entirely possible to have many irregularly spaced planes at different orientations within a single segmentation instance. So splitting into two segmentation instances doesn't make this much worse.
- There is already a mechanism to convey that frames are organised the same way within two instances: the Dimension Organization UID.

When different SOP Instances share the same Dimension Organization UID (0020,9164) for a particular Item of the Dimension Index Sequence (0020,9222), equivalent indices from the corresponding Dimension Index Values (0020,9157) shall have the same meaning across the SOP Instances. This mechanism allows an image creator to explicitly specify that indices are intended to convey identical information across SOP Instances.

Therefore, as long as creating software correctly implements this, all you would need to do is check that two instances have the same dimension organization UID, and then all further checks are no longer need: you can use the dimension indices equivalently across the instances. Admittedly, I have yet to see this mechanism actually used. I have been meaning to include it in highdicom for a while (such that if you generate a segmentation of an image with a dimension organization UID, it would copy the UID and use the same index values), but it's never been a top priority. Still, I'd much rather implement existing mechanisms than create new ones.

CPBridge commented 1 year ago

Perhaps I should be clear about one thing. My intention with the labelmap segmentation is not to entirely replace the existing Segmentation IOD, but rather to either generalize it or create a new IOD that better deals with the (overwhelmingly common) "special case" where there are multiple segments that do not overlap. I very much see it existing within a world where the current Segmentation is still used.

Whether this is a good scope is up for debate but that's my starting point.

CPBridge commented 1 year ago

On further thought, perhaps the "layered labelmap" wouldn't be so bad as long as we keep global segment numbers and keep each pixel value unique to a single segment, regardless of which layer it is in. If we are broadening to 16bit anyway this wouldn't be too restrictive and in the case of a single layer would not introduce significant complexity

pieper commented 1 year ago

It's interesting to hear the various priorities that we will want to juggle. So far I see:

Making SEG really effective for use cases requiring both efficiency and rich standardized semantics
Making minimal changes to the standard (not just SEG, but also SR)
Making life easier for developers
Minimizing confusion for users

I think these are all valid, but to me the overriding priority is to make a standard that is useful enough and has no killer shortcomings so that it can be realistically used by projects like Slicer, OHIF, MONAI, Slim, etc. If SEG is only useful as some kind of archiving format but not good for day-to-day use then we may as well just use something else and not bother changing dicom at all.

To me the biggest current shortcoming is actually the size of the instances (200x larger than existing options) and I'd like to see this addressed with some kind of scalable compression (i.e. something other than gzipping the whole file). This is only going to get worse as the number of segments increases, but it's also perhaps the easiest thing to fix from a standards point of view. I'm not currently sure if the second highest priority is inefficient metadata or the lack of labelmaps.

As an aside, can .nii do multiple layers of labelmaps within one file? I was unaware that it could, but I am not an expert on nifti.

Well, probably you could write out such a creature, since there are 4D nifti files and all kinds of header extensions and json sidecars available, but I haven't seen that exact variant. On the other hand, a .seg.nrrd has the layers of labelmaps concept and is in common use for Slicer segmentations.

CPBridge commented 1 year ago

I think these are all valid, but to me the overriding priority is to make a standard that is useful enough and has no killer shortcomings so that it can be realistically used by projects like Slicer, OHIF, MONAI, Slim, etc

I agree, I think we want to minimize changes to the standard and complexity, subject to achieving "usable" segmentations (otherwise it's pointless). Whether multiple layers is a required feature to achieve "usable" is what we were discussing above (it wasn't initially on my mind as such but I could be persuaded).

To me the biggest current shortcoming is actually the size of the instances (200x larger than existing options).

I agree with this too but would also throw in time to read, write, and extract relevant information as similarly important. These are correlated but not the same.

and I'd like to see this addressed with some kind of scalable compression ... I'm not currently sure if the second highest priority is inefficient metadata or the lack of labelmaps.

Interesting. Here we diverge a bit. I put the highest priority on labelmaps, both overall and as the focus for project week. The reason is that labelmaps actually represent a sneaky 3-for-1 deal. First, labelmaps allow you to store N times fewer planes if you have N segments, thus saving storage and speeding up read times. Second, N times fewer planes means N times fewer per frame functional groups, which helps the metadata problem a bit. Third, and more subtly, labelmaps essentially allow us to sidestep the compression issue. The main problem right now with compression is a lack of options for compression of images that use BitsStored = 1, which is required for segmentations with SegmentationType of "BINARY". However, if we introduced labelmap segmentations, either as a new SegmentationType or a new IOD, then we would necessarily have to store them using BitsStored=8 (and/or hopefully BitsStored=16). In that case, we would get "for free" the full range of options already available for (lossless) compression of 8/16 bit images, including JPEG-LS, JPEG2000, and Run Length Encoding, which are implemented already in most toolkits. Note that this is actually already possible with Segmentations with SegmentationType FRACTIONAL, which Markus Herrmann and I have been using for a while to reduce the size of our SEGs significantly (see this discussion for some benchmarks on WSIs showing that this approach gives over 5x better compression). However this is semantically questionable for sure when the segmentations are not actually fractional...

Also, compression comes at a performance cost, which can be large for some of these lossless methods at encode time. Therefore I worry about implementing compression but leaving the number of frames large as this could make creation times ridiculous. For example, when writing this pull request I used @pieper 's total segmentator output from here with over 600 CT frames and 100 segments as a benchmark. Because the segmentations are fairly sparse, when omitting empty frames this gives around 12000 frames to store (still a large number). I tried storing this as a FRACTIONAL segmentation (as noted above is an abuse of semantics) using JPEG2000 lossless compression (I probably should have tried others too). I was getting around 0.05s to compress each frame, which means about 10 minutes to create the segmentation spent just compressing the frames. Now, I should and will parallelize the frame compression part of highdicom's implementation. But still... If we imagine that we first solved the labelmap problem, this would be around 600 frames to store, and assuming the extra classes did not significantly affect compression time, this would mean creation would take around 30secs single-threaded. Still a long time but definitely less absurd.

I do think that we should also put effort into fixing single bit compression to give more space efficient BINARY segmentations. Here are a few related thoughts:

It is actually possible now, according to @dclunie here, to use JPEG2000 compression with single bit images. There is a change proposal to clarify that this is possible. However, I cannot find any toolkits in python that actually support encoding single bit JPEG2000 images. I am less sure about decoding, because I haven't managed to create any to decode! Pillow and pylibjpeg-openjpeg do not support this even though the underlying OpenJPEG does. Someone would probably have to write some C code to get this functionality wrapped and available in python. I realize that this is a Python-focused view but I obviously want to focus on where I can make progress.
We should explore the possibility of allowing BINARY segmentations to be stored with BitsAllocated=8, BitsStored=1, and then using existing 8 bit compression methods. This would be a change to the standard but only a few line tweak in highdicom to get working. This is something that only occurred to me recently and I haven't discussed the implications with @dclunie (this may be unacceptable from a backwards-compatibility point of view for existing segmentations).
Changes to compression methods should I think be relatively self-contained and I want to use the time together at project week to think through the labelmaps as I think this is the piece that requires the most thought and a broad combination of expertise and viewpoints to get right. But that is just my feeling and I am happy to shift the focus to include compression, or perhaps focus entirely on it if that is what the group thinks is more important.

pieper commented 1 year ago

Good points, @CPBridge. If labelmaps give us both compression and less metadata that could be enough. Being able to do per-frame compression of labelmaps would be a big win over nrrd and nifti, since they are both gzipping the full volume, which is hard to parallelize and you can't seek.

CPBridge commented 1 year ago

Correction: I was using JPEG2000 lossless compression for the above experiment (I had previously said JPEG-LS, and have edited my post above to correct it). JPEG-LS is both very very significantly faster and more effective at compressing that particular CT segmentation, at least in highdicom as currently implemented. (The BINARY bitpacked segmentation is 366MB, the JPEG2000 compressed FRACTIONAL Seg is 31MB, the JPEG-LS compressed FRACTIONAL seg is 8MB, which could presumably be reduced further with a labelmap). This also suggests that using JPEG-2000 for single bit compression is likely to be far from the best solution. If I have time I will try to run a few more benchmarks before the project week to inform the best next steps.

Nevertheless, the point sill stand that compression is a significant time cost and therefore it makes sense to try to reduce the frame count as well as compress frames.

pieper commented 1 year ago

Thanks for testing @CPBridge. For reference the gzipped labelmap version of that data in nii.gz format is 2.3MB, so I think your intuition is correct that labelmap would be better. I'd guess that image-oriented compressors would do even better than gzip on the labelmaps.

sedghi commented 1 year ago

tagging more people on this @JamesAPetts @chafey

wayfarer3130 commented 1 year ago

A labelmap with a single label is equivalent to a binary segmentation, and can be stored in 8 or 16 bits, so would allow for JPEG-LS storage. It should be pretty efficient in JPEG-LS as suggested above.

wayfarer3130 commented 1 year ago

What about sparse labelmaps for whole slide imaging? Any changes to this standard should also allow optimized storage of fairly small labelmaps for whole slide imaging. For this case, one might decide to implement a degenerate solution, storing a fairly large rows/columns image, but allowing it to be offset anywhere, then storing an encoded representation that has fewer rows and columns than the specified. According to DICOM part 5, this will take precedence over the DICOM specified rows and columns, and so it becomes possible to store just a small region. That is, one would have something like:

Rows/Columns: 8192 (8k) Images Position X,Y, focal distance Maybe actual rows/columns Image - reduced size to actually fit required area, stored in something like JPEG-LS

CPBridge commented 1 year ago

A labelmap with a single label is equivalent to a binary segmentation, and can be stored in 8 or 16 bits, so would allow for JPEG-LS storage. It should be pretty efficient in JPEG-LS as suggested above.

Yes indeed, one could store a single binary segment as a labelmap to get the benefits of 8 bit compression under the proposed changes. A remaining issue would be that multiple overlapping segments would need to be stored with the existing binary representation, so there is still a motivation to get some sort of effective compression working for the existing seg iod and BINARY SegmentationType.

What about sparse labelmaps for whole slide imaging?

This is already possible as far as I am concerned. I have recently produced many segmentations of WSIs in tiled format and I omit any tile where the segmentation is empty. Alternatively, one can specify arbitrary plane positions and orientation to store a seg at an arbitrary offset/rotation/zoom with respect to the source image. Highdicom has (admittedly low level) support for this.

sedghi commented 1 year ago

There were some discussion above about having each layer in a separate instance. Are we talking about one file and muliple instances (multiframe) or separate SOPInstanceUIDs? multiframe makes a lot of sense. However, in writing the SEG we are assuming the library is taking care of separating the instances to as minimum as possible for them to not have overlap. And what happens if say one SEG that was 2 layers (2 instance), and user edits it and add a new segment which overlaps and it becomes 3 layers. Are you proposing to append the layer so that the order doesn't change? what about remove?

lassoan commented 1 year ago

All these ideas are very interesting, but it would be very hard to get things right if we open up so many questions for debate. Could we aim for making already proven, widely used, well-liked research file formats available in DICOM? Building on existing research file formats would be a safer bet, because we would need to make less decisions (thus less chance for making mistakes), and it would be very easy for the community to adopt the new file format (due to the trivial, lossless conversion between DICOM and the research file format).

For example, I am confident that standardizing the current .seg.nrrd file fomat would fulfill all needs of radiology research applications and we could slighly adjust it (taking over a few things from the current DICOM Segmentation object) to make sure clinical needs are fulfilled, too.

Is there a popular research file format for WSI that is proven to be sufficient for most applications and very widely used?

Since radiology and WSI research file formats has not converged over the years (the OME-Zarr initiative is a good illustration of why), there is a high chance that there will not be a simple, single DICOM format that is optimal for these two completely different applications. But this should be fine, there is no need to force everything into a single information object definition. We already have a separate IOD for surface segmentation and the current image segmentation IOD will not disappear immediately either, so we'll have multiple IODs for segmentations anyway.

CPBridge commented 1 year ago

There were some discussion above about having each layer in a separate instance. Are we talking about one file and muliple instances (multiframe) or separate SOPInstanceUIDs? multiframe makes a lot of sense. However, in writing the SEG we are assuming the library is taking care of separating the instances to as minimum as possible for them to not have overlap.

SEGs are already multiframe. The discussion was about whether we should allow each "layers" to be stacked along another "dimension" in the multiframe instance (I am gradually warming to the idea) or each stored as separate instances (SOPInstanceUID). Who allocates segments to layers is an implementation detail for those who write libraries.

And what happens if say one SEG that was 2 layers (2 instance), and user edits it and add a new segment which overlaps and it becomes 3 layers. Are you proposing to append the layer so that the order doesn't change? what about remove?

Since DICOM objects should be immutable, I do not think we should concern ourselves with editing. I tried to go down this path once and you get into a real mess very quickly because the fact that frames may be omitted etc means that you basically need to re-do the frame organization from scratch with the full information available each time you want to add a segment. Admittedly, this is a weakness of DICOM seg, but in keeping with its primary use as a clinical format.

All these ideas are very interesting, but it would be very hard to get things right if we open up so many questions for debate. Could we aim for making already proven, widely used, well-liked research file formats available in DICOM?

I largely disagree with this direction. Although there is a lot of useful discussion on this thread in various directions, I absolutely believe that with just a couple of minor and quite manageable tweaks to the standard, namely:

New "LABELMAP" segmentation type in addition to BINARY and FRACTIONAL (actually a very straightforward change, we already had one prototype implementation in highdicom a while back)
Enable storing BINARY segmentations with either true single bit compression, or allowing it to use 8 bit compression with BitsStored=1.

we could end up with a format that is very significantly more space efficient than what we currently have (and possibly more efficient than nrrd or nifti due to the use of proper image compression methods) that works well across WSI and radiology, allows for seeking/decompression/web-retrieval of individual frames, and is much more similar to existing image storage meaning that existing systems for dealing with multiframe DICOM (such as viewers and archives) will have a much easier time working with it than if they had to implement support for a net new storage format like nrrd. We just need to get the right people together to agree and get it implemented in a few key places. I think creating a wrapper for nrrd would actually be a lot more work and a lot harder to get right. That would open up even more questions.

lassoan commented 1 year ago

I think creating a wrapper for nrrd would actually be a lot more work and a lot harder to get right. That would open up even more questions.

I meant to adopt the general ideas of successful research file formats (i.e., labelmap and standard compression algorithms) and focus on reproducing their features that are proven to be necessary (don't try to develop new features for now).

I agree that having the LABELMAP segmentation type and selecting some standard compression algorithms would take care of the voxel data in radiology applications. From the discussion above it seemed that there are some open questions for WSI and that's why I suggested we could focus on allowing to do in DICOM that is already commonly done in existing, widely used WSI research file formats.

It would be also nice to review if we can simplify the currently required metadata. For example, it may not worth paying the price for having slice-level references in PerFrameFunctionalGroupsSequence, especially when automatic 3D segmentation methods are used.

fedorov commented 1 year ago

NRRD-like formats are so efficient in part because they are very restricted in expressiveness as compared to DICOM SEG, but this is actually good, since users value good performance and compact representation a lot more than expressiveness. My overall feeling about DICOM is that so often it is designed to make sure that it can address 99% if not 100% of use cases - in principle! - but the price for that is that - in practice! - 80% of applications need to suffer from enormous complexities and inefficiencies, sacrificing adoption for the sake of future-proofing and "utility in principle".

Here are top of the list items that I would like to see revisited:

Single-bit packing: I am sure the intent was good, but I did not see any evidence it helped anyone in practice, instead significantly complicating the implementation, complicating frame access, causing problems for compression. NRRD and likes do not invent any new packing schemes, deferring compression to established algorithms.
Empty frames ambiguity: empty segmentation frames can be (but do not have to be!) skipped, which, again, I believe was a good intent to help reduce the size, but in practice complicates implementation. Maybe there is a benefit to requiring that the revised SEG should have regular sampling of the volume? This might potentially eliminate/reduce the need for per-frame FGs (i.e., they can still be present to communicate references to the instances used in the derivation, but would not be needed for reconstructing volume geometry).
Extreme flexibility: with the ability to specify per-frame attributes one can have changing orientation, spacing, sparse frame sampling. Those features are no available in NRRD-likes, and time showed that there are many many applications that just do not need them.

I would think that by revisiting those various unorthodox choices made in SEG, implementations would not need to change drastically, but would be able to greatly reduce complexity and improve performance by eliminating a lot of special cases.

CPBridge commented 1 year ago

Thanks @fedorov I think that summarises things nicely!

I have largely avoided the metadata issue until now but I agree that there are problems worthy of addressing there too (mostly regarding per-frame functional groups). In my opinion they are a bit less important since they are mostly concerns for developers rather than users. I also worry that many of the issues are inherent to the entire "multiframe" formulation, which spans many IODs within DICOM, not just SEG. This means that changes we make propose could have knock-on effects all over the standard and many IODs already in clinical use.

Nevertheless, I think it would be worthwhile to at least go through the exercise of thinking through what changes to the simplify the metadata might look like. I will try and summarise my thoughts on the topic at some point in the next few days.

JamesAPetts commented 1 year ago

Interesting discussion! +1 to the primary target being DICOMization of the common nrrd/nifti style labelmaps. I think WSI is a distraction in this pursuit personally, if the goal is to fix the common difficulties with transfer of volumetric voxel-labelled segmentation.

I'm all for expanding BINARY also for single segmentations as compared to LABELMAP which is more essential for things like neuro segmentations with 50-400+ labels.

JamesAPetts commented 1 year ago

Empty frames ambiguity:

@fedorov, I'm not sure I agree with removing the practice of only encoding non-blank frames, I personally think this still makes sense in a LABELMAP representation. Its true that an empty frame would collapse to a very small RLE'd frame, but I still wonder if they should be included at all? In the case of e.g. a two segment liver-with-tumor labelmap, maybe only 50 frames of a 500 frame CT would be labelled?

EDIT: Oh I get that this is to remove the notion of per frame functional groups so people don't put random frames out of plane? If the goal is to remove this ambiguity than maybe I agree.

Or perhaps the orientation should be in a shared functional group, and be banned from per frame functional groups, but I worry we are going to affect backwards compatibility if we take that route.

wayfarer3130 commented 1 year ago

As a co-chair for the DICOMweb working group, I'd really like to see the new proposal include a well defined representation for how to fetch the rendered images with segmentation from the DICOMweb /rendered endpoint.

This sounds like potentially a good idea, but not one that I am best placed to execute on. Are there particular considerations for the design of the actual IOD that will make this easier, that we should bear in mind? It seems to me that this could be a separate proposal without interdepencies with the other things that we are discussing here, but maybe I am wrong.

Generally speaking I am of the opinion that the Segmentation object should simply encode the segmentations and their semantics, and viewers are free to choose how to render them, perhaps with reference to a presentation state if desired. But then again, I don't write any viewers :)

I definitely want to make sure we don't do anything that makes viewers' lives harder

The reason I think this should be done here is that an appropriate labelmap definition of the segmentation objects makes it easy to define what is meant by a rendered view of that object, and allows creating very simple viewers which just use the already rendered version - eg for things like thumbnails. Such a definition also tends to make views of segmentations more consistent between viewers because it forces inclusion of at least a minimal set of colormap and transparency definitions, and specifies things like overlap colors. What I'm thinking about labelmaps is something like:

3 -> { color: #ff000030, labels: 'Left Ventricle', ... } 4 -> { color: #ffff0030, labels: ['Left Ventricle', 'Heart'], ... } so that there is at least some presentation information in addition to the labelling information.

That is then fairly obvious how to render it, as semi-transparent colours overlayed on top of grayscale images is fairly well defined. It also works as shown for overlapping segmentations (even if the particular sample data doesn't look realistic).

I will try to be available to comment/help on the DICOMweb section of it - which is really mostly about ensuring that how to render the segmentations is at least partly defined.

fedorov commented 1 year ago

I'm not sure I agree with removing the practice of only encoding non-blank frames

@JamesAPetts The purpose of doing that would be to allow a concise definition of the overall volume geometry of the segmentation. With NRRD-like, you read the tiny header, and you know exactly how to lay out the volume in memory and how to fill it up from the pixel data. With DICOM, you currently need to iterate over all per-frame FGs before you know that.

We cannot fix it in the current SEG, but if the LABELMAP mandates regular sampling of the volume that encloses all of the segments it contains, we might be able to achieve the above. Shared FGs can then contain orientation/spacing, and per-frame image position can be defined in terms of inter-slice spacing defined in the shared FGs. I did not mean that the empty slices on top/bottom of the segmentation should be encoded. I.e., if you have a whole body CT, and segmentation of the heart and liver, empty slices in between would be included, but not above the heart or below the liver.

But Chris made a point, which may be the killer of many of those suggestions - since SEG is (and, most likely, LABELMAP will) share components of the broader enhanced multiframe family of objects, and backward compatibility will need to be maintained, there are hard limits on what can be revisited. Fortunately, @dclunie will be at the PW in person to guide this development appropriately. Let's keep the fingers crossed it is actually feasible to improve within the standard boundaries!

What I'm thinking about labelmaps is something like:

3 -> { color: #ff000030, labels: 'Left Ventricle', ... } 4 -> { color: #ffff0030, labels: ['Left Ventricle', 'Heart'], ... } so that there is at least some presentation information in addition to the labelling information.

@wayfarer3130 I am confused why you list 2 labels accompanying "4", but other than that, there is already a mechanism to allow encoding color alongside the semantics of the segment.

https://dicom.innolitics.com/ciods/segmentation/segmentation-image/00620002

This looks like the following when instantiated: https://viewer.imaging.datacommons.cancer.gov/viewer/1.3.6.1.4.1.14519.5.2.1.7311.5101.170561193612723093192571245493?seriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.7311.5101.206828891270520544417996275680,1.2.276.0.7230010.3.1.3.1070885483.15960.1599120307.701

What is missing in the current definition?

fedorov commented 1 year ago

@wayfarer3130 I missed your point re color for overlapping segmentations, I get it now I think. But I think something like this may belong to Presentation States, not Segmentation - would you agree?

lassoan commented 1 year ago

What is missing in the current definition?

There could be many visualization options that could be useful to have in the segmentation file (if a segment is displayed or hidden by default; 3D opacity - so that if the skin surface is segmented you could still see the other segments, etc.), but these all could be defined at a later point and/or separately in presentation states (so that you can change appearance of the segmentation without recreating the segmentation). Storing color information in the segmentation object is already conceptually questionable, but it is just very convenient in practice that you don't need add a separate file for this.

broader enhanced multiframe family of objects, and backward compatibility will need to be maintained

Most "multiframe" images and segmentations are actually 3D volumes (parallel slices, orthogonal axes, uniform slice spacing along each axis). It could make sense to express this explicitly in DICOM so that applications do not need to deduce this from costly and complex inspection of per-frame metadata. Maybe it is already available in the standard we just need to start using this? This information could be stored in extra fields, so that applications that recognize these new fields could be more efficient, without impacting existing applications. For segmentations, we would not even need to store per-frame metadata for 3D volumes, as there would be no legacy applications to worry about and per-frame metadata can be computed very easily for 3D volumes.

I'm not sure I agree with removing the practice of only encoding non-blank frames

@JamesAPetts The purpose of doing that would be to allow a concise definition of the overall volume geometry of the segmentation. With NRRD-like, you read the tiny header, and you know exactly how to lay out the volume in memory and how to fill it up from the pixel data.

In Slicer, we initially chose to crop the segmentation to the minimum necessary bounding box. However, users struggled with this a lot, so after a few years we switched to export the entire volume (without cropping) by default. Including empty slices did not lead to perceivable difference in compression time and storage size when we used zlib compression. In other use cases (WSI, etc.) empty slices may make a significant difference, so keeping an option for sparse volumes could be useful.

wayfarer3130 commented 1 year ago

@wayfarer3130 I missed your point re color for overlapping segmentations, I get it now I think. But I think something like this may belong to Presentation States, not Segmentation - would you agree?

No, I think the base segmentation should define the colors and transparency/opacity levels as a base part of the standard so that the segmentation is reasonably well defined across viewers as to the basic representation. There might be lots of other representations stored to presentation states, but it shouldn't require a presentation state to create a well defined rendered view of the segmentation.

fedorov commented 1 year ago

Most "multiframe" images and segmentations are actually 3D volumes (parallel slices, orthogonal axes, uniform slice spacing along each axis). It could make sense to express this explicitly in DICOM so that applications do not need to deduce this from costly and complex inspection of per-frame metadata. Maybe it is already available in the standard we just need to start using this?

I think DimensionOrganizationType = 3D might be it actually. Maybe this one can be used in the new LABELMAP object?

https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.17.html

wayfarer3130 commented 1 year ago

In terms of the WSI display of segmentations, all I'm asking is to make it not incompatible with whole slide imaging as well as not incompatible with other types of non-volumetric imaging such as simple DX or CR scans. I would vote against any proposal that excluded imaging modalities because that probably means the proposal hasn't been well enough thought out yet.