Allow segmentations to reference multiple images

Applying the logic of #193 to Segmentation instances.

Thanks for your feedback @CPBridge.

The use case I am trying to support is the following: We perform segmentation using more than one input image channels (grayscale images). These may be different optical paths in case of VL Whole Slide Microscopy Image instances or different sequences in case of an Enhanced MR Image instance.

The intended semantics of this are unclear to me and are very different from those of the case of single frame source images, which creates further confusion. Unless there are user-provided plane positions, in the single frame case each SOP Instance in source_images corresponds to one 2D slice of the pixel_array, and in the multiframe case, each frame of the single input SOP instance corresponds to one 2D slice of the pixel_array. Is any framewise correspondence implied by the passing of further multiframe instances?

If so i.e. framewise correspondence is implied between the pixel array and all source images, it is also implied between all source images also. In this cases there need to be significantly more checks to ensure that this is valid (all source images have the same frame of reference UID, number of frames, corresponding spatial locations etc).

Additional multi-frame instances would currently be implied to have the same organization (pixel spacing, image orientation, and plane positions) as the first multi-frame image in source_images. I agree that we should check that assumption.

Also you would probably want to make sure that the this correspondence is recorded for all the subsequent source images, beyond just the first, in the DerivationImageSequence/SourceImageSequence as part of the per-frame functional groups. Currently the only effect of your change (as far as I can tell) is to add the second and subsequent source images to the SourceImageSequence attribute at the root of the dataset.

Good point! Yes, we will need to include references at the frame level, too.

Note that there is an important difference here because there is no way to record that a single slice of the input pixel array is derived from multiple single frame inputs, even though this is in principle allowed within the standard. So we have a major difference between how single frame input and multiframe inputs are treated.

I defer to your judgement whether it would be useful to extend this logic to single-frame inputs. I can imagine use cases where one performs segmentation using slices from two series of single-frame image instances (e.g., T1 and a T2 weighted slices from two series of MR Image instances).

If framewise correspondences are not implied, then this the first image passed to source_images is significantly more important than the rest, as it defines the frame of reference and the locations of individual slices of the input segmentation. This feels weird to me (one would expect all items in a list parameter to be equally important) and again this is very different from single frame case.

I agree, we should make that assumption explicit, perform the necessary checks, and document the behavior.

@CPBridge we already perform several checks (see here) on source_images. We probably should add more attributes (Image Orientation, Pixel Spacing, etc.). We could also perform a frame-level comparison to ensure that the first source image is representative of all source images, but that may get computationally expensive and we may just want to document that. From a performance perspective, we should probably make that check here, but it would reduce cohesion.

@CPBridge I added checks on source_images (15a71691a1622fc93814a88d998662d14093b2c0) and included references for all source image frames (ead8a30afd80f36bd41b44fdeaff23e50a7d509b). I also added a unit test to ensure that works properly (fc5b2d16785836e44d21338045dd3c72f9e60fb6).

As you pointed our earlier, the behavior is now different for single-frame and multi-frame images. I suggest we also support multiple series of single-frame image instances. To this end, I have updated the logic on how frames of single-frame image instances are referenced (194b1d37decf1de91a0929300325d7ea3bfede78). Trying to provide single-frame image instances from multiple series will currently still fail, because we assert that Series Instance UID is unique across images. However, I think we could now relax that. What do you think?

Hi @hackermd, sorry: I have only just noticed that this PR was waiting on my input...

This is looking better and I think the behaviour for multiframe images makes sense, although it still needs to be documented in the docstring. Furthermore, do you expect multiple multi-frame source instances to come from the same series or different series? I would expect them to come from different series (the whole point of multiframe images being to get rid of series of multiple objects). Currently the constructor will fail if source images are not from the same series, making this change applicable only in quite unusual situations of series of multiframe intances.

As you pointed our earlier, the behavior is now different for single-frame and multi-frame images. I suggest we also support multiple series of single-frame image instances. To this end, I have updated the logic on how frames of single-frame image instances are referenced (https://github.com/herrmannlab/highdicom/commit/194b1d37decf1de91a0929300325d7ea3bfede78). Trying to provide single-frame image instances from multiple series will currently still fail, because we assert that Series Instance UID is unique across images. However, I think we could now relax that. What do you think?

I support this direction (I have worked on several situations in radiology were a segmentation was derived from multiple input series) but unfortunately what you have done so far is too simple. I imagine that the "multiple series of single frame instances" would work as follows. There would be a segmentation pixel array of n frames (i.e. shape (n, h, w)), which has been derived from m different input series, each of which must consist of of n frames. This gives rise to a set of constraints on the input images that we are not currently checking for, i.e. that the number of source images is a positive integer multiple of the number of slices in the segmentation pixel array (m x n), and that these are grouped correctly such that there are m unique series instance UIDs, each with exactly n instances. If we just used what is currently implemented and removed the series UID check, then we have a free-for-all where the above may not apply at all, and the derivation will only be recorded for the first n items of the source_images, and the rest will be ignored.

Then we come onto the related problem (again for the "multiple series single frame instances") case of how to group the source instances in order to figure out the correspondence between the source images and the slices of the segmentation pixel array. Currently, it is assumed that source_images[i], is the (single) source image for slice pixel_array[i], but this no longer works when there can be multiple input series. I see two options here:

The user continues to pass a list of instances, as currently, which now can have length of any positive integer multiple of n. Items 0 through n-1 are assumed to correspond to pixel_array[0], items n through 2n-1 are assumed to correspond to pixel_array[1], and so on. Alternatively, you could do this the other way around, with items 0, n, 2n corresponding to pixel_array[0], 1, n+1, 2n+1 corresponding to pixel_array[1] etc. There would also need to be some basic checks on patient position (maybe with a tolerance) to enforce this. I think this is probably my preferred option as it is the simplest generalization of the current API, though it is potentially a little confusing...
We allow source_images to be either a Sequence[Dataset] (for backwards compatibility) or a Sequence[Sequence[Dataset]], with each inner sequence containing an entire series. Each inner Sequence would have to have the same length.

Neither option is great, but some compromise will be needed somewhere if this is going to work. What do you think?

Do you expect multiple multi-frame source instances to come from the same series or different series? I would expect them to come from different series (the whole point of multiframe images being to get rid of series of multiple objects). Currently the constructor will fail if source images are not from the same series, making this change applicable only in quite unusual situations of series of multiframe instances.

Good catch! The images may come from different series and we will need to be able to support that use case. Addressed via 88d03c35b9c0777f043337336bfea5a05b7ff1ae

I support this direction (I have worked on several situations in radiology were a segmentation was derived from multiple input series) but unfortunately what you have done so far is too simple. I imagine that the "multiple series of single frame instances" would work as follows. There would be a segmentation pixel array of n frames (i.e. shape (n, h, w)), which has been derived from m different input series, each of which must consist of of n frames. This gives rise to a set of constraints on the input images that we are not currently checking for, i.e. that the number of source images is a positive integer multiple of the number of slices in the segmentation pixel array (m x n), and that these are grouped correctly such that there are m unique series instance UIDs, each with exactly n instances. If we just used what is currently implemented and removed the series UID check, then we have a free-for-all where the above may not apply at all, and the derivation will only be recorded for the first n items of the source_images, and the rest will be ignored.

Agreed, we should handle this use case more explicitly and put checks in place to ensure assumptions are satisfied. For now, I have added checks to ensure all single-frame images are from the same series (88d03c35b9c0777f043337336bfea5a05b7ff1ae).

We allow source_images to be either a Sequence[Dataset] (for backwards compatibility) or a Sequence[Sequence[Dataset]], with each inner sequence containing an entire series. Each inner Sequence would have to have the same length

I find this option more intuitive and would thus prefer this approach. However, I defer to your judgement, since you will be dealing with this use case more frequently. In fact, I would prefer if you'd implement that change, given that you have more experience in handling series of single-frame image instances.

ImagingDataCommons / highdicom

Allow segmentations to reference multiple images #199