Closed hackermd closed 2 years ago
@dclunie could you kindly help us make the correct assumptions about the relationship between series and imaging target and about the grouping of VL Whole Slide Microscopy Image instances that have been acquired for a given physical glass slide?
The standard is not very explicit in this respect, but Part 3 Section A.32.8.1 states the following, which suggests that all SOP instances shall be part of the same series:
An entire set of tiles for an acquisition may be encoded in the frames of a single SOP Instance, in multiple SOP Instances of a single concatenation, or in multiple SOP Instances in a Series (with or without concatenations). E.g., a single SOP Instance may contain an entire low resolution image as a single tile (single frame), or a single SOP Instance may contain an entire high resolution, multi-focal depth, multi-spectral acquisition (multiple frames).
I foresee a couple of challenges if VL Whole Slide Microscopy Image instances for the same digital slide are not part of the same series. For example, there will be no guarantee that they have the same Frame of Reference UID. Therefore, we may not be able to assume that images can be aligned based on image orientation, offsets, and pixel spacing, but would have to register them.
If we allow instances to be distributed across multiple series, we will need to perform many more checks and may have to refuse to blend certain instances (for example, if they have a different Frame of Reference UID) even though they could in principle be aligned just fine. I can see how acquisition (or conversion) software may automatically assign a new Frame of Reference UID to each series.
How is this handled in the case of MR, where multiple sequences may exist for the same acquisition and imaging target?
What set of attributes (keywords) would we need to group instances into slides?
@hackermd please have a look at the PR at https://github.com/MGHComputationalPathology/slim/pull/25.
At the moment I still use the following assumptions: 1) RGB images have a 1:1 correspondence with series; 2) monochrome images (multiple channels): there is only one multi-channel acquisition per study (which can span multiple series).
To differentiate between color and monochrome images, I check the SamplesPerPixel DICOM attribute of the instances.
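As a minimal sketch of that check (the `InstanceMetadata` shape and the `isMonochrome` name are illustrative assumptions, not existing slim or dicom-microscopy-viewer APIs):

```typescript
// Hypothetical minimal shape of the parsed instance metadata we rely on;
// the attribute name follows the DICOM keyword.
interface InstanceMetadata {
  SamplesPerPixel: number
}

// An instance with a single sample per pixel is treated as monochrome;
// anything else (e.g., 3 for RGB) is treated as a color image.
function isMonochrome (instance: InstanceMetadata): boolean {
  return instance.SamplesPerPixel === 1
}
```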
This is OK to have a first working UI. We can always improve the function fromSeriesListToSlideList in https://github.com/MGHComputationalPathology/slim/pull/25/files#diff-25b188602cbdb7a70c7b8d7736bdbd2bab7e94a4e591353ca29b05b4529a67b4 later to group the instances in a different manner.
It would be ideal to have some attributes in DICOM which would tell us how to group instances (FrameOfReferenceUID can't be used in the multi-channel case, see comments in https://github.com/MGHComputationalPathology/dicom-microscopy-viewer/pull/54).
I tried to download ftp://medical.nema.org/medical/dicom/cp/cp2135_01_addAcquisitionEntityAndUID.pdf, but I got an empty file. Markus, could you please double-check the link? Maybe I don't have the right permissions to download it?
Thanks!
To differentiate between color and monochrome images, I check the SamplesPerPixel DICOM attribute of the instances
Mmh.. Using the Samples Per Pixel attribute kinda works, but it is not ideal in my opinion. For example, there may be monochrome images that we don't want to blend. Also, the LABEL image (and potentially also the OVERVIEW image) will probably still be a color image, even if the VOLUME images are monochrome images.
We will need a way to define an acquisition and group images based on acquisition-related attributes (see below).
It would be ideal to have some attributes in DICOM which would tell us how to group instances (FrameOfReferenceUID can't be used in the multi-channel case, see comments in MGHComputationalPathology/dicom-microscopy-viewer#54).
I would argue that images that are part of the same acquisition shall have the same frame of reference. If images don't have the same Frame of Reference UID, we cannot assume that the images can be overlaid/blended without performing an image registration. If the images in the data collection used for testing don't have the same Frame of Reference UID, then this is a problem in my opinion and we should patch the image data sets. @dclunie what do you think?
I would recommend grouping instances by Frame of Reference UID and potentially considering elements of the Optical Path Sequence (e.g., Illumination Type Code Sequence) to determine the number of "channels" (i.e., optical paths). Of note, in the case of iterative immunofluorescence imaging, we may have multiple images that were acquired in the same "channel", i.e., using the same optical path (illumination type, wavelength, etc.), as part of the same acquisition. Therefore, the term "channel" is confusing and misleading, and I am wondering whether we should avoid it altogether. What about using the term "Samples" (as in Samples per Pixel)? Would anyone understand that?
I tried to download ftp://medical.nema.org/medical/dicom/cp/cp2135_01_addAcquisitionEntityAndUID.pdf, but I got an empty file. Markus, could you please double-check the link? Maybe I don't have the right permissions to download it?
The link works for me when connecting to the FTP server using the Mac OSX Finder application. Thanks @dclunie for pointing out the CP.
I looked into the acquisition-related attributes. Acquisition Number is a type 3 attribute and may thus be missing. The only type 1 attribute that would be guaranteed to be present in a VL Whole Slide Microscopy Image is Acquisition DateTime. However, I am not sure whether any image acquisition equipment (or converter) would use it as expected, i.e., whether all instances of a multiplexed IF experiment would indeed get the same value assigned for this attribute. @dclunie we may want to clarify that in the standard as part of CP 2135 and explicitly state that images acquired as part of a multiplexed IF experiment shall have the same value for Acquisition DateTime and Acquisition Number. We may want to further specify the criteria for grouping instances into an acquisition (see criteria a-d of Series IE). Specifically, we should clarify that all composite instances within an acquisition shall be associated with exactly one Frame of Reference IE and all have the same Acquisition information (Acquisition Number, Acquisition DateTime, etc.).
P.S.: We should probably also ensure that all images of an acquisition have the same specimen information (at a minimum the same Container Identifier).
OK with me to use Samples instead of Channels.
So I think we have two issues: 1) how to group series into unique acquisitions; 2) how to distinguish between RGB images and monochrome images (and determine whether a monochrome image is part of a multiplexed immunofluorescence dataset).
For (1) I agree with you: the grouping checks that we do in https://github.com/MGHComputationalPathology/slim/blob/c0bea14abbaf22b4276ef4ff8e3b6484ab76877e/src/utils/fromSeriesListToSlideList.tsx should just be based on the Frame of Reference UID. That would rid the code of extra assumptions and make life easier in general.
NOTE: at the moment, the Frame of Reference UID checks when doing Samples compositions are also disabled in dicom-microscopy-viewer, see https://github.com/MGHComputationalPathology/dicom-microscopy-viewer/pull/54/files#diff-5f25318877b7159fd7a85f6f4cf599455d9eb72e22f9cbd2b6069a75ff8e5a9aR116 and https://github.com/MGHComputationalPathology/dicom-microscopy-viewer/pull/54/files#diff-5f25318877b7159fd7a85f6f4cf599455d9eb72e22f9cbd2b6069a75ff8e5a9aR453
If we could get a multi-Samples test data set in which all Frame of Reference UID values are equal, we can update the code and test. But the point is: at this moment, in a real-world situation, can we assume that all the Samples will have the same Frame of Reference UID? I mean, if we could push it into the standard and then rely on the standard, that would be great.
For (2), yes, this is tricky, and I understand that just checking SamplesPerPixel may not be ideal. We could probably determine that it is a Sample of a multiplexed immunofluorescence dataset if:
A) N series (and their image instances) have the same Frame of Reference UID;
B) the grouped images have multiple OpticalPaths;
C) the volume image instances have SamplesPerPixel === 1? (not sure if (A) and (B) are already enough);
D) should we also check Acquisition DateTime? (not sure if (A) and (B) are already enough).
If these assumptions are not satisfied, then it is a "normal" case (either an RGB or a monochrome image).
Regarding the test dataset (server http://34.68.90.36/, studyInstanceUID = '1.3.6.1.4.1.5962.99.1.2103930081.1286074986.1595536829665.3.0'), Markus, would it be possible for you to set the Frame of Reference UID of all 44 series to the same value? Should we also set the Frame of Reference UID of the volume image instances to the same value (does this make sense?)? Or is this something for which we should wait for David's feedback?
P.S.: We should probably also ensure that all images of an acquisition have the same specimen information (at a minimum the same Container Identifier).
OK
how to group series into unique acquisitions
We want to group image instances into unique acquisitions, not series. Following the terminology of CP 2135, Series and Acquisition would be considered two different Information Entities. The instances belonging to an individual Acquisition may all be contained within one Series or distributed across multiple Series.
I would argue that all instances of an Acquisition shall share the same Frame of Reference. However, I am curious what @dclunie has to say about that.
how to distinguish between RGB images and monochrome images (and understand if the monochrome image is part of a multiplexed immunofluorescence dataset)
Using Samples Per Pixel is perfectly fine for distinguishing between RGB and monochrome images in my opinion. However, I would additionally use Photometric Interpretation. Determining whether a monochrome image is part of a multiplexed IF image acquisition is potentially more challenging. I was thinking that we can compare the values of attributes of the Optical Path Sequence, but this may not be feasible either because they could in principle all have the same optical path in case of iterative/cyclic immunofluorescence. I think we'll need to stick to the concept of an Acquisition and define a multiplexed IF image acquisition as an acquisition (see definition below) with more than one monochrome VOLUME image.
In summary, I think we can group image instances into acquisitions as follows:
- same Frame of Reference UID
- same Acquisition DateTime
Each image instance shall have
- Samples per Pixel == 1
- Photometric Interpretation == MONOCHROME2
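The criteria above could be sketched as follows (a non-authoritative sketch: the `InstanceMetadata` shape and the `groupIntoAcquisitions` name are illustrative assumptions about how the parsed DICOM metadata might be exposed, not existing APIs):

```typescript
// Assumed minimal shape of a parsed VL Whole Slide Microscopy Image
// instance; attribute names follow the DICOM keywords.
interface InstanceMetadata {
  FrameOfReferenceUID: string
  AcquisitionDateTime: string
  SamplesPerPixel: number
  PhotometricInterpretation: string
}

// Group monochrome image instances into acquisitions, keyed by
// Frame of Reference UID and Acquisition DateTime.
function groupIntoAcquisitions (
  instances: InstanceMetadata[]
): Map<string, InstanceMetadata[]> {
  const acquisitions = new Map<string, InstanceMetadata[]>()
  for (const instance of instances) {
    // Skip instances that are not monochrome images.
    if (
      instance.SamplesPerPixel !== 1 ||
      instance.PhotometricInterpretation !== 'MONOCHROME2'
    ) {
      continue
    }
    const key = `${instance.FrameOfReferenceUID}/${instance.AcquisitionDateTime}`
    const group = acquisitions.get(key) ?? []
    group.push(instance)
    acquisitions.set(key, group)
  }
  return acquisitions
}
```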
ok perfect I agree
@dclunie what are the expectations regarding the uniqueness of an Optical Path Identifier? I realized that they are only guaranteed to be unique within a single image instance (see Optical Path Sequence and Optical Path Identifier) and different instances may use the same identifier for different optical paths (i.e., different microscope settings such as Illumination Wave Length). The standard states that LOCALIZER images (which have been retired) shall determine the unique number of optical paths and assign each optical path a unique identifier. However, it's left unspecified how the uniqueness should be determined.
Further, in case of iterative immunofluorescence microscopy imaging, the same microscope settings may be re-used multiple times as part of the same acquisition. That opens the question of whether each iteration would be considered a different or the same optical path. What are your thoughts on that?
We may want to introduce an Optical Path UID, which is guaranteed to be unique across image instances.
Sorry for the delayed reply, but I wanted to produce some new samples for you first.
The bottom line is that a common Frame of Reference (FoR) UID is key.
My previous samples that you are using are wrong in this respect (have different FoR UIDs for each channel). I have made some new samples (they are taking forever to upload, so I am posting this in advance of their availability).
Wrt. "Acquisition": in the normal clinical scenario, whether for bright field images or IHC, one scan of one slide will indeed be one DICOM Acquisition, theoretically, although this is not well defined in the standard and Acquisition Number in particular may be absent, or perhaps always "1" regardless. I am working on a CP to improve the documentation of this, but that will not immediately change vendor practice nor necessarily be helpful to you.
Further, in the samples I sent before and the same material updated in the new samples (LUNG-1-LN), the process is CyCIF, which involves multiple acquisitions in multiple "cycles" with clearing and re-staining; so these really are separate DICOM Acquisitions, even though the images have been registered. To stress this point, in the updated samples I have populated Acquisition Number with the Cycle number, so they will be different (even though the Acquisition Date and Time might happen to be the same; do not rely on this).
I have also populated Accession Number and put a more meaningful value in Container (Slide) Identifier, which happens to be the same as Specimen Identifier for this case, but don't depend on that either.
In short, these images (set of spatially co-registered multichannel fluorescence images of one section of a specimen) will share a common:
and have different
If there were more than one multichannel set of images per Study, which there is not in my samples, they would have different Frame of Reference UIDs, and probably different Container (Slide) Identifiers.
Also, like the previous samples, each (set of images in a) pyramid (within one Series), will have a specified Optical Path Identifier as well as descriptive attributes that describe wavelengths, etc. These will correspond to the information in the accompanying Specimen Description sequence, though there is no direct reference between them. For the LUNG-1-LN sample, I have gone out of my way to number the Optical Paths distinctly - even if in repeated cycles the same wavelengths are used, or even the same antibody (e.g., the repeated DAPI acquisitions), I have followed the original sequential numbering for the "Channel Number" described in their "Table2AntibodyStainingPlanforDATASET-1.txt" file.
So, since the Optical Path is specific to a pyramid of images for a particular channel, and is not repeated in other images (other pyramids), or referenced externally, the identifier alone, if you need it, should be sufficiently unique for your purposes, even though there is no such thing as an "Optical Path UID"; what purpose would that serve?
That is not to say that all providers of multichannel fluorescence images will follow the same plan, and not repeat Optical Path Identifiers, but it was certainly our (DICOM WG 26's) intent in defining the Optical Path as the descriptor of separate channels that they be used the way I have described them (and that is reiterated in the use of Optical Path as a potential default Dimension in the TILED_FULL discussion).
I will let you know when the updated samples have finished transmitting.
David
PS. Wrt. the earlier comment that "there will be no guarantee that they have the same Frame of Reference UID", there should be - if they have different FoR UIDs then they are NOT purported to be spatially co-registered. It was just a mistake for me to not arrange that in the original sample converted images. That said, you may want to allow the user to state that some sets of images are spatially co-registered, e.g., ask them if they want all images in a Study (or whatever subset) to be grouped for display together, but you may not want to assume that, since very likely there may be multiple different sets of images that are not registered.
That is what is done in radiology (e.g., MR volumes) - only sync 3D scrolling or superimposition if either the same FoR UID, or the user manually overrides different FoR UIDs.
PPS. Both Samples per Pixel and Photometric Interpretation should distinguish bright field from fluorescence types of images.
Thanks David for the exhaustive explanation and notes. I will wait for the new data and then update the code to group the instances with the keys that you suggested. Thanks again!
Thanks @dclunie for the clarification and your thoughtful feedback.
Further, in the samples I sent before and the same material updated in the new samples (LUNG-1-LN), the process is CyCIF, which involves multiple acquisitions in multiple "cycles" with clearing and re-staining; so these really are separate DICOM Acquisitions, even though the images have been registered.
I think I then misunderstood the concept of an Acquisition in the context of CyCIF or other iterative immunofluorescence image acquisition techniques. It makes sense to consider each "cycle" a separate Acquisition.
However, it raises the question of how we should refer to the entire collection of images that are acquired as part of such a multiplexed IF experiment. The collection is neither a Series nor an Acquisition. Should we simply call it "VolumeImages"? I can't come up with a better term, and there doesn't seem to be a corresponding information entity in the DICOM data model.
For the LUNG-1-LN sample, I have gone out of my way to number the Optical Paths distinctly - even if in repeated cycles the same wavelengths are used, or even the same antibody (e.g., the repeated DAPI acquisitions)
That is reasonable. This would imply that an Optical Path is specific to a given Acquisition. If the same set of microscope settings would be (re)used in another Acquisition, they would be considered a different Optical Path. It could be worth describing that expectation in the standard.
So, since the Optical Path is specific to a pyramid of images for a particular channel, and is not repeated in other images (other pyramids), or referenced externally, the identifier alone, if you need it, should be sufficiently unique for your purposes, even though there is no such thing as an "Optical Path UID"; what purpose would that serve?
We need an identifier for each "channel" (i.e., Optical Path) that is unique within the scope of the entire experiment (i.e., across image instances and acquisitions), for example to select the "image of the DAPI signal at cycle 5". Currently, such an Optical Path UID is missing in DICOM. We could introduce an opticalPathUID in the application to track and select channels, but I think it would be beneficial to include the value in the image data sets to be able to store and communicate this information.
@Punzo we will need to update the methods of the viewer to use opticalPathUID instead of opticalPathIdentifier.
Maybe I missed something; from David's comment https://github.com/MGHComputationalPathology/slim/issues/22#issuecomment-853935252, I understood that this method would be sufficient:
1) first, group observations (what we currently call acquisitions in the code; before, they were called slides) by FrameOfReferenceUID, i.e., putting together image instances (OVERVIEW, LABEL, VOLUME) from N series.
2) at this point, we could simply check the opticalPathIdentifier, SamplesPerPixel, and PhotometricInterpretation:
A) If the number of **opticalPathIdentifier** values > 1 and **SamplesPerPixel** === 1 and **PhotometricInterpretation** === MONOCHROME2, then the observation is a multiplexed sample with N "channels";
B) If the number of **opticalPathIdentifier** values === 1 and **SamplesPerPixel** === 1 and **PhotometricInterpretation** === MONOCHROME2, then the observation is a simple single monochrome image sample;
C) If the number of **opticalPathIdentifier** values === 1 and **SamplesPerPixel** !== 1 and **PhotometricInterpretation** === RGB or YBR_*, then the observation is an RGB single image sample.
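The checks (A)-(C) could be sketched as follows (an illustrative sketch only: the `InstanceMetadata` shape and the `classifyObservation` name are assumptions, not existing slim or dicom-microscopy-viewer APIs):

```typescript
// Assumed minimal metadata shape for instances that have already been
// grouped ("observations") by FrameOfReferenceUID.
interface InstanceMetadata {
  OpticalPathIdentifier: string
  SamplesPerPixel: number
  PhotometricInterpretation: string
}

type ObservationType = 'multiplexed' | 'monochrome' | 'rgb' | 'unknown'

// Apply checks (A)-(C) to a grouped observation.
function classifyObservation (instances: InstanceMetadata[]): ObservationType {
  const identifiers = new Set(instances.map(i => i.OpticalPathIdentifier))
  const allMonochrome = instances.every(
    i => i.SamplesPerPixel === 1 && i.PhotometricInterpretation === 'MONOCHROME2'
  )
  const allColor = instances.every(
    i => i.SamplesPerPixel !== 1 &&
      (i.PhotometricInterpretation === 'RGB' ||
       i.PhotometricInterpretation.startsWith('YBR'))
  )
  if (identifiers.size > 1 && allMonochrome) return 'multiplexed' // case A
  if (identifiers.size === 1 && allMonochrome) return 'monochrome' // case B
  if (identifiers.size === 1 && allColor) return 'rgb' // case C
  return 'unknown'
}
```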
Since in (2) we will only look at opticalPathIdentifier values within observations already grouped by FrameOfReferenceUID, I understood from @dclunie's message that the opticalPathIdentifier values will be unique, or am I wrong?
Anyway, for the moment (as soon as David's new data arrives) I would use the method described in this message, since for opticalPathUID we would need to submit a correction and wait for its adoption into the DICOM standard. Once it is in the standard, we can update the code to use the new attribute opticalPathUID. Is this a good plan for you @hackermd ?
P.S.: should I rename Acquisition in slim to Observation or something else? i.e., AcquisitionItem, AcquisitionList, AcquisitionViewer (previously SlideItem, SlideList, SlideViewer).
Since in (2) we will only look at opticalPathIdentifier values within observations already grouped by FrameOfReferenceUID, I understood from @dclunie's message that the opticalPathIdentifier values will be unique, or am I wrong?
Based on my interpretation of the standard, the values of the Optical Path Identifier attribute may be the same across different image instances (e.g., all "1").
See Optical Path Sequence and Optical Path Identifier:
For example, each of four referenced images may use a different optical path (color), and within each of those image SOP Instances the single Optical Path Sequence (0048,0105) Item is identified as "1", although the meaning of optical path "1" is different for each image.
Anyway, for the moment (as soon as David's new data arrives) I would use the method described in this message, since for opticalPathUID we would need to submit a correction and wait for its adoption into the DICOM standard. Once it is in the standard, we can update the code to use the new attribute opticalPathUID. Is this a good plan for you @hackermd ?
I would argue that we should assign (or require the application to assign) an opticalPathUID for each "channel", since the opticalPathIdentifier does not provide sufficient uniqueness guarantees.
should I rename Acquisition in slim to Observation or something else? i.e., AcquisitionItem, AcquisitionList, AcquisitionViewer (previously SlideItem, SlideList, SlideViewer).
I would suggest renaming it back to Slide (sorry!). Within the scope of slim, a digital slide will then be defined as a collection of VL Whole Slide Microscopy Images that share the same Container Identifier and Frame of Reference UID (which specifies the slide coordinate system). Makes sense?
Since in (2) we will only look at opticalPathIdentifier values within observations already grouped by FrameOfReferenceUID, I understood from @dclunie's message that the opticalPathIdentifier values will be unique, or am I wrong?
Based on my interpretation of the standard, the values of the Optical Path Identifier attribute may be the same across different image instances (e.g., all "1"). See Optical Path Sequence and Optical Path Identifier:
For example, each of four referenced images may use a different optical path (color), and within each of those image SOP Instances the single Optical Path Sequence (0048,0105) Item is identified as "1", although the meaning of optical path "1" is different for each image.
Anyway, for the moment (as soon as David's new data arrives) I would use the method described in this message, since for opticalPathUID we would need to submit a correction and wait for its adoption into the DICOM standard. Once it is in the standard, we can update the code to use the new attribute opticalPathUID. Is this a good plan for you @hackermd ?
I would argue that we should assign (or require the application to assign) an opticalPathUID for each "channel", since the opticalPathIdentifier does not provide sufficient uniqueness guarantees.
1) OK, but what I mean is that we can do a first implementation (as soon as I have the new data from David) just using the opticalPathIdentifier. Then we can update the code to use opticalPathUID (once it is accepted into the standard, which can be done in parallel to speed things up). So from my point of view, everything will be done, and then we just have to update a few lines to reach the final ideal solution.
I would suggest renaming it back to Slide (sorry!). Within the scope of slim, a digital slide will then be defined as a collection of VL Whole Slide Microscopy Images that share the same Container Identifier and Frame of Reference UID (which specifies the slide coordinate system). Makes sense?
2) OK, no problem; I will update https://github.com/MGHComputationalPathology/slim/pull/25 by renaming everything to Slide. I will do it together with (1) when I get the new data.
OK, but what I mean is that we can do a first implementation (as soon as I have the new data from David) just using the opticalPathIdentifier. Then we can update the code to use opticalPathUID (once it is accepted into the standard, which can be done in parallel to speed things up). So from my point of view, everything will be done, and then we just have to update a few lines to reach the final ideal solution.
Fine with me. We should probably consider the use of opticalPathIdentifier a bug, but we can address this later.
OK, no problem; I will update #25 by renaming everything to Slide. I will do it together with (1) when I get the new data.
Excellent! Please make sure you also group images by Container Identifier and not just by Frame of Reference UID. In theory, there should not be any images with the same Frame of Reference UID and different Container Identifier, but safety first :)
yes, perfect!
Just a note: I am not sure where David is uploading the data. Markus, do you have the rights to copy them onto Steve's server (http://34.68.90.36/) for fast dev/testing (last time I tried, I could not)? Or do you have another one to use?
Updated sample at:
@pieper, can you replace the version on your server with this one please?
@dclunie thank you for sharing the updated files!
DICOM store with those is here: projects/idc-sandbox-000/locations/europe-west6/datasets/htan-dev/dicomStores/LUNG-1-LN-20210604
For next time, it would probably be best to do the conversion on a VM and copy uncompressed DICOM files to a bucket under the idc-htan-000 project, to keep HTAN-related development compartmentalized. Google Healthcare DICOM import works only with DICOM files. Since you uploaded .tar.bz, I had to copy it to a VM, uncompress, untar, and then re-upload into a bucket before importing into the DICOM store above.
This has been addressed in https://github.com/MGHComputationalPathology/slim/pull/25 and https://github.com/MGHComputationalPathology/dicom-microscopy-viewer/pull/54. @hackermd please review.
Currently, the CaseViewer component assumes that each digital slide corresponds to a series (i.e., that there is a one-to-one mapping between series and slides) and consequently lists each series as a separate SlideItem in the SlideList (note that I already renamed the components with 0a9e43aa4c5f1d6313a9b9336250f29c40a60d9b).
In general, this assumption may not hold, since VL Whole Slide Microscopy Image instances corresponding to a digital slide may be split across multiple series. For example, each "channel" (optical path) of an iterative immunofluorescence image acquisition may be placed in a separate series.
Instead of assuming that all VL Whole Slide Microscopy Image instances of a digital slide are contained within a single series, we should group instances per slide (potentially across multiple series).
@Punzo as discussed, an elegant approach could be to add a groupInstances() function to the dicom-microscopy-viewer library (preferably in metadata.js), which we could use in SliM and in the constructor of the VolumeImageViewer. For example:
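A possible sketch (only the groupInstances name comes from the comment above; the `InstanceMetadata` shape and the choice of Container Identifier plus Frame of Reference UID as the grouping key are assumptions based on this thread's discussion, and it is written in TypeScript although dicom-microscopy-viewer itself is JavaScript):

```typescript
// Assumed minimal shape of a parsed VL Whole Slide Microscopy Image instance.
interface InstanceMetadata {
  ContainerIdentifier: string
  FrameOfReferenceUID: string
  SeriesInstanceUID: string
}

// Group instances into digital slides, potentially across multiple series:
// instances belong to the same slide when they share both the physical
// container (glass slide) and the frame of reference (slide coordinate
// system).
function groupInstances (instances: InstanceMetadata[]): InstanceMetadata[][] {
  const slides = new Map<string, InstanceMetadata[]>()
  for (const instance of instances) {
    const key = `${instance.ContainerIdentifier}/${instance.FrameOfReferenceUID}`
    const group = slides.get(key) ?? []
    group.push(instance)
    slides.set(key, group)
  }
  return Array.from(slides.values())
}
```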