Closed melissalinkert closed 1 year ago
@sbesson @melissalinkert Is it possible to move the annotation from a key-value pair to xml annotation like below?: https://v2-demo-dev.glencoesoftware.com/webclient/?show=image-254153 (under "Others"):
https://v2-demo-dev.glencoesoftware.com/webclient/?show=image-156067 (under "Others"):
@muhanadz a priori, I think the writing code can use any OME structured annotation that suports storing a string value - see the full list here. So MapAnnotation
, XmlAnnotation
or even CommentAnnotation
are viable options.
Since this is not a native XML string, from your side, what is the primary benefit of using XmlAnnotation
rather than key/value pairs?
Discussed with @sbesson and @muhanadz earlier today. 9938304 now writes the rendering metadata into several different annotations, so that we can evaluate which is best suited to balancing the three main considerations here:
This PR has been switched to draft status to reflect the fact that it should not be merged as-is.
The only annotation type I added here that we didn't discuss is to have two DoubleAnnotation
s linked to each Channel
- I forgot earlier that Channel
is now annotatable. One DoubleAnnotation
is for the min, and one is for the max, with the Description
indicating which is which.
The next step is to convert a few files to OME-TIFF using bioformats2raw 0.5.0 and this PR, and then import the OME-TIFFs to OMERO. Test cases should include brightfield, fluorescence with 4-6 or so channels, and multiplex with a large number of channels (60-80?).
Once we decide which annotation strategy to use, this PR should also bump to bioformats2raw 0.5.0 and add test cases.
The only annotation type I added here that we didn't discuss is to have two DoubleAnnotations linked to each Channel - I forgot earlier that Channel is now annotatable. One DoubleAnnotation is for the min, and one is for the max, with the Description indicating which is which.
Nice finding. At least from my side, this is definitely the most elegant solution with regards to respecting all the constraints exposed in https://github.com/glencoesoftware/raw2ometiff/pull/82#issuecomment-1273944525.
Assuming channel annotation is the preferred solution, my main caveat is that relying exclusively on Description
to differentiate min/max values is potentially brittle.
DoubleAnnotation
, I could see using different namespaces e.g. glencoesoftware.com/ngff/rendering/{min,max}
which would also facilitate the client-side queryChannel.MapAnnotation
using min/max
as the keys would be a suitable approach94eea04 now gives just two options:
MapAnnotation
linked to each channel, with namespace glencoesoftware.com/ngff/rendering
and keys min
and max
DoubleAnnotation
s linked to each channel, with namespaces glencoesoftware.com/ngff/rendering/min
and glencoesoftware.com/ngff/rendering/max
. The Description
value is as it was before, but could be removed if the namespace is enough to distinguish between min and max.Definitely going to be important to try this out with larger channel counts and an HCS dataset before making a final decision.
Tested using the representative OME-TIFF plate from BBBC017, both in its original form and converted using bioformat2raw
following by raw2ometiff
with this PR included
The data was imported into OMERO under both format both with the default options and skipping min/max calculation + thumbnail generation steps.
The table below summarizes the time spent in the different steps of the import process (as rerported by omero fs importtime
) as well as the number of inserts (as grepped from the log files,
NIRHTa+001.ome.tiff | NIRHTa+001.ome.tiff (no min/max, no thumbnails) | NIRHTa+001_converted.ome.tiff | NIRHTa+001_converted.ome.tiff (no min/max, no thumbnails) | |
---|---|---|---|---|
upload time | 32.01s | 13.19s | 46.46s | 21.87s |
setId time | 11.29s | 9.34s | 18.13s | 17.05s |
metadata time | 673.88s | 682.63s | 2624.29 | 2527.59s |
pixels time | 719.68s | 637.64s | 2037.80 | 1893.49s |
rdefs time | 1065.52s | 1478.75s | ||
thumbnail time | 442.69s | 852.58s | ||
Number of INSERT | 28049 | 14223 | 69521 | 55695 |
A few comments:
MapAnnotation
, 2 DoubleAnnotation
and their associated ChannelAnnotation link for each channel (6 x 2 x 9 x 384 = 41472 extra objects). MapAnnotation
proposal is probably the most efficient one as it will lead to the generation of 14K objectsAll in all, it is very possible there is not an universal workflow that will work with all modalities and we will need to make decisions based on the domain. For instance, it is very likely that preserving the rendering metadata when converting single large floating-point images will be valuable but this might be something that we don't want to do for HCS data simply because some of the costs outweigh the benefits.
I think I'm OK with HCS vs non-HCS being handled separately, since those are already two distinct output formats. We were at one point discussing checking channel counts etc., which is where I'm less comfortable.
Is the idea then to write the channel annotations when any of the following are true:
--no-hcs
option is used--minmax
option is used (maybe needed for plates with stitched floating point wells?)and so omit the channel annotations when any of the following are true:
--no-hcs
not used--no-minmax
option is usedI think I'm OK with HCS vs non-HCS being handled separately, since those are already two distinct output formats. We were at one point discussing checking channel counts etc., which is where I'm less comfortable.
I agree. Anything that would force us to define a hardcoded arbitrary maximum channel limit is worth avoiding.
Is the idea then to write the channel annotations when any of the following are true:
- HCS metadata not present in original data
- existing
--no-hcs
option is used- existing
--minmax
option is usedand so omit the channel annotations when any of the following are true:
- HCS metadata present in original data AND
--no-hcs
not used- existing
--no-minmax
option is used
The second condition makes sense to me but I think this means the channel annotations should be written when:
--no-hcs
option is used--minmax
option is used(maybe needed for plates with stitched floating point wells?)
Interesting, I could definitely imagine some HCS scenarios which could benefit from storing these annotations. Floating-point wells are one example, long timelapse HCS acquisitions such as https://idr.openmicroscopy.org/webclient/?show=well-1351622 (182 timepoints / well sample) could be another. Thoughts about adding an opt-in flag allowing the user to override the omission conditions described above and unconditionally write the channel annotations if --minmax
has been used in the first step?
My thinking was --minmax
== force the annotations to be written no matter what. --no-minmax
== force the annotations to be omitted. Neither --minmax
nor --no-minmax
== use HCS/no-HCS logic to decide whether to write annotations. I think we are saying the same thing just in a different way?
My thinking was --minmax == force the annotations to be written no matter what. --no-minmax == force the annotations to be omitted. Neither --minmax nor --no-minmax == use HCS/no-HCS logic to decide whether to write annotations. I think we are saying the same thing just in a different way?
Just to clarify, are you talking about a new --[no-]minmax
option defined at the raw2ometiff
level? I was (probably wrongly) assuming you were referring to the --[no-]minmax
option of the bioformats2raw
utility.
Sorry, ignore me. My whole line of thinking yesterday was around turning off the potentially expensive min/max calculation at the bioformats2raw level. No min/max calculation in bioformats2raw then means the channel annotations can't be written here. That's probably not what we actually want though.
63b5cf4 is one way to do this. If there is plate metadata, there will be no channel annotations by default. If you add the new --force-channel-annotations
option, channel annotations will be written even if there is a plate.
This is a tricky one. Nice issue! 😉
Trying to capture my brainstorming from the formats meeting, though it doesn't help with the lack of an API for retrieving min/max values from arbitrary readers, another possible route for injecting this information is to consider everything that doesn't get converted into OME-XML from the intermediate OME-Zarr to be a file annotation in the OME-TIFF. (That leads us to the secondary issue of having file attachments for OME-TIFF, but that's a problem that would be good to solve regardless).
After discussion with @sbesson / @muhanadz / @erindiel / @emilroz / @DavidStirling , closing this in favor of encouraging OME-Zarr usage in cases where min/max data needs to be used.
Agreed it would be nice to explore the file annotation concept, but I don't think we're realistically going to be able to work on that any time soon.
See https://github.com/glencoesoftware/bioformats2raw/pull/160
Discussed with @mgheirat, @muhanadz, and @sbesson that it would be nice to keep any calculated min/max values when converting to OME-TIFF. Since there isn't really a place in the OME schema to do that, this PR uses a
MapAnnotation
linked to eachImage
. TheMapAnnotation
has the namespaceglencoesoftware.com/ngff/rendering
, and a single key (omero
) whose value is the JSON described here: https://ngff.openmicroscopy.org/latest/#omero-md.Namespace, annotation type, etc. all up for debate if anyone has better ideas. Not necessarily expecting this to be merged as-is, it's just a place to start. I thought about using an
XMLAnnotation
instead, like we do for the JSON annotation described here: https://github.com/glencoesoftware/ome-omero-roitool/blob/master/src/main/java/com/glencoesoftware/roitool/OMEOMEROConverter.java#L199.MapAnnotation
seemed less likely to break things that might try to parse anXMLAnnotation
value.It should be possible to convert an input dataset to OME-TIFF with https://github.com/glencoesoftware/bioformats2raw/pull/160 and this PR, and see the linked annotations with
showinf -nopix -noflat -omexml
on the output OME-TIFF. Importing the OME-TIFF to OMERO should allow the annotations to be queried - get theImage
first, then look for linked annotations with theglencoesoftware.com/ngff/rendering
namespace.No tests (yet) because we'd need a released version of bioformats2raw that generates OMERO metadata.