glencoesoftware / bioformats2raw

Bio-Formats image file format to raw format converter
GNU General Public License v2.0
77 stars 35 forks source link

Fix OME-NGFF metadata when writing flattened layout #131

Closed sbesson closed 2 years ago

sbesson commented 2 years ago

See https://forum.image.sc/t/dropping-the-top-layer-in-ome-zarr-data-without-losing-multiscales-metadata/63286

As originally suggested in https://github.com/glencoesoftware/bioformats2raw/issues/108#issuecomment-887579778 and captured in the README, setting the --scale-format-string to %2$d/ allows to flatten the hierarchy created bioformats2raw to match the layout of OME-NGFF multiscale images. Using bioformat2raw 0.4.0, this command results in the absence of the multiscales metadata from the top-level group:

(zarr_dev) [sbesson@pilot-zarr2-dev ~]$ bioformats2raw test.fake test.zarr --scale-format-string '%2$d/' &&  cat test.zarr/.zattrs
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp8729792479941275324/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
{
  "bioformats2raw.layout" : 3
}

After investigation, this is due to the logic in https://github.com/glencoesoftware/bioformats2raw/blob/master/src/main/java/com/glencoesoftware/bioformats2raw/Converter.java#L1483 which fails to handle this particular resolutionString and lead to the following resolution:

2022-02-15 09:50:46,281 [main] DEBUG c.g.bioformats2raw.Converter -   seriesString = 0
2022-02-15 09:50:46,281 [main] DEBUG c.g.bioformats2raw.Converter -   resolutionString = 0/

instead of

2022-02-15 10:30:22,690 [main] DEBUG c.g.bioformats2raw.Converter -   seriesString = 
2022-02-15 10:30:22,690 [main] DEBUG c.g.bioformats2raw.Converter -   resolutionString = 0

As noted on the forum post, using the current release of the tool, dropping the trailing slash produces a different output which preservesx the top-level Zarr/OME-NGFF metadata.

(zarr_dev) [sbesson@pilot-zarr2-dev ~]$ bioformats2raw test.fake test2.zarr --scale-format-string '%2$d' &&  cat test2.zarr/.zattrs
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp5978412340746156458/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
{
  "multiscales" : [
    {
      "metadata" : {
        "method" : "loci.common.image.SimpleImageScaler",
        "version" : "Bio-Formats 6.8.0"
      },
      "datasets" : [
        {
          "path" : "0"
        },
        {
          "path" : "1"
        }
      ],
      "version" : "0.2"
    }
  ]
}

However, this value conflicts with the recommendation at the bottom of https://github.com/glencoesoftware/bioformats2raw/tree/v0.4.0#--scale-format-string.

At minimum 8fb26372b1fc2772123b098ec52acd3fcf741022 expands the existing unit test (without the trailing slash) to capture the current expectation for the second command and check the presence of a Zarr group with multiscales metadata.

This PR will need at least one extra commit but there are two options:

@melissalinkert @chris-allan let me know what you are happy to support moving forward and I can push updates accordingly

melissalinkert commented 2 years ago

Thanks @sbesson. I think the second option plus README updates makes most sense long-term.

sbesson commented 2 years ago

:+1: regarding the warning in the readme, do you roughly remember under which conditions the absence of trailing slash in the scale format string would cause an error ?

sbesson commented 2 years ago

Also one related question regarding the semantics of bioformats2raw.layout. My assumption is that it should match the default hierarchy created by bioformats2raw. So in case --scale-format-string '%2$d/' is passed, this key/value can safely be omitted from the top-level group metadata? The alternative would be to also store it alongside the multiscales metadata.

melissalinkert commented 2 years ago

Kind of wishing I'd been more descriptive in https://github.com/glencoesoftware/bioformats2raw/pull/112/commits/ef95be9fca32835853acffc2d1e12c06c1592499. I think the context was running the commands in #108 and #109?

bioformats2raw.layout is just a version number; raw2ometiff will complain if it is missing, but it should complain no matter what if --scale-format-string is used. I don't think it's a problem to just omit it.

sbesson commented 2 years ago

Pushed two commits adding coverage for both forms of the scale format string in the tests and handling the case of trailing slash.

joshmoore commented 2 years ago

Likely outwit this PR, I wonder if a short-cut for the flattening would be in scope.

melissalinkert commented 2 years ago

Last 2 commits are fine with me. Extra option captured as https://github.com/glencoesoftware/bioformats2raw/issues/132. Think this just needs final review and merge from @chris-allan.