JaneliaSciComp / pydantic-ome-ngff

Pydantic models for OME-NGFF
BSD 3-Clause "New" or "Revised" License

Support for bioformats2raw transitional layout, etc. #36

Open krokicki opened 2 weeks ago

krokicki commented 2 weeks ago

When you use bioformats2raw it produces one or more images at the root: https://ngff.openmicroscopy.org/0.4/#bf2raw

When I try to open one of these zarrs:

from pydantic_ome_ngff.v04.multiscale import Group
import zarr
url = "s3://janelia-flylight-imagery/Fly-eFISH/EASI-FISH_NP_SS/NP01_R1_20230906/NP01_R1_1_1_SS00790_AstA_546_CCHa1_647_100x_LOL.zarr"
zgroup = zarr.open(url)
group_model = Group.from_zarr(zgroup)

it results in an error:

KeyError: 'Failed to find mandatory `multiscales` key in the attributes of the Zarr group at <zarr.storage.FSStore object at 0x7fd55c0f0160>://janelia-flylight-imagery/Fly-eFISH/EASI-FISH_NP_SS/NP01_R1_20230906/NP01_R1_1_1_SS00790_AstA_546_CCHa1_647_100x_LOL.zarr://.'

I think that a Group should never need a multiscales attribute. The OME-Zarr spec does not have high level types, but one way to interpret the spec is that there is a concept of an "Image" which is a type of group with multiscales, so maybe that is a better way to model it.

In any case, I think it should be possible to parse any valid OME-Zarr and it should just fall back to standard Zarr constructs whenever a concept is missing. For example, even if it doesn't explicitly model Plate and Well as classes, they could still be expressed as Group objects.
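
As a rough illustration of the fallback idea, here is a minimal sketch that classifies a group purely from its attribute keys and falls back to a plain group when no OME-NGFF key is present. `classify_group` is a hypothetical helper, not part of pydantic-ome-ngff; the keys it checks (`multiscales`, `bioformats2raw.layout`, `plate`, `well`) are the metadata keys used by the 0.4 spec, and the returned labels are made up for the example.

```python
from typing import Any, Mapping


def classify_group(attrs: Mapping[str, Any]) -> str:
    """Hypothetical helper: label a Zarr group based on its attribute keys.

    Falls back to "group" when none of the OME-NGFF 0.4 keys are present.
    """
    if "multiscales" in attrs:
        return "image"
    if "bioformats2raw.layout" in attrs:
        return "bioformats2raw"
    if "plate" in attrs:
        return "plate"
    if "well" in attrs:
        return "well"
    # No OME-NGFF concept recognised: treat it as a standard Zarr group.
    return "group"


if __name__ == "__main__":
    print(classify_group({"bioformats2raw.layout": 3}))
    print(classify_group({"foo": 10}))
```

With something like this, a reader of the hierarchy never fails on an unfamiliar group; it only loses the richer interpretation.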

d-v-b commented 2 weeks ago

I think that a Group should never need a multiscales attribute. The OME-Zarr spec does not have high level types, but one way to interpret the spec is that there is a concept of an "Image" which is a type of group with multiscales, so maybe that is a better way to model it.

Maybe this comes down to me making a bad naming decision: the multiscale.Group class is designed to model exactly the structure described for a multiscale image in the OME-NGFF spec, i.e. a Zarr group whose attributes contain a multiscales key with a particular structure, etc. By contrast, the Zarr group created by bioformats2raw is not a multiscale group, so multiscale.Group does not model it. To model a Zarr group that contains OME-NGFF groups or non-OME-NGFF groups, I would do something like this:

# /// script
# requires-python = ">=3.9"
# dependencies = [
#   "pydantic-ome-ngff",
#   "fsspec[s3]",
# ]
# ///

from typing import Any, Union
from pydantic_ome_ngff.v04.multiscale import Group
from pydantic_zarr.v2 import GroupSpec

import zarr
url = "s3://janelia-flylight-imagery/Fly-eFISH/EASI-FISH_NP_SS/NP01_R1_20230906/NP01_R1_1_1_SS00790_AstA_546_CCHa1_647_100x_LOL.zarr"
zgroup = zarr.open(url)
# a model of a zarr group with any attributes that could contain any zarr group OR an ome-ngff multiscale group
# Union[...] is used instead of `|` because `X | Y` on classes requires Python >= 3.10
ContainsOmeGroup = GroupSpec[Any, Union[GroupSpec, Group]]
group_model = ContainsOmeGroup.from_zarr(zgroup)

It seems like pydantic handles the union properly in my case: group_model.members has 2 elements, one of which is a GroupSpec and the other is a multiscale.Group

I do think my choice of the name Group was unfortunate. Do you think MultiscaleGroup would make things more clear?

d-v-b commented 2 weeks ago

and in case it wasn't clear, multiscale.Group is just a subclass of GroupSpec, with some additional validation logic that ensures the group attributes and the array members are consistent.
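
To give a flavour of what "consistent" can mean here, below is a minimal sketch of one such check: every dataset path declared in the multiscales metadata should correspond to a member of the group. This is not the library's actual validation code; `missing_dataset_paths` is a hypothetical helper that works on plain dicts, and it assumes each dataset path names a direct member of the group.

```python
from typing import Any, Iterable, List, Mapping


def missing_dataset_paths(
    attributes: Mapping[str, Any], member_names: Iterable[str]
) -> List[str]:
    """Hypothetical check: list dataset paths declared in `multiscales`
    metadata that have no matching member in the group."""
    names = set(member_names)
    missing = []
    for multiscale in attributes.get("multiscales", []):
        for dataset in multiscale.get("datasets", []):
            if dataset["path"] not in names:
                missing.append(dataset["path"])
    return missing


if __name__ == "__main__":
    attrs = {"multiscales": [{"datasets": [{"path": "s0"}, {"path": "s1"}]}]}
    # "s1" is declared in the metadata but the group only contains "s0"
    print(missing_dataset_paths(attrs, ["s0"]))
```

A validating subclass would raise an error when a list like this is non-empty, which is why a group without multiscales metadata cannot be parsed as a multiscale group.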

krokicki commented 2 weeks ago

Yes, MultiscaleGroup would be better naming. I think what would be really nice is a Group (or maybe OmeZarr) class that I can use to import the top level of any Zarr I have, and for it to provide access to any multiscale.Group images underneath it.

d-v-b commented 2 weeks ago

I think what would be really nice is a Group (or maybe OmeZarr) class that I can use to import the top level of any Zarr I have, and for it to provide access to any multiscale.Group images underneath it.

That's a cool idea. I'm not sure there's a simple way to define this as a generic pydantic model, i.e. to get the behavior you want from Model.from_zarr (I will keep thinking about this though), but it would definitely be straightforward to create a function that produces a GroupSpec instance where all sub-groups are either vanilla GroupSpecs or instances of multiscale.Group.
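
A minimal sketch of the shape such a function could take, using plain dataclasses as stand-ins so it runs without any Zarr dependencies. `PlainGroup`, `MultiscaleNode` and `build_tree` are hypothetical names, not part of pydantic-ome-ngff or pydantic-zarr, and the input is assumed to be a nested dict with `attributes` and `members` keys. Arrays are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PlainGroup:
    """Stand-in for a vanilla group model (hypothetical)."""
    attributes: Dict[str, Any] = field(default_factory=dict)
    members: Dict[str, "PlainGroup"] = field(default_factory=dict)


@dataclass
class MultiscaleNode(PlainGroup):
    """Stand-in for a multiscale group model (hypothetical)."""


def build_tree(raw: Dict[str, Any]) -> PlainGroup:
    """Recursively convert a nested dict into a tree whose nodes are
    MultiscaleNode where `multiscales` metadata exists, else PlainGroup."""
    attrs = raw.get("attributes", {})
    members = {
        name: build_tree(child)
        for name, child in raw.get("members", {}).items()
    }
    cls = MultiscaleNode if "multiscales" in attrs else PlainGroup
    return cls(attributes=attrs, members=members)


if __name__ == "__main__":
    raw = {
        "members": {
            "a": {"attributes": {"multiscales": []}},
            "b": {"members": {"b_c": {"attributes": {"multiscales": []}}}},
        }
    }
    tree = build_tree(raw)
    print(type(tree.members["a"]).__name__)
    print(type(tree.members["b"].members["b_c"]).__name__)
```

The per-node decision is made by the function rather than by the type system, which is what makes this easier than expressing the same thing as a single generic model.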

d-v-b commented 2 weeks ago

@krokicki take a look at https://github.com/JaneliaSciComp/pydantic-ome-ngff/pull/37, in particular the docs changes. I put your specific use case in as an example in the docs. Let me know if there's anything I should add or remove there.

krokicki commented 2 weeks ago

Nice! That's most of what I wanted. The only thing left is to make a from_zarr method that creates those from disk representations.

d-v-b commented 2 weeks ago

The only thing left is to make a from_zarr method that creates those from disk representations.

The example in the docs does a full round-trip to and from disk, albeit just for a hierarchy defined as a Zarr group that contains OME-NGFF groups OR regular zarr groups. Here's a commented, abridged form of the relevant part of the docs example:

# data structure in memory
multi_image_group = GroupOfMultiscales(members=groups)
# memory -> disk
zgroup = multi_image_group.to_zarr(store, path='multi_image_group')
# disk -> memory
GroupOfMultiscales.from_zarr(zgroup)

let me know if I should make this more clear in the docs.

So the specific use case that motivated you to open this issue should be addressed in #37, but the general problem of defining a model of a Zarr group that could contain an OME-NGFF group at any level remains open.

d-v-b commented 2 weeks ago

Actually, we can do this with self-referential types. Here I amend my original example to show how to express the general case of a Zarr hierarchy which might contain OME-NGFF groups at any level:


# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "pydantic-ome-ngff==0.6.0",
#   "fsspec[s3]",
# ]
# ///

from typing import Any, Union
from pydantic_ome_ngff.v04 import MultiscaleGroup, Axis
from pydantic_zarr.v2 import GroupSpec, ArraySpec
import zarr
import numpy as np

# this class is self-referential
class ContainsOmeGroup(GroupSpec[Any, Union[MultiscaleGroup, GroupSpec, ArraySpec, "ContainsOmeGroup"]]):
    ...

axes = [Axis(name='x', type='space'), Axis(name='y', type='space')]

m_group_a = MultiscaleGroup.from_array_props(
    dtype=np.dtype('uint8'),
    shapes=[(10, 10)],
    paths=['s0'],
    axes=axes,
    scales=[[1, 1]],
    translations=[[0, 0]],
    order='C')

m_group_c = MultiscaleGroup.from_array_props(
    dtype=np.dtype('uint16'),
    shapes=[(20, 20)],
    paths=['s0'],
    axes=axes,
    scales=[[10, 10]],
    translations=[[5, 5]],
    order='C')

# this is a sub-group that contains a multiscale group
group_b = GroupSpec(attributes={'foo': 10}, members={'b_c': m_group_c})
multi_image_group = ContainsOmeGroup(members={'a': m_group_a, 'b': group_b})
store = zarr.MemoryStore()

zgroup = multi_image_group.to_zarr(store, path='multi_image_group')
g = ContainsOmeGroup.from_zarr(zgroup)
print(f"{type(g.members['a'])=}")
print(f"{type(g.members['b'].members['b_c'])=}")
"""
type(g.members['a'])=<class 'pydantic_ome_ngff.v04.multiscale.MultiscaleGroup'>
type(g.members['b'].members['b_c'])=<class 'pydantic_ome_ngff.v04.multiscale.MultiscaleGroup'>
"""