Open krokicki opened 2 weeks ago
> I think that a `Group` should never need a `multiscales` attribute. The OME-Zarr spec does not have high level types, but one way to interpret the spec is that there is a concept of an "Image" which is a type of group with `multiscales`, so maybe that is a better way to model it.
Maybe this comes down to me making a bad naming decision -- the `multiscale.Group` class is designed to model exactly the structure described for a multiscale image in the OME-NGFF spec, i.e. a zarr group whose attributes contain a `multiscales` attribute with a particular structure, etc. By contrast, the zarr group created by multiscales2raw is not a multiscale group, so `multiscale.Group` does not model it. To model a zarr group that contains OME-NGFF groups or non-OME-NGFF groups, I would do something like this:
```python
# /// script
# requires-python = ">=3.10"  # the `GroupSpec | Group` union below needs Python 3.10+
# dependencies = [
# "pydantic-ome-ngff",
# "fsspec[s3]",
# ]
# ///
from typing import Any

from pydantic_ome_ngff.v04.multiscale import Group
from pydantic_zarr.v2 import GroupSpec
import zarr

url = "s3://janelia-flylight-imagery/Fly-eFISH/EASI-FISH_NP_SS/NP01_R1_20230906/NP01_R1_1_1_SS00790_AstA_546_CCHa1_647_100x_LOL.zarr"
zgroup = zarr.open(url)

# a model of a zarr group with any attributes that could contain any zarr group OR an ome-ngff multiscale group
ContainsOmeGroup = GroupSpec[Any, GroupSpec | Group]
group_model = ContainsOmeGroup.from_zarr(zgroup)
```
It seems like pydantic handles the union properly in my case: `group_model.members` has 2 elements, one of which is a `GroupSpec` and the other is a `multiscale.Group`.
I do think my choice of the name `Group` was unfortunate. Do you think `MultiscaleGroup` would make things more clear?
And in case it wasn't clear, `multiscale.Group` is just a subclass of `GroupSpec`, with some additional validation logic that ensures that the group attributes and the array members are consistent.
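To make the "subclass plus consistency check" idea concrete, here is a minimal sketch using only the standard library. `PlainGroup` and `MultiscaleLikeGroup` are hypothetical stand-ins, not the actual `GroupSpec` or `multiscale.Group` classes, and the check shown (every dataset path listed under `multiscales` must name a member) is just one plausible example of such a rule, not a description of what the library validates.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PlainGroup:
    """Hypothetical stand-in for a generic Zarr group model."""
    attributes: dict[str, Any] = field(default_factory=dict)
    members: dict[str, Any] = field(default_factory=dict)


@dataclass
class MultiscaleLikeGroup(PlainGroup):
    """Same fields as PlainGroup, plus a consistency check on construction."""

    def __post_init__(self) -> None:
        multiscales = self.attributes.get("multiscales")
        if not multiscales:
            raise ValueError("attributes must contain a non-empty 'multiscales' list")
        # Assumed rule: each dataset path must refer to an existing member.
        for entry in multiscales:
            for dataset in entry["datasets"]:
                path = dataset["path"]
                if path not in self.members:
                    raise ValueError(f"dataset path {path!r} has no matching member")


attrs = {"multiscales": [{"datasets": [{"path": "s0"}, {"path": "s1"}]}]}

# Attributes and members agree, so construction succeeds.
ok = MultiscaleLikeGroup(attributes=attrs, members={"s0": "array", "s1": "array"})

# 's1' is described in the attributes but missing from the members.
try:
    MultiscaleLikeGroup(attributes=attrs, members={"s0": "array"})
except ValueError as err:
    error_message = str(err)
```

Because the subclass adds no fields, anything that accepts a `PlainGroup` also accepts a `MultiscaleLikeGroup`; the only difference is that the stricter type refuses to be built from inconsistent data.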
Yes, `MultiscaleGroup` would be better naming. I think what would be really nice is a `Group` (or maybe `OmeZarr`) class that I can use to import the top level of any Zarr I have, and for it to provide access to any `multiscale.Group` images underneath it.
> I think what would be really nice is a `Group` (or maybe `OmeZarr`) class that I can use to import the top level of any Zarr I have, and for it to provide access to any `multiscale.Group` images underneath it.
That's a cool idea. I'm not sure there's a simple way to define this as a generic pydantic model, i.e. to get the behavior you want from `Model.from_zarr` (I will keep thinking about this though), but it would definitely be straightforward to create a function that produces a `GroupSpec` instance where all sub-groups are either vanilla `GroupSpec`s or instances of `multiscale.Group`.
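A rough sketch of what such a function could look like, written against plain nested dicts rather than real Zarr groups so that it runs without any dependencies. `PlainNode`, `MultiscaleNode`, and `build_tree` are hypothetical names invented for illustration, and the "has a `multiscales` key" test is a simplifying assumption, not the validation the library performs.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PlainNode:
    """Hypothetical stand-in for a vanilla group model."""
    attributes: dict[str, Any] = field(default_factory=dict)
    members: dict[str, Any] = field(default_factory=dict)


@dataclass
class MultiscaleNode(PlainNode):
    """Hypothetical stand-in for a multiscale group model."""


def build_tree(raw: Any) -> Any:
    """Recursively convert nested dicts into PlainNode / MultiscaleNode objects.

    A dict with 'attributes' and 'members' keys stands in for a group;
    anything else is treated as an array placeholder and returned unchanged.
    """
    if not isinstance(raw, dict):
        return raw
    attrs = raw.get("attributes", {})
    members = {name: build_tree(child) for name, child in raw.get("members", {}).items()}
    # Assumption: a group counts as multiscale if it carries a 'multiscales' attribute.
    cls = MultiscaleNode if "multiscales" in attrs else PlainNode
    return cls(attributes=attrs, members=members)


# A toy hierarchy shaped like a root group holding one image and one plain group.
raw = {
    "attributes": {"bioformats2raw.layout": 3},
    "members": {
        "0": {
            "attributes": {"multiscales": [{"datasets": [{"path": "s0"}]}]},
            "members": {"s0": "array"},
        },
        "OME": {"attributes": {"series": ["0"]}, "members": {}},
    },
}
tree = build_tree(raw)
```

Here the root and `OME` come back as `PlainNode`, while `0` comes back as `MultiscaleNode`, which is the "vanilla or multiscale at every level" shape described above.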
@krokicki take a look at https://github.com/JaneliaSciComp/pydantic-ome-ngff/pull/37, in particular the docs changes -- I put your specific use case in as an example in the docs. Let me know if there's anything I should add or remove there.
Nice! That's most of what I wanted. The only thing left is to make a from_zarr method that creates those from disk representations.
> The only thing left is to make a from_zarr method that creates those from disk representations.
The example in the docs does a full round-trip to and from disk, albeit just for a hierarchy defined as a Zarr group that contains OME-NGFF groups OR regular zarr groups. Here's a commented, abridged form of the relevant part of the docs example:
```python
# data structure in memory
multi_image_group = GroupOfMultiscales(members=groups)
# memory -> disk
zgroup = multi_image_group.to_zarr(store, path='multi_image_group')
# disk -> memory
GroupOfMultiscales.from_zarr(zgroup)
```
Let me know if I should make this more clear in the docs.
So the specific use case that motivated you to open this issue should be addressed in #37, but the general problem of defining a model of a Zarr group that could contain an OME-NGFF group at any level remains open.
Actually, we can do this with self-referential types. Here I amend my original example to show how to express the general case of a zarr hierarchy which might contain OME-NGFF groups at any level:
```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
# "pydantic-ome-ngff==0.6.0",
# "fsspec[s3]",
# ]
# ///
from typing import Any, Union

from pydantic_ome_ngff.v04 import MultiscaleGroup, Axis
from pydantic_zarr.v2 import GroupSpec, ArraySpec
import zarr
import numpy as np


# this class is self-referential
class ContainsOmeGroup(GroupSpec[Any, Union[MultiscaleGroup, GroupSpec, ArraySpec, "ContainsOmeGroup"]]):
    ...


axes = [Axis(name='x', type='space'), Axis(name='y', type='space')]

m_group_a = MultiscaleGroup.from_array_props(
    dtype=np.dtype('uint8'),
    shapes=[(10, 10)],
    paths=['s0'],
    axes=axes,
    scales=[[1, 1]],
    translations=[[0, 0]],
    order='C')

m_group_c = MultiscaleGroup.from_array_props(
    dtype=np.dtype('uint16'),
    shapes=[(20, 20)],
    paths=['s0'],
    axes=axes,
    scales=[[10, 10]],
    translations=[[5, 5]],
    order='C')

# this is a sub-group that contains a multiscale group
group_b = GroupSpec(attributes={'foo': 10}, members={'b_c': m_group_c})

multi_image_group = ContainsOmeGroup(members={'a': m_group_a, 'b': group_b})

store = zarr.MemoryStore()
zgroup = multi_image_group.to_zarr(store, path='multi_image_group')
g = ContainsOmeGroup.from_zarr(zgroup)
print(f"{type(g.members['a'])=}")
print(f"{type(g.members['b'].members['b_c'])=}")
"""
type(g.members['a'])=<class 'pydantic_ome_ngff.v04.multiscale.MultiscaleGroup'>
type(g.members['b'].members['b_c'])=<class 'pydantic_ome_ngff.v04.multiscale.MultiscaleGroup'>
"""
```
When you use bioformats2raw it produces one or more images at the root: https://ngff.openmicroscopy.org/0.4/#bf2raw
When I try to open one of these zarrs:
it results in an error:
I think that a `Group` should never need a `multiscales` attribute. The OME-Zarr spec does not have high level types, but one way to interpret the spec is that there is a concept of an "Image" which is a type of group with `multiscales`, so maybe that is a better way to model it.
In any case, I think it should be possible to parse any valid OME-Zarr and it should just fall back to standard Zarr constructs whenever a concept is missing. For example, even if it doesn't explicitly model `Plate` and `Well` as classes, they could still be expressed as `Group` objects.
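As an illustration of the fallback behavior I have in mind, here is a small dependency-free sketch. The `Group`, `Image`, and `parse` names are hypothetical and only mirror the idea; they are not part of pydantic-ome-ngff. The parser tries each specific model in turn and returns a generic `Group` when none of them accepts the attributes.

```python
from typing import Any


class Group:
    """Hypothetical generic group: accepts any attributes."""

    def __init__(self, attributes: dict[str, Any]) -> None:
        self.attributes = attributes


class Image(Group):
    """Hypothetical image model: a group that must carry 'multiscales'."""

    def __init__(self, attributes: dict[str, Any]) -> None:
        if "multiscales" not in attributes:
            raise ValueError("not an image: no 'multiscales' attribute")
        super().__init__(attributes)


def parse(attributes: dict[str, Any], candidates: tuple[type, ...] = (Image,)) -> Group:
    """Try each specific model in order; fall back to a plain Group."""
    for model in candidates:
        try:
            return model(attributes)
        except ValueError:
            continue  # this model does not apply, try the next one
    return Group(attributes)


# An image group is recognized as such.
image = parse({"multiscales": [{"datasets": [{"path": "0"}]}]})

# Plate and well metadata are not modeled here, so they come back as plain groups.
plate = parse({"plate": {"wells": []}})
well = parse({"well": {"images": []}})
```

With this shape, adding a dedicated `Plate` or `Well` class later only means appending it to `candidates`; until then, those groups remain readable as ordinary `Group` objects instead of raising an error.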