should our data models be hashable

d-v-b commented 5 days ago

I have a test in pydantic-ome-ngff that checks if my multiscale model is hashable. I don't remember why I added this test, but evidently it can be useful for these metadata models to be hashable, e.g. so that we can store them in python sets.

Making metadata models hashable requires using frozen pydantic models / dataclasses. We previously decided against using frozen models, because we wanted to allow in-place mutation of those models. How do we weigh the value of in-place mutation vs the advantages of having hashable models? (Maybe we think in place mutation is much more useful...)

joshmoore commented 5 days ago

I'm not sure if it's related, but if in place mutation is necessary for the needed extensibility, I'd value that over hashability.

d-v-b commented 5 days ago

I think the recommendation for extensions would be to use subclassing, e.g. class MyAxis(Axis): .... I think the overlap with mutability is small, but If anything, immutable data models would encourage subclassing, because users would be structurally discouraged from adding dynamic, undocumented fields to the core data models.

d-v-b commented 5 days ago

If we made our classes frozen, we would need to ensure that it's easy to make a copy with new, validated, attributes. We can do this with the following design


class Axis(Base):
    """
    Model for an element of `Multiscale.axes`.

    See https://ngff.openmicroscopy.org/0.4/#axes-md.
    """

    name: str
    type: str | None = None
    unit: str | None = None

    def with_name(name: str):
        """
        return a copy of the axis with a new name
        """
        return type(self)(**self.model_dump(exclude={'name'}), name=name)

    def with_type(name: str):
        """
        return a copy of the axis with a new type
        """
        return type(self)(**self.model_dump(exclude={'type'}), type=type)

ax = Axis(name='foo', type='space')
# this will make a new axis with name 'bar', and type time
new_ax = ax.with_name('bar').with_type('time')

obviously we would complete this with with_unit, if people like the design

dstansby commented 5 days ago

I'm 👎 on having to manually write those with_* methods (although they could probably look a bit nicer if they were classmethods?). Doesn't pydantic provide a nice way to do this?

d-v-b commented 5 days ago

I'm happy to write them, and I they have to be instance methods because they need the original values

tlambert03 commented 4 days ago

pydantic has model_copy that provides options to create a copy with specific field(s) updated.

In [12]: Axis(name='x').model_copy(update={'name': 'Y'})
Out[12]: Axis(name='Y', type=None, unit=None)

however it doesn't perform validation:

In [15]: Axis(name='x').model_copy(update={'name': ()})
Out[15]: Axis(name=(), type=None, unit=None)

when I want immutability, but also want the convenience of easy updates, I often add a replace() method to my base class:

class FrozenModel(BaseModel):
    model_config = {'frozen': True}

    def replace(self, **kwargs: Any) -> "Self":
        """Return a new instance replacing specified kwargs with new values."""
        return type(self).model_validate({**self.model_dump(), **kwargs})

and then if you want model specific type hinting on each subclass, one option is to use kwargs: Unpack[SomeTypedDict]. Note, however, model_dump() recurses all the way into a nested model, and if you only intend to allow swapping of top level fields... then you can save some cycles with:

        d = {k: getattr(self, k) for k in self.model_fields}
        return type(self).model_validate({**d, **kwargs})

I've generally been happy when I've made things immutable... particularly when they are objects may be passed as arguments to various listeners/consumers. So I'd certainly try to make them immutable, but yeah, like @joshmoore said, if it really gets in the way then meh

d-v-b commented 4 days ago

thanks @tlambert03 those examples are super helpful! I love that top-level replace method, that's exactly the kind of thing we could wrap with with_{field} if we do go in that direction.

BioImageTools / ome-zarr-models-py

should our data models be hashable #24