Open d-v-b opened 5 days ago
I'm not sure if it's related, but if in-place mutation is necessary for the needed extensibility, I'd value that over hashability.
I think the recommendation for extensions would be to use subclassing, e.g. `class MyAxis(Axis): ...`. I think the overlap with mutability is small, but if anything, immutable data models would encourage subclassing, because users would be structurally discouraged from adding dynamic, undocumented fields to the core data models.

If we made our classes frozen, we would need to ensure that it's easy to make a copy with new, validated attributes. We could do this with the following design:
```python
class Axis(Base):
    """
    Model for an element of `Multiscale.axes`.
    See https://ngff.openmicroscopy.org/0.4/#axes-md.
    """

    name: str
    type: str | None = None
    unit: str | None = None

    def with_name(self, name: str) -> "Axis":
        """Return a copy of the axis with a new name."""
        return type(self)(**self.model_dump(exclude={'name'}), name=name)

    def with_type(self, type: str) -> "Axis":
        """Return a copy of the axis with a new type."""
        # the `type` parameter shadows the builtin here, so use `self.__class__`
        return self.__class__(**self.model_dump(exclude={'type'}), type=type)
```
```python
ax = Axis(name='foo', type='space')
# this will make a new axis with name 'bar' and type 'time'
new_ax = ax.with_name('bar').with_type('time')
```
obviously we would complete this with `with_unit`, if people like the design
I'm 👎 on having to manually write those `with_*` methods (although they could probably look a bit nicer if they were classmethods?). Doesn't pydantic provide a nice way to do this?
I'm happy to write them, and I think they have to be instance methods because they need the original values
pydantic has `model_copy`, which provides options to create a copy with specific field(s) updated:
```python
In [12]: Axis(name='x').model_copy(update={'name': 'Y'})
Out[12]: Axis(name='Y', type=None, unit=None)
```
however it doesn't perform validation:
```python
In [15]: Axis(name='x').model_copy(update={'name': ()})
Out[15]: Axis(name=(), type=None, unit=None)
```
when I want immutability, but also want the convenience of easy updates, I often add a `replace()` method to my base class:
```python
from typing import Any, Self  # Self requires Python 3.11+ (or typing_extensions)

from pydantic import BaseModel


class FrozenModel(BaseModel):
    model_config = {'frozen': True}

    def replace(self, **kwargs: Any) -> Self:
        """Return a new instance replacing specified kwargs with new values."""
        return type(self).model_validate({**self.model_dump(), **kwargs})
```
and then if you want model-specific type hinting on each subclass, one option is to use `kwargs: Unpack[SomeTypedDict]`. Note, however, that `model_dump()` recurses all the way into nested models, so if you only intend to allow swapping of top-level fields, you can save some cycles with:

```python
d = {k: getattr(self, k) for k in self.model_fields}
return type(self).model_validate({**d, **kwargs})
```
I've generally been happy when I've made things immutable... particularly when they are objects that may be passed as arguments to various listeners/consumers. So I'd certainly try to make them immutable, but yeah, like @joshmoore said, if it really gets in the way then meh
thanks @tlambert03 those examples are super helpful! I love that top-level `replace` method, that's exactly the kind of thing we could wrap with `with_{field}` if we do go in that direction.
I have a test in pydantic-ome-ngff that checks if my multiscale model is hashable. I don't remember why I added this test, but evidently it can be useful for these metadata models to be hashable, e.g. so that we can store them in python sets.

Making metadata models hashable requires using frozen pydantic models / dataclasses. We previously decided against using frozen models, because we wanted to allow in-place mutation of those models. How do we weigh the value of in-place mutation vs the advantages of having hashable models? (Maybe we think in-place mutation is much more useful...)