edelooff / sqlalchemy-json

Full-featured JSON type with mutation tracking for SQLAlchemy
http://variable-scope.com/posts/mutation-tracking-in-nested-json-structures-using-sqlalchemy
BSD 2-Clause "Simplified" License
189 stars 34 forks source link

Validator support #21

Open risicle opened 5 years ago

risicle commented 5 years ago

Hi,

Another weakness in sqlalchemy's mutable object support is the fact that @validates validators aren't run when nested objects are mutated, meaning changes can bypass validation - far from ideal. It would be nice if your library also made an attempt at fixing this, making it more of a "complete solution".

One potential pitfall I guess could be issues with expensive validation functions causing poor performance.

risicle commented 5 years ago

Unfortunately it looks like it'd be quite hard work to make the @validates support listen for extra events beyond append, bulk_replace, set & remove. The temptation is override Mutable.change to simply generate set events.

edelooff commented 5 years ago

Triggering set events from a nested change sounds like a pretty aggressive and potentially impactful thing to do by default. However, being able to opt in to behaviour like this might be useful for some cases. API-wise, I think an optional argument on the type might be a good way to do it:

class Article(Base):
    content = Column(NestedMutableJson(validate_on_change=True))

This could then be used to generate the relevant set event, triggering validation. This is assuming that the set event only triggers validation and not a bunch of associated undesired work, which is something to investigate.

risicle commented 5 years ago

I've been experimenting with simply making NestedMutable's choice of dict and list implementations exposed and overridable a la:

class NestedMutable(Mutable):
    """SQLAlchemy `mutable` extension with nested change tracking."""
    MUTABLE_DICT_TYPE = NestedMutableDict
    MUTABLE_LIST_TYPE = NestedMutableList

    @classmethod
    def coerce(cls, key, value):
        """Convert plain dictionary to NestedMutable."""
        if value is None:
            return value
        if isinstance(value, cls):
            return value
        if isinstance(value, dict):
            return cls.MUTABLE_DICT_TYPE.coerce(key, value)
        if isinstance(value, list):
            return cls.MUTABLE_LIST_TYPE.coerce(key, value)
        return super(cls).coerce(key, value)

Which makes it possible to replace those with variants that have had a mixin applied to them. It works but it's complicated slightly by the decision to overload Mutable's .changed(self) method with TrackedObject's .changed(self, message=None, *args) method. Using the same method name means that for a class that's both a Mutable and a TrackedObject it's impossible to override specifically one or the other. Fortunately for what I'm doing, I think it just about works in either case.

edelooff commented 5 years ago

The changed method on the NestedMutableList/NestedMutableDict should be the last in the chain to get hit, resulting in exactly one call regardless of the level of nesting (all objects below it are either TrackedList or TrackedDict).

After some cursory investigation though, I can't see how to get from the mutable wrapper class to the actual column property, to issue the validation. Do you have a code example for that, assuming you've got it to work?

Adding some points where delegates classes can be overridden sounds reasonable regardless, but handling the whole case with a single flag might be nice. That said, my suggested solution above puts settings for the mutable delegation in the SQL type, which is not quite ideal either (and not necessarily accessible by NestedMutable)

risicle commented 5 years ago

The last to get hit is actually the .changed() of Mutable itself, and if you needed to intercept the call after TrackedObject.changed(...) but before Mutable.changed() it's (probably?) impossible to do that - luckily I don't think that's something I needed in the end, so let's ignore that point.

The thing that actually foils me is the way TrackedDict and TrackedList's hijacked mutation methods (__setitem__ et al) descend into the change() call chain before performing their supercall (which actually makes the change take effect) rather than after. The actual "new value" during the changed call chain doesn't ever make it past the first call. A validator function needs to be passed the new value to work with, but this is not available (anywhere) by the time the call has reached the Mutable. If the changed and super calls were the other way around, the validator-calling function would at least be able to pick the new value off the instance itself, it having already been applied.

I am partly working off an old patch set that I worked on to solve this problem a few years ago (and it worked well, but let's not go into why the PR was not applied) https://github.com/alphagov/digitalmarketplace-api/pull/412/files