edelooff / sqlalchemy-json

Full-featured JSON type with mutation tracking for SQLAlchemy
http://variable-scope.com/posts/mutation-tracking-in-nested-json-structures-using-sqlalchemy
BSD 2-Clause "Simplified" License
189 stars 34 forks source link

Default column value doesn't apply until after flush #26

Closed huntfx closed 3 years ago

huntfx commented 3 years ago

Not sure if it's a bug or just a limitation of SA itself, but I just noticed this. Would this be something you'd happen to know how to fix?

It's not a major issue, but it'd be better if the default value was set automatically after creating the class.

class MyModel(Base):
    data = Column(NestedMutableJson, default={}, nullable=False)

model = MyModel()
session.add(model)
print(model.data)
# None

session.flush()
print(model.data)
# {}
edelooff commented 3 years ago

This is default behaviour of SQLAlchemy. The column's default is assigned only on flush, if the attribute has had no other value assigned to it. Changing this behavior for columns of a particular data type (if at all possible) would almost certainly be a surprise to many.

To solve your problem, broadly speaking, there are two ways to have the data be an empty dict when you need it:

You could add an __init__ method to your model, which sets a default keyword argument before handing off to the default initializer. Something along these lines:

class MyModel(Base):
    data = Column(NestedMutableJson, default={}, nullable=False)

    def __init__(self, **kwds):
        kwds.setdefault("data", {})
        super().__init__(**kwds)

The other approach is by using a hybrid property which you can do all sorts of fun things with. In this case, you'd want a getter that sets and provides the default {} when the attribute itself is still None:

class MyModel(Base):
    _data = Column("data", NestedMutableJson, default={}, nullable=False)

    @hybrid_property
    def data(self):
        if self._data is None:
            self._data = {}
        return self._data

The downside here is that you might be on the hook for also providing a setter and expression and by that point it adds up to quite a bit of code for a minor thing. But if you want to really avoid touching __init__, it's an option.

huntfx commented 3 years ago

Ah thanks for info. I had to double check this as I've never noticed the behaviour, but I think I just never ended up using sqlalchemy that way before.

To be honest I'd run into the issue as I'd originally used an attribute_mapped_collection to link to another model, so I needed to create the item first, before iterating through the data keys and setting the values one by one. It did dawn on me this was a bit stupid now I can pass the full dict in at once, so that's what I've done haha.

I thought your first suggestion would be a great addition to BaseClass.__init__ (as in declarative_base(cls=BaseClass) to automatically set all values based on cls.column.default. Sadly it seems you can't override methods this way, only add new ones.

edelooff commented 3 years ago

@Peter92 I dug a little bit deeper into the possibilities of setting defaults during instance creation, and in an unsurprising twist, it turns out SQLAlchemy provides all the tools needed to make something like that work: Eager defaults for SQLAlchemy models.

huntfx commented 3 years ago

I was about to mention how it was weird that I couldn't find the same article on google, but I just noticed it was you who posted it haha. As it turns out to be a simple concept, I thought I may as well implement your first example as I've got quite a lot of different tables and can't really be bothered cherry picking which columns to use it on. I did a simple timing test and found it's many times faster not using inspect, and instead just directly accessing the columns.

# Slower
mapper = inspect(instance).mapper
for column in mapper.columns:
    attr = mapper.get_property_by_column(column)
    kwds.setdefault(attr.key, column.default.arg)

# Faster
for name, column in instance.__table__.c.items():
    kwds.setdefault(name, column.default.arg)
edelooff commented 3 years ago

I ended up writing that after doing a little bit of digging and tinkering, so thanks for that. I'm glad it would have been helpful, hah.

Using the column name directly is indeed a lot quicker, but goes awry when you have a column name that's different from the attribute on your model (as is the case in the article's example). It's not a particularly common and if you're the only user of it it's fine, but I imagine it might cause some real head scratchers in bigger teams and code bases when that happens.