Opus10 / django-pgtrigger

Write Postgres triggers for your Django models
https://django-pgtrigger.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Detecting migrations for custom get_func implementations #91

Closed. davecoates closed this issue 2 years ago

davecoates commented 2 years ago

With the changes in version 3 that create migrations, I've encountered an issue with a custom trigger class that implements get_func to generate the trigger (it's for auditing, so the trigger uses the fields currently defined on the model). If the fields on the model change, get_func will return something different, but no migration will currently be created for it (pgtrigger ls does show it as outdated, though).

What's the best way to handle this? I've tried the following, which seems to work, but I'm wondering if I should approach it differently or whether it's something pgtrigger should handle internally:

class PatchedEvent(pghistory.trigger.Event):
    def __init__(self, *args, func=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Remember the original constructor arguments so they can be
        # reproduced for migration serialization.
        self._init_vals = [args, kwargs, func]

    def get_init_vals(self):
        args, kwargs, func = self._init_vals
        if not func:
            # Render the trigger function from the tracked model's current
            # fields so the serialized form changes when the fields change.
            func = self.get_func(self.event_model.pgh_tracked_model)
        return [args, {**kwargs, "func": func}]

    def get_func(self, model):
        ...
wesleykendall commented 2 years ago

@davecoates let me revisit the problem here to shed some light on it (apologies if you already knew all of what I'm about to explain):

Django's migration system serializes things at a point in time so they can be reconstructed. If you pass a model as an argument to a trigger, it will be serialized like this in the migration:

operations=[migrations.AddTrigger(name="something", model="myapp.MyModel")]

The problem with this is that when MyModel changes, the serialization of the trigger doesn't change, so the migration system doesn't detect that anything in the trigger has changed.

The best approach is to make the trigger take arguments that will cause the serialization to change. Your approach should work as long as the serialized kwargs include all of the model's fields. The automatic trigger serialization code doesn't handle classes with **kwargs in the __init__, which is why you have to fill out a custom get_init_vals.
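For example, something along these lines (AuditTrigger and field_names are just illustrative names, not pgtrigger API) picks up field changes through an explicit argument, so the serialized trigger changes whenever the field list changes:

import pgtrigger


class AuditTrigger(pgtrigger.Trigger):
    """Illustrative audit-style trigger whose SQL depends on a field list."""

    def __init__(self, *, name=None, when=None, operation=None, field_names=None):
        # Listing the fields as an explicit __init__ argument (instead of
        # reading them off the model only inside get_func) means they end up
        # in the serialized trigger, so adding or removing a field changes
        # the serialization and a new migration gets generated.
        self.field_names = list(field_names or [])
        super().__init__(name=name, when=when, operation=operation)

    def get_func(self, model):
        # Placeholder body: a real audit trigger would insert the listed
        # columns into an audit table here.
        cols = ", ".join(f"NEW.{f}" for f in self.field_names)
        return f"RAISE NOTICE 'audited columns: %', ROW({cols}); RETURN NEW;"


# Usage (illustrative):
#
# @pgtrigger.register(
#     AuditTrigger(
#         name="audit_insert",
#         when=pgtrigger.After,
#         operation=pgtrigger.Insert,
#         field_names=["id", "name"],
#     )
# )
# class MyModel(models.Model):
#     ...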

I believe there's actually a subtle bug in pghistory that could be affecting you though, even with your current updated code. The Snapshot objects in pghistory don't serialize all the fields by default, so it will lead to this problem unless you explicitly list them.

I don't know the full context of what you're tracking with pghistory, so it's hard to say for sure if this affects you. If you aren't explicitly listing the model's fields in the history tracker, I'd recommend setting PGTRIGGER_INSTALL_ON_MIGRATE to True in your settings for now to make sure triggers are properly updated after migrations (see the snippet after the list below). My plans to address this are:

  1. I can handle *args and **kwargs for custom triggers automatically. It should be straightforward and prevent people from having to do serialization stuff in custom triggers
  2. Next week my plans are to finally blast through a lot of the outstanding pghistory tickets now that pgtrigger has stabilized more. Although I haven't reproduced it, I do think there's a bug with the Snapshot stuff that doesn't completely work with migrations. I'll tag you in an issue over on pghistory once I get around to reproducing it.
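A minimal example of that stopgap setting:

# settings.py
# Re-install triggers as part of `migrate` so the database stays in sync
# even when the migration autodetector misses a trigger change.
PGTRIGGER_INSTALL_ON_MIGRATE = True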

Will keep this open for now until I create the proper issues or address it

wesleykendall commented 2 years ago

Just confirmed that there isn't a bug in the snapshot stuff in pghistory. The condition is tracked in the migrations even when no fields are specified, so the trigger continues to be migrated properly.

I still plan to follow up on 1) so that you can use **kwargs. In the meantime, an immediate approach you can take is to explicitly list the arguments of the trigger:

def __init__(
    self,
    *,
    name=None,
    operation=None,
    condition=None,
    when=None,
    label=None,
    snapshot=None,
    event_model=None,
    func=None,
):

I think the above would be a better way of handling it for now, assuming it works for your use case

davecoates commented 2 years ago

Thanks for the explanation - that helps. I'm a bit confused about pghistory's Snapshot though - won't it suffer from a stale 'insert' trigger if you add a new field to a model? The 'update' trigger will be fine because the condition changes, but for 'insert' nothing would change in the arguments passed to pghistory.trigger.Event.

I tried it out with an example from the docs by decorating a model with @pghistory.track(pghistory.Snapshot("test_model.snapshot")).

The initial migration was created fine, and after adding a new field the 'update' trigger was dropped and recreated, but the 'insert' trigger was left untouched. Have I missed something here?

Apologies if this has progressed into pghistory territory - when I opened this I was thinking about the general case with pgtrigger. As a hypothetical, say there was a bug in the trigger code for pgtrigger.SoftDelete that required a change in the string returned from get_func - what would be the best way to handle that in terms of generating migrations in projects that were already using the buggy version?

wesleykendall commented 2 years ago

The initial migration was created fine, and after adding a new field the 'update' trigger was dropped and recreated, but the 'insert' trigger was left untouched. Have I missed something here?

No you are absolutely correct! Thanks for bringing this to my attention. I'm dedicating next week to pghistory and will plan on addressing this.

As a hypothetical, say there was a bug in the trigger code for pgtrigger.SoftDelete that required a change in the string returned from get_func - what would be the best way to handle that in terms of generating migrations in projects that were already using the buggy version?

If I understand, you're effectively asking about the same type of issue uncovered with pghistory, right? That is, if the underlying SQL changes but the class instantiation doesn't change, how does the migration system know to update it?

I think one way for me to solve this is to introduce a hash argument that's supplied to the __init__ of the base trigger class. It's a totally useless argument that users shouldn't supply, but it's there for migration serialization and for the equality operator of those migration objects.

Triggers already have a unique hash that's computed based on the SQL and stored alongside the trigger object (it's how pgtrigger ls knows what is outdated), so it's a pretty easy addition that can help make the migration integration more robust. It should also solve the issue of pghistory without any substantial changes.
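As a rough sketch of the idea (the helper below is purely illustrative, not the eventual pgtrigger implementation):

import hashlib


def trigger_sql_hash(rendered_sql: str) -> str:
    # Hash the fully rendered trigger SQL. Any change to the SQL, including
    # one that comes purely from a custom get_func, produces a new value, so
    # serializing this value into migrations lets the autodetector notice a
    # change even when the trigger's __init__ arguments are unchanged.
    return hashlib.sha1(rendered_sql.encode("utf-8")).hexdigest()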

wesleykendall commented 2 years ago

I'm going to go ahead and integrate this into django-pgtrigger and mark this as a bug.

Basically what I'm going to do is design a test case around your use case - a custom trigger that implements get_func with no real changes to the __init__. When the function changes, it should result in migrations being re-generated using that hash approach I just mentioned.

I'm going to try to do this in a way so that it doesn't break the API, probably by adjusting the migration operation so that it takes the hash instead of the trigger class taking the hash as an argument. Will see what I'm able to come up with and will let you know!

davecoates commented 2 years ago

If I understand, you're effectively asking about the same type of issue uncovered with pghistory, right? That is, if the underlying SQL changes but the class instantiation doesn't change, how does the migration system know to update it?

Yep exactly that

I think one way for me to solve this is to introduce a hash argument that's supplied to the __init__ of the base trigger class. It's a totally useless argument that users shouldn't supply, but it's there for migration serialization and for the equality operator of those migration objects.

That sounds good, and it sounds like it would resolve my issue without needing to implement get_init_vals.

I'm going to try to do this in a way so that it doesn't break the API, probably by adjusting the migration operation so that it takes the hash instead of the trigger class taking the hash as an argument. Will see what I'm able to come up with and will let you know!

Excellent, thank you

wesleykendall commented 2 years ago

@davecoates this has been fixed in version 4.5. When triggers are serialized to migrations, a different trigger compilation object is serialized instead of the main trigger class. This object is basically just a representation of the raw SQL.
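For reference, a migration generated after this change serializes roughly the following shape; the compiler class and argument names here are an assumption for illustration, so check a real generated migration for the exact ones:

# Illustrative sketch only; names and arguments may differ from what
# `makemigrations` actually emits for your project.
import pgtrigger.compiler
import pgtrigger.migrations

operations = [
    pgtrigger.migrations.AddTrigger(
        model_name="mymodel",
        trigger=pgtrigger.compiler.Trigger(
            name="protect_deletes",
            sql=pgtrigger.compiler.UpsertTriggerSql(
                func="RAISE EXCEPTION 'pgtrigger: Cannot delete rows from % table', TG_TABLE_NAME;",
                hash="<hash of the rendered SQL>",
                operation="DELETE",
                pgid="pgtrigger_protect_deletes_abc12",
                table="myapp_mymodel",
                when="BEFORE",
            ),
        ),
    ),
]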

The nice things about this approach:

  1. Custom trigger classes can be changed without worrying about breaking previous migrations. You can even change the class name of the trigger and it will work fine.
  2. The SQL can be versioned more easily. For example, if something needs to fundamentally change in how triggers are installed one day, previous migrations can still be reversed and reference the old trigger definition.
  3. Issues like what was happening in pghistory are fixed. When the underlying SQL changes, the trigger will be migrated.

One caveat with this approach is that it re-creates all triggers the first time you make migrations after upgrading. I didn't have many ways to get around this, but thankfully I don't see a reason it would need to happen again. You should be able to fake that migration, although it's usually harmless to run it and update all the triggers.

The pgtrigger test suite dynamically makes models/migrations and updates them to verify things work. Along with this, I tested out this change with a few projects of mine. I serialized triggers the old way, switched versions, and then made migrations again. Things look good, so I'm going to mark this as fixed.

davecoates commented 2 years ago

I've briefly tried this and it seems to work great, thanks 👍