Opus10 / django-pghistory

Track historical events to Django models using Postgres triggers.
https://django-pghistory.readthedocs.io
BSD 3-Clause "New" or "Revised" License
375 stars 41 forks source link

Question regarding using Retrieving and Joining AggregateEvent Metadata from the doc #33

Closed xjlin0 closed 3 months ago

xjlin0 commented 2 years ago

Hi team, thanks again for creating such a great package, it's been a pleasure to develop with it. Here is an observation regarding " Retrieving and Joining AggregateEvent Metadata" in the doc.

https://django-pghistory.readthedocs.io/en/latest/extras.html#retrieving-and-joining-aggregateevent-metadata When I copying the example pointing auth.User, makemigration complains:

SystemCheckError: System check identified some issues:

ERRORS:
users.CustomAggregateEvent.user: (fields.E301) Field defines a relation with the model 'auth.User', which has been swapped out.
    HINT: Update the relation to point at 'settings.AUTH_USER_MODEL'.

Thus is my user model modifying the doc example pointing settings.AUTH_USER_MODEL, which is my User model.

import pghistory
from pghistory.models import BaseAggregateEvent
from django.contrib.auth.models import AbstractUser

class User(AbstractUser):
    name = CharField("Name of User", blank=True, max_length=255)
    #...

class CustomAggregateEvent(BaseAggregateEvent):
    user = models.ForeignKey(User, on_delete=models.DO_NOTHING, null=True)
    url = models.TextField(null=True)

    class Meta:
        managed = False

makemigrations, migrate, createsuperuser and seeding data seems fine, however, it complains ValueError: Must use .target() to target an object for event aggregation in the django shell:

Python 3.9.13 (main, May 18 2022, 02:18:18)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.3.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: UserAggregateEvent.objects.annotate(email=F('user__email'))
Out[1]: ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /usr/local/lib/python3.9/site-packages/IPython/core/formatters.py:707, in PlainTextFormatter.__call__(self, obj)
    700 stream = StringIO()
    701 printer = pretty.RepresentationPrinter(stream, self.verbose,
    702     self.max_width, self.newline,
    703     max_seq_length=self.max_seq_length,
    704     singleton_pprinters=self.singleton_printers,
    705     type_pprinters=self.type_printers,
    706     deferred_pprinters=self.deferred_printers)
--> 707 printer.pretty(obj)
    708 printer.flush()
    709 return stream.getvalue()

File /usr/local/lib/python3.9/site-packages/IPython/lib/pretty.py:410, in RepresentationPrinter.pretty(self, obj)
    407                         return meth(obj, self, cycle)
    408                 if cls is not object \
    409                         and callable(cls.__dict__.get('__repr__')):
--> 410                     return _repr_pprint(obj, self, cycle)
    412     return _default_pprint(obj, self, cycle)
    413 finally:

File /usr/local/lib/python3.9/site-packages/IPython/lib/pretty.py:778, in _repr_pprint(obj, p, cycle)
    776 """A pprint that just redirects to the normal repr function."""
    777 # Find newlines and replace them with p.break_()
--> 778 output = repr(obj)
    779 lines = output.splitlines()
    780 with p.group():

File /usr/local/lib/python3.9/site-packages/django/db/models/query.py:256, in QuerySet.__repr__(self)
    255 def __repr__(self):
--> 256     data = list(self[:REPR_OUTPUT_SIZE + 1])
    257     if len(data) > REPR_OUTPUT_SIZE:
    258         data[-1] = "...(remaining elements truncated)..."

File /usr/local/lib/python3.9/site-packages/django/db/models/query.py:262, in QuerySet.__len__(self)
    261 def __len__(self):
--> 262     self._fetch_all()
    263     return len(self._result_cache)

File /usr/local/lib/python3.9/site-packages/django/db/models/query.py:1324, in QuerySet._fetch_all(self)
   1322 def _fetch_all(self):
   1323     if self._result_cache is None:
-> 1324         self._result_cache = list(self._iterable_class(self))
   1325     if self._prefetch_related_lookups and not self._prefetch_done:
   1326         self._prefetch_related_objects()

File /usr/local/lib/python3.9/site-packages/django/db/models/query.py:51, in ModelIterable.__iter__(self)
     48 compiler = queryset.query.get_compiler(using=db)
     49 # Execute the query. This will also fill compiler.select, klass_info,
     50 # and annotations.
---> 51 results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
     52 select, klass_info, annotation_col_map = (compiler.select, compiler.klass_info,
     53                                           compiler.annotation_col_map)
     54 model_cls = klass_info['model']

File /usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py:1162, in SQLCompiler.execute_sql(self, result_type, chunked_fetch, chunk_size)
   1160 result_type = result_type or NO_RESULTS
   1161 try:
-> 1162     sql, params = self.as_sql()
   1163     if not sql:
   1164         raise EmptyResultSet

File /usr/local/lib/python3.9/site-packages/pghistory/models.py:509, in AggregateEventQueryCompiler.as_sql(self, *args, **kwargs)
    505 base_sql, base_params = super().as_sql(*args, **kwargs)
    507 # Create the CTE that will be queried and insert it into the
    508 # main query
--> 509 cte = self.get_aggregate_event_cte()
    511 return cte + base_sql, base_params

File /usr/local/lib/python3.9/site-packages/pghistory/models.py:480, in AggregateEventQueryCompiler.get_aggregate_event_cte(self)
    478 obj = self.query.target
    479 if not obj:
--> 480     raise ValueError('Must use .target() to target an object for event aggregation')
    482 event_models = self.query.across
    483 cls = self._class_for_target(obj)

ValueError: Must use .target() to target an object for event aggregation

Thanks for any suggestions.

wesleykendall commented 2 years ago

When I copying the example pointing auth.User, makemigration complains:

It sounds like I need to update the docs to use settings.AUTH_USER_MODEL instead of "auth.User". That would prevent this particular warning from happening

makemigrations, migrate, createsuperuser and seeding data seems fine, however, it complains ValueError: Must use .target() to target an object for event aggregation in the django shell:

The aggregate event model only works across one object at a time right now, hence why it forces you to use target() to target an object. Can you elaborate on what you were trying to achieve?

xjlin0 commented 2 years ago

Hi Wesley, I am wondering if I can use aggregate event as a "master document collection". For example there are three classes: People, Address and Contact. When app user changes a people's contact or address, the app user expects to see "User" history changes too, although the change was only on the the contact or address level. Just like when I change telephone number in my credit card app, I have no idea what's the class behind, and I will think "my data changed".

To deal with such scenario, we can mark all three class as a "collection", and when any of the class changes, the history of the "collection" changes too, which fits app user's expectation.

Would aggregate event be function as such "collection" too? thanks!

wesleykendall commented 2 years ago

@xjlin0 currently the .target() method is designed to do what you described. If, for example, address and contact foreign key to people, changes to those models will be aggregated when doing .target(people_object). I.e. you will see snapshot models for all three models if any foreign keys to the people object are detected.

I think this describes your use case, right? You need to aggregate historical changes that reference a particular object? Currently the code only aggregates models that directly foreign key to the targeted model.

FYI I'm overhauling the aggregate event helper models to make them more flexible and easier to query. I'm planning to have a new version ready by beginning of next week. The old AggregateEvent model is going to be deprecated, and a new Events model will be there. It currently looks like this:

class Events(models.Model):
    """
    A proxy model for aggregating events together across tables and
    rendering diffs
    """
    # A surrogate primary key that is composed of the event table and the event PK
    pgh_pk = models.TextField(primary_key=True)

    # The primary key of the event table
    pgh_id = models.BigIntegerField()
    # When the event was created
    pgh_created_at = models.DateTimeField(auto_now_add=True)
    # The label of the event
    pgh_label = models.TextField(help_text="The event label.")
    # The table of the event model
    pgh_table = models.CharField(
        max_length=64, help_text="The table under which the event is stored."
    )
    # The raw data of the event row
    pgh_data = PGHistoryJSONField(help_text="The raw data of the event row.")
    # The changes between this row and the last event with the same label and object
    pgh_diff = PGHistoryJSONField(
        help_text="The diff between the previous event and the current event."
    )
    # The UUID of the context, if any, that was tracked
    pgh_context_id = models.UUIDField(null=True, help_text="The ID associated with the context.")
    # The context that was tracked
    pgh_context = models.JSONField(
        "pghistory.Context",
        null=True,
        help_text="The context associated with the event.",
    )
    # The table of the primary event object. 
    pgh_obj_table = models.CharField(
        max_length=64, help_text="The table under which the primary object is stored."
    )
    # The pk of the primary event object
    pgh_obj_id = models.TextField(null=True, help_text="The ID of the primary object.")

The view is pretty similar to what exists now with some additional fields. Here's some key differences:

  1. Events.objects.all() can be called and return all events across all tables. You are no longer required to filter on an individual event.
  2. Events.objects.across(EventModel1, EventModel2) can be used to limit the search space to event models
  3. Events.objects.tracks(primary_obj1, primary_obj2) can be used to limit the search space to event models that directly track the provided objects.
  4. Events.objects.references(obj1, obj2) can be used to limit the search space to event models that foreign key to the provided objects. This is the same as the .target method from the old version, and I think it fits your use case. I can consider adding a depth argument to extend the depth of the search and joins too.
  5. Just like the old version, you can subclass the event model and add fields that directly reference objects in the context if you want to more easily join those objects. For example, joining the authenticated user.

The reason I'm going with this approach in the later version is because I'm building a direct admin view on top of events to address some of the Django admin integration issues.

Anyways, didn't mean to blast you with this message, but if you have any feedback or suggestions for the aggregate event API let me know!

wesleykendall commented 3 months ago

As a follow-up, this change was released and documented in this section a while ago.