Open bgervan opened 1 week ago
I have move forward a bit with the actual implementation:
def _get_select(self, event_model):
where_clause = self._get_where_clause(event_model)
(
final_context_columns_clause,
context_column_clause,
context_id_column_clause,
context_join_clause,
annotated_context_columns_clause,
) = self._get_context_clauses(event_model)
pgh_obj_model = event_model._meta.get_field("pgh_obj").remote_field.model
pgh_obj_table = pgh_obj_model._meta.db_table
update_trackers = [
tracker for tracker in event_model.pgh_trackers if tracker.label == "update"
]
curr_data_clause = "row_to_json(_event) AS _curr_data"
prev_data_clause = """
LAG(row_to_json(_event))
OVER (
PARTITION BY _event.pgh_obj_id
ORDER BY _event.pgh_id
) AS _prev_data
"""
pgh_obj_id_column_clause = "pgh_obj_id::TEXT"
if not hasattr(event_model, "pgh_obj_id"):
prev_data_clause = "NULL::JSONB AS _prev_data"
pgh_obj_id_column_clause = "NULL::TEXT AS pgh_obj_id"
elif update_trackers and update_trackers[0].row == "OLD": # Check if Old or New
curr_data_clause = f"""
COALESCE(
LEAD(row_to_json(_event)) OVER (PARTITION BY _event.pgh_obj_id ORDER BY _event.pgh_id),
(SELECT row_to_json(m) FROM {pgh_obj_table} m WHERE m.id = _event.pgh_obj_id)
) AS _curr_data
"""
prev_data_clause = "row_to_json(_event) AS _prev_data"
event_table = event_model._meta.db_table
return f"""
SELECT
CONCAT('{event_model._meta.label}', ':', _pgh_obj_event.pgh_id) AS pgh_slug,
_pgh_obj_event.pgh_id,
_pgh_obj_event.pgh_created_at,
_pgh_obj_event.pgh_label,
{final_context_columns_clause}
_pgh_obj_event.pgh_obj_id,
'{event_model._meta.label}' AS pgh_model,
'{event_model.pgh_tracked_model._meta.label}' AS pgh_obj_model,
(
SELECT JSONB_OBJECT_AGG(filtered.key, filtered.value)
FROM
(
SELECT key, value
FROM JSONB_EACH(_pgh_obj_event._curr_data::JSONB)
) filtered
WHERE filtered.key NOT LIKE 'pgh_%%'
) AS pgh_data,
(
SELECT JSONB_OBJECT_AGG(curr.key, array[prev.value, curr.value])
FROM
(
SELECT key, value
FROM JSONB_EACH(_pgh_obj_event._curr_data::JSONB)
) curr
LEFT OUTER JOIN
(
SELECT key, value
FROM JSONB_EACH(_pgh_obj_event._prev_data::JSONB)
) prev
ON curr.key = prev.key
WHERE curr.key NOT LIKE 'pgh_%%'
AND curr.value != prev.value
AND prev IS NOT NULL
) AS pgh_diff,
_pgh_obj_event.pgh_context_id,
_pgh_obj_event.pgh_context
FROM (
SELECT
pgh_id,
pgh_created_at,
pgh_label,
{curr_data_clause},
{annotated_context_columns_clause}
{prev_data_clause},
{context_id_column_clause},
{context_column_clause},
{pgh_obj_id_column_clause}
FROM {event_table} _event
{context_join_clause}
{where_clause}
ORDER BY _event.pgh_id
) _pgh_obj_event
"""
I can open a PR and update to align with project scopes as needed.
This part is not quite clear for me
if not hasattr(event_model, "pgh_obj_id"):
prev_data_clause = "NULL::JSONB AS _prev_data"
pgh_obj_id_column_clause = "NULL::TEXT AS pgh_obj_id"
@bgervan I'm not following what is "wrong". What is the error? Can you boil it down to a working example to illustrate what results you are seeing and what you are expecting to see?
The aggregate event model is just a way to quickly query diffs, mostly for the admin, but some people rely on this utility in their application code. If you are wanting to fundamentally alter behavior to match the scope of a project you're doing, it would probably be best to inherit the model and override it accordingly unless the changes you want are broadly applicable to many use cases.
If you can clarify if this is a bug or a feature request, that would be helpful, thanks!
I created a sample model with
PGHISTORY_DEFAULT_TRACKERS = (
pghistory.InsertEvent(),
pghistory.UpdateEvent(row=pghistory.Old),
pghistory.DeleteEvent(row=pghistory.Old),
)
Insert: name: Sample
Update 1: name: Sample (update1)
Update. 2: name: Sample (update2)
With the above changes, the diffs are the following respectively:
null
null
{
"name": [
"Sample",
"Sample (update1)"
],
"updated_at": [
"2024-10-10T05:21:29.032954+00:00",
"2024-10-10T05:21:38.175114+00:00"
]
}
So when I updated to Sample (update2)
, I have the diff of the previous change, which is a bug with the pghistory.Old settings. For update1, I receive null, when there is a change.
With my update, the diffs are the following:
In [3]: MyEvents.objects.tracks(example)[0].pgh_diff
In [4]: MyEvents.objects.tracks(example)[1].pgh_diff
Out[4]:
{
'name': ['Sample', 'Sample (update1)'],
'updated_at': [
'2024-10-10T05:21:29.032954+00:00',
'2024-10-10T05:21:38.175114+00:00'
]
}
In [5]: MyEvents.objects.tracks(example)[2].pgh_diff
Out[5]:
{
'name': ['Sample (update1)', 'Sample (update2)'],
'updated_at': [
'2024-10-10T05:21:38.175114+00:00',
'2024-10-10T05:21:43.575616+00:00'
]
}
So for insert it is still NULL (correct) and for the updates, it shows the correct values.
The fix is simply shifts the comparison objects from last and current, to current and next
You are seeing these results because you overrode the default trackers.
If you are tracking inserts and updates of the OLD
row, this means the first two tracked rows in the history table will be identical, which is why you see two null
values first - there are no changes yet.
For example, say you have a row with col_1
.
INSERT
of col_1=a
will produce tracked history of col_1=a
UPDATE
of col_1=b
will track the OLD
row before the change and produce a tracked history of col_1=a
UPDATE
of col_1=c
will track the OLD
row before the change and produce a tracked history of col_1=b
So, in the diff, you'll see null
, null
, and then col_1=a->b
The aggregated event utility is just looking at changes in the history rows. It doesn't understand the context of the trackers that have been set up
I understood what happens, that is what my fix trying to update, checking the tracker if its NEW or OLD and return the diff accordingly.
You don't consider it as a bug when the user uses the row=pghistory.Old? I found a couple of issue with my fix, but fixed since the first implementation.
If you are tracking OLD
rows on update, your snapshot table lags behind the current table. The aggregate queryset that shows diffs only examines snapshot rows, so what you're seeing is expected behavior.
One alternative approach is this:
PGHISTORY_DEFAULT_TRACKERS = (
pghistory.InsertEvent(),
pghistory.UpdateEvent(row=pghistory.New),
pghistory.DeleteEvent(row=pghistory.Old),
)
If you do this, the example from my previous comment now looks like this:
I.e. your snapshot table will actually show differences in the first two snapshot entries.
Another possibility is to filter out inserts from the snapshots table or not track them at all, but your diffs will always lag behind the current version of the row
To summarize, the aggregate diff utility is just showing diffs from the snapshot table. What's been reported here is expected behavior. Altering it would introduce regressions for others and require an API break. I'm open to altering the behavior, but this is pretty fundamental to how things are architected. If we try to pull in current values not from snapshot tables, it opens up a can of worms for other use cases
Hi, first of all, great package.
My use case: I am storing the old data, to make it easier to revert to something. I am storing transaction_id in context, so I can revert all related changes in a request easily. (At least this is the goal)
If I am using the following settings:
the pgh_diff is wrong, as it fetch the current and previous data of the event. I have changed the raw sql to reach my goal to the this:
Important part is:
Also added pgh_old_data, so it is easier to debug. In this setup:
Can we add changes like this optionally? As I understand, all these affect only the admin, and relevant only when using the pghistory.Old
The phg_diff is important, but the diff_to_real is also nice to have, as with that, we can show the diffs between the revert and current state before the user click on the revert.
EDIT: Added important sql sections separately for easier reading