PaulGilmartin / django-pgpubsub

A distributed task processing framework for Django built on top of the Postgres NOTIFY/LISTEN protocol.
Other
245 stars 12 forks source link

Stores non serialized json in the notification payload field with simple migration #66

Closed romank0 closed 9 months ago

romank0 commented 1 year ago

This is a fix for https://github.com/Opus10/django-pgpubsub/issues/65.

The fix is pretty straightforward. The only open question is a backward compatibility and the migration to the version with the fix.

The fix changes the format of the payload field (it does not change the type as jsonb is the correct type). That means that the notifications produced by the existing version of the library are not compatible with the new code.

There several options how to approach the migration.

Option 1: simple migration

Pros:

Option 2: two releases migration

In the first release the version that produces new format of the payload but can process both the old and the new format of the payloads in the listener is implemented.

In the second release the support of the old format is removed.

This PR

The first commit implements option 1.

The second commit implements the first phase of the option 2.

For my purposes the second option is better as in safer. May be there are some other better options. I've created this PR to start a discussion.

PaulGilmartin commented 9 months ago

@romank0 Looks good. Thanks for making the effort of making it backwards compatible. In my understanding, in summary what's happening here is:

Does that sound correct or have I misunderstood?

PaulGilmartin commented 9 months ago

I also just updated the CI for this project. If you rebase with master you should see the result.

romank0 commented 9 months ago

... In my understanding, in summary what's happening here is:

  • It would be better if the payload sent in the notification was sent in JSONB format, rather than string format.

Not really. pg_notify itself can only send text payloadd and this MR does not change that. We always send json object serialized to string. The problem is how the data is stored in Notification payload field in the database. I've described this briefly in this comment https://github.com/Opus10/django-pgpubsub/pull/48#discussion_r1197592076 and it seems I need to elaborate a bit.

In the current master we have the following code that is relevant to this issue:

            payload := json_build_object(
                'app', '{model._meta.app_label}',
                'model', '{model.__name__}',
                'old', row_to_json(OLD),
                'new', row_to_json(NEW)
              );

and then:

            INSERT INTO pgpubsub_notification (channel, payload)
            VALUES ('{self.name}', to_json(payload::text));

In the first snippet above json_build_object creates a json object (that has keys app, model, old and new). The type of the expression is json. That's the quote from the documentation of the postgres json functions that describes the function: json_build_object ( VARIADIC "any" ) → json. But then in the second snippet the json object is converted to text so it is serialized to string and that string is passed to to_json.

This is the actual cause of the problem. to_json does not parse the string and therefore it does not create a json object. When the string is passed to to_json it produces a scalar json value. Here's the example from the documentation:

to_json('Fred said "Hi."'::text) → "Fred said \"Hi.\""

In the example from the document it is clear that the result is just a scalar string value (of type json) not a json object. But in our case the string value will contain serialized json object which can be deserialized later of cause but it is still just a string and not a json object meaning that operations like ->> cannot be aplied to it directly to get a value by a key.

The last step that happens in master now is implicit conversion in the insert operation of the result of to_json(payload::text) to the type of the payload column in pgpubsub_notification table which is JSONB. This produces the scalar string value (containing serialized json) of the type JSONB. That process does not parse the string.

Here is the example with comments that might be helpful:

select
  -- transformations that happend

    -- This is what we have in master: json object serialized to jsonb scalar string value. 
    -- Check the description and examples of to_json function 
    -- here https://www.postgresql.org/docs/15/functions-json.html#FUNCTIONS-JSON-CREATION-TABLE
    -- Also more on this below.
    to_json('{"a": 1}'::text)::jsonb AS "incorrect__json_serilized_to_string",

    -- This is what we have in this PR: json object stored as jsonb object.
    '{"a": 1}'::jsonb AS "correct__jsonb_object",

  -- types demo
    pg_typeof(to_json('{"a": 1}'::text)::jsonb) AS "incorrect_type",
    pg_typeof('{"a": 1}'::jsonb) AS "correct_jsonb_object_type",

  -- access demo
    -- we can use operators like ->> to access jsonb object by a key
    '{"a": 1}'::jsonb ->> 'a' AS "access__by_object_field",

    -- because the value is a string '->>' finds nothing and produces NULL
    (to_json('{"a": 1}'::text)::jsonb) ->> 'a' AS "access__error_accessing_non_object",

    -- because the value is a string '->>' finds nothing and produces NULL
    (to_json('{"a": 1}'::text)::jsonb) ->> 'a' AS "access__error_accessing_non_object"
    ;
  • We want to push a version where the trigger will start sending JSONB format as the payload. That will be backwards incompatible though, so in the first version we still accept payloads in the old text format. This release will require a migration to the triggers, but not to the Notification model itself.

Not exactly. You are right that changes in this PR to the trigger would produce values in payload column in pgpubsub_notification table that are incompatible with the current pgpubsub version. But the incompatibility is not in the type of the column. We want that trigger store json objects in Notification.payload field and not json serialized to string (see details in the answer to your first bullet point above). Notification.payload is already of JSONB type so no migration of this table is needed.

To keep pgpubsub compatible we allow it to accept and work correctly with payload values that store not json objects but json serialized to strings. This line | Q(payload=self.notification.payload) is responsible for that. That's the old way to check that the payload we received via LISTEN self.notification.payload (which is a json serialized to string) is the same literally in the DB column.

To sum up in this release we need to migrate triggers but no model migration is needed and will not be needed in the followup.

  • In a follow up, we'll start to only accept the JSONB payload and a apply migration so the Notification table's payload field only takes JSONB.

In the followup release we will just need to remove the line that provide backward compatibility that is | Q(payload=self.notification.payload). No DB migration will be needed.

PaulGilmartin commented 9 months ago

@romank0 Thanks for your patience and explanation. Dealing with the lower level postgres stuff is not my strongest point, and I haven't had a chance to get my hands dirty with the internals of this library for a while. I will merge this PR now. These are the release notes I will use:

_This update addresses a discrepancy in the handling of JSON payloads in the pgnotify feature. Previously, although JSON objects were sent, they were stored as serialized strings in the database. This resulted in limitations when querying the stored data. With this update, JSON payloads are now properly parsed and stored as JSONB objects in the database, allowing for more efficient querying and manipulation of the data. Note that this update requires a migration to the triggers in order accomplish this. To maintain compatibility with pgpubsub, the library is adjusted to correctly handle accepting notifications with payloads serialized in both the old and new ways. In a follow up release, we will remove this backwards compatibility.