eulerto / wal2json

JSON output plugin for changeset extraction
BSD 3-Clause "New" or "Revised" License
1.33k stars 161 forks source link

Include user name and application name in output? #8

Closed stevenwinfield closed 7 years ago

stevenwinfield commented 7 years ago

Hi,

I wondered if it would be possible to (optionally) include the SESSION_USER and application_name values for the backend that committed a transaction in the json output?

Thanks.

eulerto commented 7 years ago

@stevenwinfield neither role nor application name are provided by logical replication. Postgres needs to be patched to include such information in the transaction log. What is the use case for having such information?

stevenwinfield commented 7 years ago

Audit logging - recording into a second database who inserted/updated/deleted tuples in any table, by what method (i.e. application_name) and when. We currently use triggers on every table to record this information, but that is quite a heavyweight approach. In contrast, we could use logical replication + wal2json, feed the json output into a python script using psycopg2 (which now supports logical replication connections) and update an audit database from there - this would have much lower overhead on the primary database.

eulerto commented 7 years ago

Why don't you use pgaudit [1] or a similar tool? pgaudit uses hooks not triggers. I'm not sure postgres will add user/application name to transaction log for such a narrow use case (read one explanation at [2]). Logical replication, doesn't log SQL commands; it logs data. Commands are assembled by logical decoding plugin (such as wal2json) before streaming data. It means that commands that don't modify data (such as SET and SELECT) are not replicated. I don't know your audit level but it surely won't cover some common cases.

[1] http://pgaudit.org/ [2] https://www.postgresql.org/message-id/CAMsr+YFhe8yBsNCNEAFnY1OvEU6EXjmrGNFxexK1N=pqHLdjwA@mail.gmail.com

stevenwinfield commented 7 years ago

Thanks for the links.

pgaudit looks good but it only logs to files, which we'd need to parse by tailing them in a separate process (since sometimes we need the information from the audit very soon after it has been written), and having that process be able to pick up from where it left off, should it be taken down for any reason, is cumbersome.

If it were possible using logical replication then - since the master keeps track of the progress of the clients for each slot - we'd be much more robust to the client going down.

I could perhaps insert the user name and application name into the logical decoding stream with pg_logical_emit_message - either in a hook or a lightweight trigger - and then teach the plugin about those messages.

I'll close this issue. Thanks again for your help.