Open plajjan opened 11 years ago
Should the log table store an exact version of the prefix/pool/vrf entry or should we store deltas?
Logging at an audit log, we probably want to display some form of delta but at the same time I suppose it would be useful to look at the full entry at a certain point in time.
Calculating delta from full records requires that we retrieve two entries.
Calculating full records from delta changes could potentially require the retrieval of the full audit log (if a certain column hasn't changed since the beginning).
Therefore I suppose it's better to store the full entries.
@garberg, plxz to comment.
Log records are inserted by nipapd - should we do this in the SQL database instead? Moving to a column-based schema as described above would make it easier to do within postgresql. I think the missing values would be who performed the change - can we pass that into the database somehow?
We can still handle things like import of data by temporarily disabling the triggers for the import - see http://koo.fi/blog/2013/01/08/disable-postgresql-triggers-temporarily/
Here's another way of doing it - http://wiki.postgresql.org/wiki/Audit_trigger_91plus
I agree on that we should store full entries. Just as you say deltas would be more elegant, but seems a bit complex to me. A git-powered column type for differences would be nice :)
Another way of implementing a full audit log would be to simply add new records to the current tables for each change and keeping track on what's currently valid using timestamps, perhaps create a view which gives only the current valid data. I looked into something similar a while ago, using the postgresql temporal extensions to keep track of what's valid and assuring uniqueness. There were a few drawbacks though (such as the lack of support for the times +/- infinity, another non-standard module...) so I'd not recommend temporal. Also unsure about the performance impact of this kind of database model...
The audit trigger I linked to stores the complete old row using a hstore and the changes, also in a hstore.
We could adopt the same approach. In addition, as suggested by garberg, we could store a text message describing in human form what has changed, something like "prefix added" and "prefix modified".
A lot of sites suggest saving the new row and not the old row so that the current entry can be fetched only by looking in the audit table and not having to join in the "main" table. If we do the same, I guess that would mean that the changes column should contain the values from the old column. Thus the new row can be simply read out directly while the old row can be retrieved by full row + update of changes column. It feels a bit backwards.
For the record, we've discussed implementing audit logs in a number of different ways.
A hstore-type will allow us to store the entire row but that means that all information that we need is required to be in the row. The current implementation only stores information actually related to a VRF, prefix or pool in the table. Audit metadata, such as who performed a change, is only stored in the audit log table. This works well as it is currently nipapd that writes the audit entry and it has access to that information but if we are to move to a trigger based audit function, it will not have access to this audit metadata, unless it is included in the actual row. For this reason, we find that simply adding the data as extra columns would be the best way to move forward.
A trigger based audit function is vastly superior to one performed in nipapd. It will automatically be transaction safe and atomic.
There's a few choices. One could add the audit metadata as a hstore as well, if we would want to avoid adding so many columns, but that probably just becomes messy.
Since we are now using hstore for the avp-columns we can no longer use hstore for the audit log. Hstore only supports one level of keys, ie no nesting, so storing a row with a hstore column inside a hstore is just not possible. Fear not though, hstore2/jsonb is available in PostgreSQL 9.4 which does support nesting.
However, I'm not convinced that starting to rely on pg 9.4 features is the right thing to do just yet. We might want to hold off until 9.4 is available, as part of the standard package set, on more distributions before moving forward.
Ubuntu 14.04 which is the current LTS release (and I suppose it will be for quite some time) comes with postgres 9.3 so I'm very hesitant to introduce a dependency on pg 9.4
Debian on the other hand, released a new stable not long ago which ships pg 9.4
So after a discussion with @garberg we concluded that we should stick with whatever we could implement on 9.3, which could either be multiple hstores, like one for the majority of columns on the row but with additional ones for avps and tags.
Another solution is to use json (not jsonb/hstore2) which is available in pg 9.3. However, json doesn't support things like subtracting, so NEW - OLD doesn't result in a structure with the differences as it would with hstores. Would have to work around this, perhaps by using hstores for the calculation and combining the result for the final struct.
We (in my company) are also interested in having audit log exported in syslog mainly for accountability purposes. Although the undo functionality through the web interface sounds interesting, it is not our main concern. I started tampering backend.py to make it work but then saw this issue plus #727 PR. So to have a clear understanding, what will be available in 0.30 or 0.29-Zeus? Will you rework the audit in PostgreSQL AND provide syslog support? Do you have a planning release date? A big thanks for the excellent software!
The audit log is today implemented through a single table which keeps most of the data in a text entry. To facilitate undo functionality as well as other more advanced features or just more specific searches on changes, we need a better audit log format.
Suggestion is to more or less copy the table structure for each type of object that we should keep an audit log for (pool, VRF, prefix). In addition, we need columns for timestamp, user, Auth system and a comment field.
The comment field can be used to annotate things like undo operations.