Open madduck opened 1 year ago
I can work on it if @eikek wants this feature. As it affects the code in many places, I suggest doing this:
Hm, I think this is quite a big topic and lots of questions come to mind. Here are a few:
So in summary, given that it is possibly a significant change in many places, and there is not much gain for the main audience, my feeling is to refrain from doing it for now.
Thanks @eikek for your views. I almost want to address your last question first, but I think it's better if I outline the design I had in mind when raising this issue, because I think that also answers many (if not all) of your other questions, and shows that ultimately, we're not changing that much, but the benefit is real, for everyone. Even families make mistakes, and having a log available can be very useful in such a situation.
My view of an audit logging system for Docspell is rooted in a table in the database. It's append-only. Every action (linked to an item, for now!) taken in the system yields a new row in this table. A tag removed is recorded, linked to an item, with a timestamp, and possibly other information, such as the origin of the request, and the authenticated user, if there is one. Same for any other change of metadata, state, or anything really.
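A minimal sketch of what such an append-only table could look like, using Python's built-in `sqlite3` purely for illustration. The table name `item_audit_log`, its columns, and the helpers `open_audit_db` and `append_entry` are all assumptions made up for this example, not Docspell's schema or API. The two triggers are one possible way to enforce "append-only" at the database level.

```python
import sqlite3

# Hypothetical schema: names are illustrative only, not taken from Docspell.
SCHEMA = """
CREATE TABLE item_audit_log (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    item_id  TEXT NOT NULL,   -- the item the action is linked to
    action   TEXT NOT NULL,   -- clear-text description of what happened
    actor    TEXT,            -- authenticated user, if there is one
    origin   TEXT,            -- origin of the request, if known
    created  TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);

-- Reject any attempt to change or remove existing rows.
CREATE TRIGGER item_audit_log_no_update
BEFORE UPDATE ON item_audit_log
BEGIN
    SELECT RAISE(ABORT, 'item_audit_log is append-only');
END;

CREATE TRIGGER item_audit_log_no_delete
BEFORE DELETE ON item_audit_log
BEGIN
    SELECT RAISE(ABORT, 'item_audit_log is append-only');
END;
"""


def open_audit_db(path=":memory:"):
    """Open a database and create the (hypothetical) audit table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append_entry(conn, item_id, action, actor=None, origin=None):
    """Append one audit row and return its id."""
    cur = conn.execute(
        "INSERT INTO item_audit_log (item_id, action, actor, origin) "
        "VALUES (?, ?, ?, ?)",
        (item_id, action, actor, origin),
    )
    conn.commit()
    return cur.lastrowid
```

With this in place, `append_entry(conn, "item-1", "tag removed: invoice", actor="alice")` adds a row, while any `UPDATE` or `DELETE` against the table raises a database error.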
We're not looking at users, tag objects, or contacts for now. Yes, contacts would be interesting too, maybe next, but items are by far the most important. Let's only look at items.
So essentially, anywhere in the code a database call is made relating to an item ID, we probably need to insert a call to an internal API that ultimately writes the database row. It might be a lot of places, but conceptually, it's all just the same: provide whatever state you have right now, and describe (in clear text even) to the logging system, what just happened. That's it… for phase 1.
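To make the "insert a call to an internal API" idea concrete, here is a rough sketch of one call site. Everything in it is hypothetical: the tables `item_tag` and `item_audit_log`, and the functions `setup_demo`, `record_item_event` and `remove_tag`, are invented for illustration. Putting the mutation and the log row in the same transaction is one possible choice, not something this thread prescribes.

```python
import sqlite3


def setup_demo(conn):
    """Create two toy tables so the example is self-contained."""
    conn.executescript(
        """
        CREATE TABLE item_tag (item_id TEXT NOT NULL, tag TEXT NOT NULL);
        CREATE TABLE item_audit_log (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            item_id     TEXT NOT NULL,
            description TEXT NOT NULL,
            actor       TEXT
        );
        """
    )


def record_item_event(conn, item_id, description, actor=None):
    """Hypothetical internal logging call: writes one row, does not commit."""
    conn.execute(
        "INSERT INTO item_audit_log (item_id, description, actor) "
        "VALUES (?, ?, ?)",
        (item_id, description, actor),
    )


def remove_tag(conn, item_id, tag, actor=None):
    """Example call site: the change and its log row share one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        cur = conn.execute(
            "DELETE FROM item_tag WHERE item_id = ? AND tag = ?",
            (item_id, tag),
        )
        changed = cur.rowcount > 0
        if changed:  # only log when something actually changed
            record_item_event(conn, item_id, f"tag removed: {tag}", actor)
    return changed
```

Every other call site would follow the same pattern: perform the change, then hand the item ID, a clear-text description, and whatever state is at hand to the logging call.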
Phase 2 is the hard phase, because the question of how to meaningfully make this information available to the user(s) is very much dependent on what your audience is. But that's fine, we don't need this phase, at least not while we haven't reaped the fruits of phase 1 yet.
For me, the most important thing is that we start logging sooner rather than later: the sooner, the better. Get the information before it's lost. We can worry about what to do with it later. Meanwhile, it's not going to flood the system or cause much of a performance hit, because we're only ever adding rows to a database…
And as to your first question about how to get started: no, we don't touch any items, or the database outside of our table, which we only just created. We only start logging when we have the information. Nobody knows or can currently meaningfully discern how a document arrived at a certain state. This information has been lost forever.
PS: My answer to question 2 is: yes, the goal is to have everything covered. When something happens that's not yet covered, we need to insert that line of logging somewhere in the control path, at the point where we have the most state information.
PPS: PostgreSQL could be made to log anything, but that comes with two issues:
It would be really useful if Docspell logged every mutation on an item, i.e. change of metadata, deletion, undeletion, reprocessing, etc. For each audit log entry, it should save the new value, the previous value (where applicable), who initiated the change, any metadata about the connection, such as HTTP connection information, and a timestamp, obviously.
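As a rough illustration of the fields listed above, one audit log entry could be modelled like this. The class `ItemAuditEntry` and its field names are assumptions made up for this sketch and do not correspond to anything in Docspell.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


def _utc_now() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)  # frozen: an entry is never modified after creation
class ItemAuditEntry:
    item_id: str
    action: str                            # e.g. "metadata changed", "deleted"
    new_value: Optional[Any] = None        # value after the change
    previous_value: Optional[Any] = None   # value before, where applicable
    initiator: Optional[str] = None        # who initiated the change
    connection: dict = field(default_factory=dict)  # e.g. HTTP connection info
    timestamp: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        """Serialize the entry, e.g. for storage in a single text column."""
        return json.dumps(asdict(self), sort_keys=True)
```

For example, `ItemAuditEntry("item-1", "metadata changed", new_value="2024-02-01", previous_value="2024-01-01", initiator="alice")` records both sides of a change, while a deletion would simply leave `new_value` unset.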