fabric8-services / fabric8-wit

wit stands for Work Item Tracker
http://devdoc.almighty.io/
Apache License 2.0

Historization Story #141

Closed by tsmaeder 7 years ago

tsmaeder commented 8 years ago

We have discussed the need for historization of our data in the past, but only on chat. This issue serves as a persistent point of discussion. There are multiple drivers for wanting historized data. The first is operational: just like with git history, it is interesting to know who changed what and when. The second is forensics: it seems that in government contracts in the USA, investigators sometimes need to know the state of the system at a specific point in time. In existing systems, doing this is a real PITA. It would be cool to have a knob you can dial back to last summer and see the system as it was then. This second form of "time machine" might come with stricter tamper-resistance requirements.

kwk commented 8 years ago

@tsmaeder Isn't it enough for a workitem to store a history of values that vary over time?

For example, for state changes a straightforward approach could be this:

import "time"

type State struct {
    Status    string
    ChangedAt time.Time
    ChangedBy User
}
type Issue struct {
    // ...
    StatusHistory []State // one entry per status change
    //...
}

tsmaeder commented 8 years ago

That is one of many ways to do historization, but we would have to do the above for every field in every construct we have and handle it in queries, for example. The above structure would need a subquery to filter for the current state in an SQL DB, so I guess that wouldn't fly. What I'm trying to say is that designing our story here is not a straightforward task. There are different strategies that need to be evaluated against the use cases we have.

kwk commented 8 years ago

The reason I came up with this is that on GitHub you simply see the changes right away. At some point they are condensed when too many changes happen over a very long period. To avoid having to filter for the current state, we could add another field and have a structure like this:

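import "time"

type State struct {
    Status    string
    ChangedAt time.Time
    ChangedBy User
}
type Issue struct {
    // ...
    CurrentState  State   // read directly, no history scan needed
    StatusHistory []State // all previous states, oldest first
    //...
}

// ChangeState archives the current state and replaces it with a new one.
// (Assigning "jane doe" assumes User is a string-based type.)
func (i *Issue) ChangeState(status string) {
    i.StatusHistory = append(i.StatusHistory, i.CurrentState)
    i.CurrentState = State{Status: status, ChangedAt: time.Now(), ChangedBy: "jane doe"}
}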

I understand that we would still have to implement this for every field that needs historization, which can be quite tedious and error-prone. I just wanted to get the discussion on a possible implementation started.

Audits

@tsmaeder, I remember you talking about audits in databases, and according to this article I've already implemented this without knowing it was called auditing.

The disadvantage of audits IMHO is best explained with an example:

It is really hard to find out the name of group (A) at some point in the past, as you would have to do quite some heavy lifting. Not to mention the trigger setup required just to keep track of the changes over time.

Would you consider audits worth looking into?

aslakknutsen commented 8 years ago

The only reference to history for the next 6-month release, as far as I can tell at this time, is:

... a Work item should reflect the updated state, and interleave the state change history with the comments for the Work item

The reference to state in this section is a bit vague, but earlier it talks about Workflow State. So technically we may only need to store the history of 'system.state'.

But...

Until clearer requirements are planned, keep it as simple as possible. Having an array of change objects within the Work item itself goes a long way.

e.g. stored as:

{
    "version": 1,
    "type": 1,
    "fields": {
        "system.state": "open",
        "system.owner": "a",
        "system.title": "Check this out 2"
    },
    "history": [
        {
            "version": 1,
            "type": 1,
            "changedBy": "a",
            "changedAt": "timestamp",
            "fields": {
                "system.title": "Check this out 2"
            }
        },
        {
            "version": 0,
            "type": 1,
            "changedBy": "a",
            "changedAt": "timestamp",
            "fields": {
                "system.state": "open",
                "system.owner": "a",
                "system.title": "Check this out"
            }
        }
    ]
}

That would allow us to simply select the top-level "fields" when we want to view the latest version, show the change history in the timeline, and, technically, replay the object into any historical state, either for viewing or for later export to more permanent history storage.

hectorj2f commented 7 years ago

@aslakknutsen do we still want to add this historization story?