FlowFuse / flowfuse

Build bespoke, flexible, and resilient manufacturing low-code applications with FlowFuse and Node-RED
https://flowfuse.com

API for Deployment History of an Instance #4424

Closed joepavitt closed 3 days ago

joepavitt commented 4 weeks ago

Request to have a single API endpoint, e.g. api/v1/projects/{instanceId}/history that returns details on every deployment made to that instance in the context of version history.

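If I'm not mistaken, the sources for these deployments would be:

  • Node-RED Editor - user has clicked the "Deploy" button themselves
  • DevOps Pipeline - where we would want detail on where the flows came from
  • Rollback Snapshots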

A lot of these I think could be gathered from the Audit Log (and reformatted appropriately).

It is also important to verify if each of these steps in the history have an available snapshot that we can revert to. Part of this work is to also determine if there are actually new tables or events we need to be recording in order to support this functionality.

Steve-Mcl commented 2 weeks ago

Request to have a single API endpoint, e.g. api/v1/projects/{instanceId}/history that returns details on every deployment made to that instance in the context of version history.

What level of detail is required (not just for this, but for the final product)?

For example, are we expecting to be able to display individual changes to settings or flows? That would mean capturing either all settings and flows (or a delta) with each "change".

Another level of detail is what version of the launcher was an instance running on. The launcher is responsible for taking the settings/env and applying them to the instance. Any fixes/additions/changes applied to a launcher can have an effect on the instance. This is not currently captured in the audit log.

Stack changes are also an important part of the history. Currently, we do capture a stack change in the audit log as project.stack.changed, but the associated data stored is what it was changed to (not what it was changed from). This is a shortcoming, as we would not be able to roll back to the previous stack.

NOTE: While any of the above might not be implemented up front, understanding the end goal now will help decide direction (for ground work now).

  • Node-RED Editor - user has clicked the "Deploy" button themselves

The audit log flows.set identifies this event. It does have information as to which user operated it and the type of deploy (full, flows, nodes or reload). NOTE: If auto-snapshots are available (teams+), the snapshot is taken AFTER the flows.set event, and there is no direct link between the two other than the event times being in close proximity.

  • DevOps Pipeline - where we would want detail on where the flows came from

The audit log project.snapshot.imported is created after a snapshot is deployed to the instance. NOTE: that is, the database has been updated but the instance may not be running. Data recorded includes the sourceProject and snapshot.

  • Rollback Snapshots

Similar to above, the audit log project.snapshot.rolled-back is created after the snapshot is deployed to the instance, and the instance may not even be running. Details of the snapshot applied are logged.
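Since the only link between a flows.set event and its auto snapshot is time proximity, the pairing could be sketched roughly as below. This is an illustration only: the AuditEntry shape, the function name and the 5 second window are assumptions, not the platform's actual model or behaviour.

```typescript
// Hypothetical, simplified audit entry. Field names are assumptions.
interface AuditEntry {
  event: string;
  time: number; // epoch milliseconds
  snapshotId?: number;
}

// Pair each flows.set with the first unclaimed project.snapshot.created
// entry that follows it within `windowMs`. Deploys with no nearby
// snapshot are paired with undefined.
function pairDeploysWithSnapshots(
  entries: AuditEntry[],
  windowMs: number = 5000
): Array<[AuditEntry, AuditEntry | undefined]> {
  const sorted = entries.slice().sort((a, b) => a.time - b.time);
  const snapshots = sorted.filter((e) => e.event === "project.snapshot.created");
  const deploys = sorted.filter((e) => e.event === "flows.set");
  const pairs: Array<[AuditEntry, AuditEntry | undefined]> = [];
  let next = 0; // index of the first snapshot not yet claimed
  for (const deploy of deploys) {
    // Skip snapshots that were taken before this deploy.
    while (next < snapshots.length && snapshots[next].time < deploy.time) {
      next++;
    }
    let match: AuditEntry | undefined;
    if (next < snapshots.length && snapshots[next].time - deploy.time <= windowMs) {
      match = snapshots[next];
      next++;
    }
    pairs.push([deploy, match]);
  }
  return pairs;
}
```

A time window is a heuristic: two deploys in quick succession, or a manual snapshot created moments after a deploy, could be mismatched, which is one argument for recording an explicit link instead.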

It is also important to verify if each of these steps in the history have an available snapshot that we can revert to. Part of this work is to also determine if there are actually new tables or events we need to be recording in order to support this functionality.

In the case of flows being deployed from the editor, there is no knowledge of what the instance is actually running other than the flows and settings currently in the database (I get that sounds odd, but the single source of truth is really the instance itself and we do not have a "flows version" or "settings checksum" generated from the instance and posted to the platform). Note also that the auto-snapshot is taken after the deploy. This means that while we would be able to provide a rollback, it would ultimately be "after the fact" (the original state is not recorded). So, to reiterate, there is currently no simple means of knowing exactly what was running before the deploy. Another factor to consider is that auto snapshots are round-robin replaced, so it is unlikely that a semi-active instance will have a snapshot to roll back to for the majority of its history.

In the case of DevOps, again, we know what snapshot was applied but not what the instance was running before the deploy.

And lastly, in the case of a rollback, the snapshot applied is known but not what the instance was running before the rollback.

Additional considerations / shortcomings

  1. Updating ENV VARS or other instance settings will affect an instance, but only after it is restarted. There will be no auto snapshot. For reference, project.settings.updated is logged in the project audit log and it does include the values changed.
  2. Auto snapshots (taken after a deploy) are round-robin replaced, meaning it is unlikely any semi-active Node-RED instance will have a snapshot to roll back to for the majority of its history.
  3. Changes made (like flows.set, snapshot rolled back, pipeline operated) are all logged after the event. There is no simple means of knowing what the instance is actually running (other than the flows and settings currently in the database).

Initial conclusions drawn from above

The current data available in the audit log is close but could be better to help paint a clearer picture (recommendations below). In each case, we know what was done but not exactly what the instance was running before the event (since we don't version or checksum the flows/settings). Therefore, the approach of this implementation will be to provide a timeline of what we do have (and enhance it where useful).

Recommendations for the audit trail to generate a usable timeline

  1. Settings changes should be included in the history output
    1. This way, users can see when changes were made (even if there is no snapshot to roll back to in the immediate history)
  2. Launcher version changes should be included in the history output
    1. This would allow us to know if the launcher was updated and what version was before and after the event
    2. This would be informative and allow the user to understand if the instance was affected by a launcher update
  3. Stack changes should be included in the history output
    1. Since the stack is a critical part of the instance, it should be included in the history output
  4. Snapshots (all/manual/auto) should be included in the history output
    1. This would give the user the opportunity to roll back to a well-defined state
    2. The existence of the snapshot should also be indicated in the output

We may also wish to add instance restarts to the history output. This would provide context like "when a settings change was applied" (since settings changes are only applied after a restart).
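As a rough illustration of the recommendations above, the timeline could be derived by filtering audit log entries down to the relevant event types and sorting them newest first. The entry shape and function name are assumptions for the sketch. Note that launcher.updated is not an event the audit log records today; it is included only because the recommendations propose it.

```typescript
// Hypothetical minimal audit log entry. Field names are assumptions.
interface AuditLogEntry {
  event: string;
  createdAt: string; // ISO 8601 timestamp
}

// Events that would appear in the history output.
// "launcher.updated" is proposed, not currently logged.
const HISTORY_EVENTS: string[] = [
  "flows.set",
  "project.settings.updated",
  "project.stack.changed",
  "project.snapshot.created",
  "project.snapshot.imported",
  "project.snapshot.rolled-back",
  "launcher.updated",
];

// Keep only history-relevant entries, most recent first.
function toTimeline(entries: AuditLogEntry[]): AuditLogEntry[] {
  return entries
    .filter((e) => HISTORY_EVENTS.indexOf(e.event) !== -1)
    .sort((a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt));
}
```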

Based on the above, the following API endpoint is proposed

GET /api/v1/projects/{instanceId}/history

This endpoint would return a list of events that have occurred on the instance. The events would be ordered by date/time with the most recent first. The events would include the following information:

Pagination should be supported from the outset to minimise the amount of data returned in a single request.

Sample

{
   "instanceId": "xxxxx",
   // other instance details could be included here (TBD)
   "meta": {
      // **pagination details**
      // **event lookup**
      //    an event lookup (Enum) could be included here to translate the event type (e.g `flows.set`) to a human
      //    readable string (e.g. "Flows Deployed") (or we could just translate it in the UI or send the translated text in the response) (TBD)
   },
   "history": [
      { "date": "2021-01-01 12:59:01", "event": "project.snapshot.created", "user": { "id": "abcxyz", "name": "system"}, "details": { "snapshot": 99, "name": "Auto Snapshot - 2021-01-01 12:59:01" } },
      { "date": "2021-01-01 12:59:00", "event": "flows.set", "user": { "id": "abcxyz", "name": "user1"}, "details": { "type": "full", "description": "Flow change by user" } },
      { "date": "2021-01-01 12:58:05", "event": "flows.set", "user": { "id": "abcxyz", "name": "user1"}, "details": { "type": "reload", "description": "User restarted flows to pick up settings" } },
      { "date": "2021-01-01 12:58:00", "event": "project.settings.updated", "user": { "id": "abcxyz", "name": "user1"}, "details": { "settingsVersion": "1.0" } },
      { "date": "2021-01-01 12:57:00", "event": "project.snapshot.created", "user": { "id": "abcxyz", "name": "user1"}, "details": { "snapshot": 98, "name": "Before updating settings and flows" } },
      { "date": "2021-01-01 12:56:00", "event": "project.stack.changed", "user": { "id": "abcxyz", "name": "user1"}, "details": { "stack": { "from": "v4", "to": "v4 large" } } },
      { "date": "2021-01-01 12:55:00", "event": "project.snapshot.rolled-back", "user": { "id": "abcxyz", "name": "user1"}, "details": { "snapshot": 1 } },
      { "date": "2021-01-01 12:54:00", "event": "project.snapshot.imported", "user": { "id": "abcxyz", "name": "user1"}, "details": { "snapshot": 77, "name": "staging tested ok v4" } },
      { "date": "2021-01-01 12:53:00", "event": "launcher.updated", "user": { "id": "abcxyz", "name": "system"}, "details": { "launcherVersion": { "from": "2.8.0", "to": "2.9.0" } } }
   ]
}
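For a client consuming this response, the sample above could be typed roughly as follows. The type names and the referencedSnapshot helper are assumptions made for illustration; the meta block is left loose because its contents are still marked TBD in the proposal.

```typescript
// Types mirroring the sample response. Names are illustrative only.
interface HistoryUser {
  id: string;
  name: string;
}

interface HistoryEntry {
  date: string;
  event: string;
  user: HistoryUser;
  // The shape of details varies per event type, so it is kept open here.
  details: Record<string, unknown>;
}

interface HistoryResponse {
  instanceId: string;
  meta: Record<string, unknown>; // pagination / event lookup, still TBD
  history: HistoryEntry[];
}

// Returns the snapshot id an entry refers to, if it has one.
// A UI could use this to decide whether to offer a rollback action.
function referencedSnapshot(entry: HistoryEntry): number | undefined {
  const id = entry.details["snapshot"];
  return typeof id === "number" ? id : undefined;
}
```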

This could be used to form an ordered history view in the editor that would allow users to see a timeline of changes to the instance and allow them to rollback to a previous state if needed (and if a snapshot is available).

example list of events (for demonstrating data sample above, not for actual/final UI representation)

| time | event | user | details |
| --- | --- | --- | --- |
| 2021-01-01 12:59:01 | project.snapshot.created | system | snapshot: 99, Auto Snapshot - 2021-01-01 12:59:01 |
| 2021-01-01 12:59:00 | flows.set | user1 | full (flow change by user) |
| 2021-01-01 12:58:05 | flows.set | user1 | reload (user restarted flows to pick up settings) |
| 2021-01-01 12:58:00 | project.settings.updated | user1 | settingsVersion: 1.0 |
| 2021-01-01 12:57:00 | project.snapshot.created | user1 | snapshot: 98, Before updating settings and flows |
| 2021-01-01 12:56:00 | project.stack.changed | user1 | stack: v4 large |
| 2021-01-01 12:55:00 | project.snapshot.rolled-back | user1 | snapshot: 1 |
| 2021-01-01 12:54:00 | project.snapshot.imported | user1 | snapshot: 77, staging tested ok v4 |
| 2021-01-01 12:53:00 | launcher.updated | system | launcherVersion: 2.8 > 2.9 |

joepavitt commented 1 week ago

For example, are we expecting to be able to display individual changes to settings or flows? That would mean capturing either all settings and flows (or a delta) with each "change".

My expectation would be a flow.json at each stage, then we can just calc the diff as required, rather than storing the diff at each stage which gets messy given their varying sources.
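Calculating the diff on demand from two stored flow.json files might look something like the sketch below. It relies on a Node-RED flow file being an array of node objects, each with an id; the FlowDiff shape and function names are made up for illustration, and the comparison is deliberately naive.

```typescript
// A Node-RED flow file is an array of node objects, each with an id.
interface FlowNode {
  id: string;
  [key: string]: unknown;
}

// Illustrative result shape: ids of nodes added, removed or changed.
interface FlowDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

function indexById(nodes: FlowNode[]): Record<string, FlowNode> {
  const index: Record<string, FlowNode> = Object.create(null);
  for (const node of nodes) {
    index[node.id] = node;
  }
  return index;
}

// Compare two flow.json arrays node by node.
// JSON.stringify is key-order sensitive, so a node whose properties were
// merely reordered would be reported as changed. Fine for a sketch.
function diffFlows(before: FlowNode[], after: FlowNode[]): FlowDiff {
  const prev = indexById(before);
  const next = indexById(after);
  const diff: FlowDiff = { added: [], removed: [], changed: [] };
  for (const node of after) {
    const old = prev[node.id];
    if (old === undefined) {
      diff.added.push(node.id);
    } else if (JSON.stringify(old) !== JSON.stringify(node)) {
      diff.changed.push(node.id);
    }
  }
  for (const node of before) {
    if (next[node.id] === undefined) {
      diff.removed.push(node.id);
    }
  }
  return diff;
}
```

Storing the full flow.json per entry and diffing on demand keeps each history entry self-contained, at the cost of extra storage per entry.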

Another level of detail is what version of the launcher was an instance running on.

TMI for the customer I think. They care about their flow deployments, not underlying hardware/installation stack. This is just a history of the work the user has explicitly pushed to the flows, i.e. the flow.json.

The core question a user has at this point is "What flows were running, and when"

In the case of DevOps, again, we know what snapshot was applied but not what the instance was running before the deploy.

This is fine. Once we've built the timeline, we know the flows running prior were just the entry most recently administered. Each entry doesn't need knowledge/context of the previous/next entry.
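Answering "what flows were running, and when" from such a timeline could be sketched as a simple lookup: walk the newest-first list and return the first flow-affecting entry at or before the time of interest. The names below are hypothetical, and the sketch assumes the fixed-width "YYYY-MM-DD HH:MM:SS" date format from the sample, which lets timestamps be compared as plain strings.

```typescript
// Minimal view of a history entry for this lookup. Names are assumptions.
interface TimelineEntry {
  date: string; // "YYYY-MM-DD HH:MM:SS", fixed width so strings compare correctly
  event: string;
}

// Events that change which flows an instance is running.
const FLOW_EVENTS: string[] = [
  "flows.set",
  "project.snapshot.imported",
  "project.snapshot.rolled-back",
];

// Given a newest-first history, return the entry that determined the
// flows in effect at time `at`, or undefined if the history does not
// reach back that far.
function flowsRunningAt(history: TimelineEntry[], at: string): TimelineEntry | undefined {
  for (const entry of history) {
    if (entry.date <= at && FLOW_EVENTS.indexOf(entry.event) !== -1) {
      return entry;
    }
  }
  return undefined;
}
```

The answer is only as good as what was recorded: it reflects what the platform last deployed, not a state confirmed by the instance itself.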

Steve-Mcl commented 1 week ago

@joepavitt the grandfather issue states:

Which customers would this be available to

Other - Open for discussion

I am working on the assumption this is a feature for everyone, but I will need to know sooner rather than later if it is for licensed only and also whether this will require a feature flag (for team type specific enable/disable)

joepavitt commented 1 week ago

I'd say it's Team/Enterprise. It's around audit-ability and compliance, but also would be a fundamental feature for future Version Control, which is then relevant for Team. Let's go "Team".

knolleary commented 1 week ago

Having started to review the changes, I have a bit of a basic question to ask about this API: what is different between this API and the existing projects/:id/audit-log endpoint we already have?

As far as I can see, they both draw from the same source of information; they are both returning lists of audit-log entries. The history api appears to be augmenting them with some additional info about snapshots.

I guess I'm asking - why do we need a separate API when we already have one whose intent was to show a history of activity on the instance?

Steve-Mcl commented 6 days ago

FYI. Following up on this. Nick and I had a meeting to resolve the situation and the PR has been updated to address concerns.

Steve-Mcl commented 20 hours ago

@joepavitt API output examples here: https://github.com/FlowFuse/flowfuse/pull/4509#issuecomment-2358483015