fedora-iot / greenboot

Generic Health Checking Framework for systemd
GNU Lesser General Public License v2.1

Update the rollback mechanism #102

Open say-paul opened 1 year ago

say-paul commented 1 year ago

Currently, greenboot rollback depends on ostree-finalize-staged.service, which is triggered only on the first reboot after an update is deployed in ostree. A time-delayed failure therefore cannot trigger a rollback, which may hamper certain use cases. Updating the mechanism would also let greenboot integrate more closely with the ostree architecture and reduce its dependency on systemd service orchestration.

Example: if /usr/lib/greenboot/check.d/required.d/02_watchdog.sh fails at any point after the first reboot, no rollback is triggered. This can happen in an edge scenario.

say-paul commented 1 year ago

We can leverage the output of rpm-ostree status --json to get the timestamps of the deployments and their ordering.
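A minimal sketch of that idea, in Python. It assumes the JSON has a top-level "deployments" array whose entries carry "checksum", "timestamp" (epoch seconds) and "booted" keys; verify the field names against the output on your own system. The helper names are hypothetical, not part of greenboot.

```python
import json
import subprocess
from datetime import datetime, timezone


def deployments_by_time(status):
    """Return the deployments sorted newest-first by their 'timestamp' field."""
    deployments = status.get("deployments", [])
    return sorted(deployments, key=lambda d: d.get("timestamp", 0), reverse=True)


def summarize(status):
    """Return (short checksum, ISO timestamp in UTC, booted?) per deployment, newest first."""
    rows = []
    for d in deployments_by_time(status):
        ts = datetime.fromtimestamp(d.get("timestamp", 0), tz=timezone.utc)
        rows.append((d.get("checksum", "")[:12], ts.isoformat(), bool(d.get("booted"))))
    return rows


def read_status():
    """Run rpm-ostree and parse its JSON output (needs an rpm-ostree based host)."""
    result = subprocess.run(
        ["rpm-ostree", "status", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

On an rpm-ostree host, summarize(read_status()) would list the deployments newest first. Whether this timestamp reflects staging time or finalization time needs to be confirmed separately.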

say-paul commented 1 year ago

There is some confusion, though, about how to determine when an update is actually deployed, as the JSON seems to only have the timestamp of when the update was staged.

jmarrero commented 1 year ago

I think the idea of the status showing the deployment time is that it is the actual time when the commit was added to the system; however, I understand that you might need the actual deployment (finalization) time. I did not find any rpm-ostree or ostree output that shows it, but I might be overlooking something obvious. However, you can take the time at which the latest deployment was added to /ostree/deploy/fedora/deploy/ (replace fedora with your distro). For example: ls -la /ostree/deploy/fedora/deploy/

[jmarrero@silverblue deploy]$ ls -la /ostree/deploy/fedora/deploy/
total 16
drwxr-xr-x. 1 root root 1112 Jul  3 11:09 .
drwxr-xr-x. 1 root root   18 Oct  8  2021 ..
drwxr-xr-x. 1 root root  158 Jul  1 20:27 1449077d3cf7a324a331e1a26665e0517d135024c332e48f07c715772fe3809e.0
-rw-r--r--. 1 root root  113 Jul  1 20:34 1449077d3cf7a324a331e1a26665e0517d135024c332e48f07c715772fe3809e.0.origin
drwxr-xr-x. 1 root root  158 May 11 17:59 62c79b40b17284f9897b00aae1f858a56990ccde51997d870765fd2b6a040fab.0
-rw-r--r--. 1 root root  148 May 11 21:45 62c79b40b17284f9897b00aae1f858a56990ccde51997d870765fd2b6a040fab.0.origin
drwxr-xr-x. 1 root root  158 Jul  2 23:01 638741c80aac72e9febb76da08cd5dfa0ca27656827305331b2388c6440dba31.0
-rw-r--r--. 1 root root  113 Jul  3 11:09 638741c80aac72e9febb76da08cd5dfa0ca27656827305331b2388c6440dba31.0.origin
drwxr-xr-x. 1 root root  158 May 11 08:07 65c0a202abe2e80bd09814bd38c71a996fee1ace0ab14f86c0666f8c3de111a5.0
-rw-r--r--. 1 root root  148 May 11 17:15 65c0a202abe2e80bd09814bd38c71a996fee1ace0ab14f86c0666f8c3de111a5.0.origin

Then running stat on the newest deployment:

[jmarrero@silverblue deploy]$ stat 638741c80aac72e9febb76da08cd5dfa0ca27656827305331b2388c6440dba31.0
  File: 638741c80aac72e9febb76da08cd5dfa0ca27656827305331b2388c6440dba31.0
  Size: 158         Blocks: 0          IO Block: 4096   directory
Device: 0,37    Inode: 68033679    Links: 1
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:root_t:s0
Access: 2023-07-03 11:51:57.117719748 -0400
Modify: 2023-07-02 23:01:38.541359148 -0400
Change: 2023-07-03 11:09:52.043005424 -0400
 Birth: 2023-07-02 23:01:35.357372660 -0400

You can see that Birth and Modify are nearly the same, but Change is when it was added to this directory. I think you could use the Change timestamp of the latest deployment you find in the /ostree directory.
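A rough Python sketch of that lookup, assuming the deploy directory layout shown in the listing above (deployment directories next to their .origin files). The function name is hypothetical. It reads st_ctime, which corresponds to the Change line in the stat output; Python's os.stat does not expose the birth time on Linux.

```python
from pathlib import Path


def newest_deployment(deploy_root):
    """Return (path, ctime) for the deployment directory with the latest
    inode change time, or None if no directory is found.

    Regular files such as the *.origin files are skipped.
    """
    root = Path(deploy_root)
    candidates = [p for p in root.iterdir() if p.is_dir()]
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_ctime)
    return newest, newest.stat().st_ctime
```

For example, newest_deployment("/ostree/deploy/fedora/deploy") would return the most recently changed deployment directory on a Fedora-based ostree system. The change time can move for reasons other than deployment, so treat the result as a hint rather than a guarantee.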

However maybe @cgwalters knows a better way and something I am overlooking.

cgwalters commented 1 year ago

I am not fully following, but if the goal is to know a timestamp for when a deployment was created, then it's closer to the birth time, right? In theory the deployment directory inode could be modified for other reasons, although in practice it usually isn't. Note, though, that the birth time may not be available on all Linux filesystems, I believe, but it may be on the ones we care about.

greenboot perhaps could add xattrs on the deployment directory? Though doing so would require temporarily lifting the immutable bit, which is a bit racy unfortunately...

cgwalters commented 1 year ago

There is also the origin file which is arbitrary metadata associated with a deployment.

say-paul commented 1 year ago

@cgwalters The goal is actually to calculate the grace period after which the update is marked as successful, so that no rollback is triggered after that point even if a health check fails. The time needs to be calculated from the moment the system restarts after a commit is staged. Since there can be a gap between rpm-ostree upgrade and the reboot, I am looking for options to resolve this. I was looking at system-update-done.service and ostree-finalize-staged.service, but that would just mean parsing through journald, which, as you suggested, is not a great idea.
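The grace-period decision described here can be sketched as below, in Python. This is only an illustration: the function name is hypothetical, and treating a backwards clock as "still inside the grace period" is a policy assumption, not something decided in this thread.

```python
from datetime import datetime, timedelta, timezone


def rollback_allowed(first_boot, now, grace_period):
    """Return True while 'now' is still inside the grace period that
    started at the first boot into the new deployment."""
    if now < first_boot:
        # Assumption: if the clock went backwards, stay conservative and
        # keep rollback enabled.
        return True
    return now - first_boot <= grace_period


# Example: a 24 hour grace period starting at first boot.
first_boot = datetime(2023, 7, 3, 15, 9, 52, tzinfo=timezone.utc)
grace = timedelta(hours=24)
still_rollbackable = rollback_allowed(first_boot, first_boot + timedelta(hours=2), grace)
```

The hard part discussed in the thread is obtaining a trustworthy first_boot value; the comparison itself is simple once that timestamp exists.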

say-paul commented 1 year ago

@jmarrero I looked into your suggested method. I ran ostree admin unlock --hotfix and saw that the timestamp got updated. Though I don't see any practical implication of doing that, it echoes @cgwalters' statement:

deployment directory inode could be modified for other reasons

This might require some investigation into which cases can modify the timestamp, and into ways to make sure it won't hurt greenboot's functionality.

jmarrero commented 1 year ago

Does it need to be on first boot? Can't it be when finalization finishes? If so, maybe look at the /boot/ostree entries? But if greenboot can't add more xattrs, maybe we can extend the origin file or deployment metadata to add another entry? Like first-boot-time:
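To make the origin-file idea concrete, here is a hedged Python sketch that treats the origin file as an INI-style keyfile and records a timestamp once. The [greenboot] group and the exact first-boot-time key are hypothetical, nothing in ostree defines them today, and the refspec in the usage example is made up for illustration.

```python
import configparser
import io

GROUP = "greenboot"        # hypothetical group name
KEY = "first-boot-time"    # hypothetical key, after the suggestion above


def _parser():
    # Only '=' is a delimiter and interpolation is off, so values such as
    # refspecs containing ':' or '%' are kept verbatim.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # preserve key case as written
    return parser


def set_first_boot_time(origin_text, epoch_seconds):
    """Return origin_text with the first-boot timestamp added, unless one
    is already recorded (the first value wins)."""
    parser = _parser()
    parser.read_string(origin_text)
    if not parser.has_section(GROUP):
        parser.add_section(GROUP)
    if not parser.has_option(GROUP, KEY):
        parser.set(GROUP, KEY, str(int(epoch_seconds)))
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


def get_first_boot_time(origin_text):
    """Return the recorded timestamp as an int, or None if absent."""
    parser = _parser()
    parser.read_string(origin_text)
    if parser.has_option(GROUP, KEY):
        return parser.getint(GROUP, KEY)
    return None
```

Usage: set_first_boot_time("[origin]\nrefspec=fedora:fedora/38/x86_64/silverblue\n", 1688396992) returns the same text with a [greenboot] group appended. How such an entry would actually be written on a live system is exactly the open question in this thread.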

cgwalters commented 1 year ago

I'd be fine with adding an xattr upstream in ostree for when a deployment is first booted. I think it'd be a pretty easy change because, as of lately, we already run a systemd unit on boot.

say-paul commented 1 year ago

@jmarrero @cgwalters POC PR: https://github.com/say-paul/greenboot/pull/1

Consolidated Challenges or information that will be useful.