'0s' doesn't seem to disable time out for clevis and tang setups

ShapeShifter499 commented 2 years ago

I had a configuration issue with Clevis and Tang, which I fixed. But I noticed that although I had set mount_timeout: 0s in my /etc/booster.yaml, it would fall back to a password prompt instead of just trying again. What I'm worried about is that I'll reboot my cluster of small board computers remotely, the one with the Tang server will take too long to come up, and the Clevis clients will sit there forever, never fully rebooting until I can get home to fix it. Sure, I can make sure that the Tang server is up before anything else is rebooted, but I might forget and make a mistake, leaving me in a pickle until later.

I'm not sure what the best solution is for this other than just making sure the Tang server reboots and is up before touching the other Clevis clients. Is the behavior of mount_timeout: 0s normal here?

anatol commented 2 years ago

mount_timeout: 0s means try to unlock root forever and do not fall into the emergency shell.

Wrt your question: imagine we have several slots with different types of unlock mechanism, think "regular password", "systemd-recover" (almost like a regular password) and automatic unlockers like "tang" and "yubikey".

What is supposed to happen if you have a failing Tang/Yubikey unlocker? Should booster try to reach Tang indefinitely? Or should it stop and allow a "manual" way of unlocking the machine? Where should the boundary between "stop automatic retries" and "switch to manual unlocking" be? Or maybe it should attempt manual and automatic unlock concurrently?
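The last option (manual and automatic unlock running concurrently) can be pictured as a race where the first unlocker to produce a key wins. The following is only a sketch in Python, not Booster's actual code (Booster itself is written in Go); `unlock_concurrently`, the `Unlocker` signature and the stop event are hypothetical names introduced for illustration.

```python
import queue
import threading
from typing import Callable, Dict, Optional, Tuple

# Hypothetical unlocker signature: receives a stop event and returns
# the key bytes on success, or None if it gave up.
Unlocker = Callable[[threading.Event], Optional[bytes]]


def unlock_concurrently(
    unlockers: Dict[str, Unlocker],
) -> Tuple[Optional[str], Optional[bytes]]:
    """Run every unlocker in its own thread; return (name, key) of the first success.

    Returns (None, None) if every unlocker finishes without a key.
    """
    stop = threading.Event()
    results: "queue.Queue[Tuple[str, Optional[bytes]]]" = queue.Queue()

    def run(name: str, unlocker: Unlocker) -> None:
        try:
            key = unlocker(stop)
        except Exception:
            key = None  # a crashing unlocker counts as a failed attempt
        results.put((name, key))

    for name, unlocker in unlockers.items():
        threading.Thread(target=run, args=(name, unlocker), daemon=True).start()

    winner: Tuple[Optional[str], Optional[bytes]] = (None, None)
    for _ in unlockers:
        name, key = results.get()  # blocks until some unlocker reports back
        if key is not None:
            winner = (name, key)
            break
    stop.set()  # ask the remaining unlockers to stop retrying
    return winner
```

With this shape, an unreachable Tang server would not block a password typed at the console, and a Tang server that comes back later would still win if nobody is at the keyboard. Whether that is the right behaviour is exactly the open question above.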

These questions are not fully clear to me, and I would love to get users' perspective on them.

ShapeShifter499 commented 2 years ago

Oh OK, I probably misunderstood what I thought would be a solution to my issue.

My setup is just a bodged setup at home: a cluster of small board computers, one of which is set up with mkinitcpio-systemd-tool so I can unlock it remotely; it then runs a Tang server.

The rest are set up as Clevis clients so I don't have to manually log into each one to decrypt it.

There is only an issue if I mess with the Tang server and it doesn't come online in time for the Clevis clients. I can just remind myself to make sure the Tang server is up before the rest.

ShapeShifter499 commented 2 years ago

At the risk of replicating functionality from mkinitcpio-systemd-tool, would it be possible to have a fallback to an ssh setup to allow remote unlock in case Clevis and Tang fail?

anatol commented 2 years ago

What you really want is remote secure unlocking using booster.

mkinitcpio uses an ssh extension for it. This is one option, though the ssh stack is big and requires pulling in a lot of dependencies.

Instead of going the ssh way, it makes sense to reuse the Tang protocol, which is exactly a "remote secure unlock mechanism". #24 is the ticket that tracks this feature. Closing the current ticket in favor of #24.

ShapeShifter499 commented 2 years ago

@anatol I've actually moved on to using Dracut now for all of my Arch Linux systems that will be Clevis clients.

After some testing, the Dracut setup will constantly ping for my Tang server even if it's down. Then, shortly after my Tang server is back online, it starts decrypting.

This works for me because the system that has Tang set up on it has 'mkinitcpio-systemd-tool', so I can remotely ssh in and unlock its encryption. So in the event of a power outage or a reboot, the other Clevis clients will just ping and wait until I can ssh into the Tang server to decrypt and boot it.

Maybe Booster should have an option that would replicate what Dracut does when Tang isn't immediately available?

I may switch back if this were implemented, because Dracut is a bit clunky for network setup. I had to add to its config to enable systemd-networkd with a proper network config file, as I couldn't figure out how to get the built-in network solution to work and make a connection.

anatol commented 2 years ago

Yes, constant re-pinging (until the mount timeout runs out) could be an option.
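A minimal sketch of that re-pinging idea, in Python rather than Booster's own Go code. It assumes a timeout of 0 means "no deadline", matching the mount_timeout: 0s semantics described earlier in the thread; `poll_until_unlocked`, `try_unlock` and `retry_interval` are hypothetical names, not Booster options.

```python
import time
from typing import Callable


def poll_until_unlocked(
    try_unlock: Callable[[], bool],
    mount_timeout: float,
    retry_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call try_unlock repeatedly until it succeeds or the timeout expires.

    mount_timeout is in seconds; 0 means retry forever (assumed semantics).
    Returns True on success, False once the deadline has passed.
    """
    deadline = None if mount_timeout == 0 else clock() + mount_timeout
    while True:
        if try_unlock():  # e.g. one attempt to reach the Tang server
            return True
        if deadline is None:
            sleep(retry_interval)  # no deadline: just wait and try again
            continue
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # timeout ran out; caller decides what happens next
        sleep(min(retry_interval, remaining))
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting. What the caller does on a `False` result (password prompt, emergency shell) is left open here.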

Though the reverse-tang mentioned above is a more straightforward and cleaner solution: instead of the client re-pinging the Tang service, it waits for the Tang side to contact the client directly.

ShapeShifter499 commented 2 years ago

I'm still new to 'Clevis and Tang', so I may not be aware of all of the setup I can do. So I would have Tang set up to ping Clevis when it's up?

I'm not entirely sure how I'd set that up on the Tang side. Booster would need to support that, though?

anatol / booster

'0s' doesn't seem to disable time out for clevis and tang setups #114