flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

[RFE] investigate remote attestation using TPM (consider Keylime) #592

Open ahrkrak opened 2 years ago

ahrkrak commented 2 years ago

Current situation

Most servers ship with a TPM that can be leveraged to verify (remotely attest) system integrity. Flatcar doesn't currently do so.

CoreOS used to have a proprietary version of CoreOS Container Linux that did this, but (a) it was never open sourced, and (b) community projects are now available that implement this functionality.

Impact

Without remote attestation, tampering with the boot chain or OS can go undetected. These attack vectors could be mitigated.

Ideal future situation

We leverage the TPM to ensure system integrity.

Implementation options

Consider Keylime (https://keylime.dev/), a CNCF project for this.

Additional information

See the MicroOS blog on this topic: https://kubic.opensuse.org/blog/2021-11-08-MicroOS-Keylime-TPM/

Note: Microsoft recently introduced the Pluton TPM, initially for Windows PCs, but we should keep an eye on whether it makes its way to servers. https://blogs.windows.com/windowsexperience/2022/01/04/ces-2022-chip-to-cloud-security-pluton-powered-windows-11-pcs-are-coming/

I'd mentioned this to Vincent previously, but wanted to raise an RFE to get it on the board before the next round of longer-term planning.

JAORMX commented 2 years ago

Who's looking into this? I'd like to sync about this feature, as I am interested in trying it out and contributing if possible.

jepio commented 2 years ago

I'm looking at it starting from the TPM side. I'm happy to share some early images when they are available and to collaborate on this. @surajssd was looking at Keylime from the application side.

Let's jump on a call if possible; I would love to find out more about this and the use cases.

pothos commented 5 months ago

The new Alpha will support Clevis with a Tang server but not Keylime (I don't yet know the differences between Tang and Keylime well enough to say what would be needed to set up Keylime).

pothos commented 5 months ago

We now have docs for TPM-backed disk encryption https://github.com/flatcar/flatcar-website/pull/317

What we can do with GRUB is not as nice as what is possible with systemd-boot, so some limitations exist. In general, though, when the encryption is bound to the OS state, it serves as a remote attestation mechanism: if, for example, the SSH host key stays the same, or the VM can otherwise show that it possesses secrets stored on the encrypted disk, then the OS state must have matched the policy for the disk to unlock.
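Why binding encryption to the OS state works can be sketched with a toy model of a TPM PCR. This is only an illustration, assuming a SHA-256 PCR bank; the component names are hypothetical and no real TPM is involved. Each measured component is folded into the register with an extend operation, so changing any component (or their order) yields a different final value, and a secret sealed against the expected value would not be released.

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new PCR = SHA-256(old PCR || SHA-256(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def replay(components: list[bytes]) -> bytes:
    """Start from an all-zero PCR and extend it with each component in order."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


# Hypothetical boot components, for illustration only.
good_boot = [b"bootloader v1", b"kernel v1", b"kernel cmdline"]
tampered_boot = [b"bootloader v1", b"kernel v1 (modified)", b"kernel cmdline"]

# The value the disk key would be sealed against.
expected = replay(good_boot)

print("untampered boot matches policy:", replay(good_boot) == expected)
print("tampered boot matches policy:", replay(tampered_boot) == expected)
```

In this model the untampered boot reproduces the expected value and the tampered one does not, which is the property the TPM-backed disk encryption relies on.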

We can leave this issue open for a Keylime PoC and docs section for that.

aw042 commented 1 week ago

> The new Alpha will support clevis with a tang server but not keylime (I don't know the differences of tang and keylime yet to say what would be needed to set up keylime).

It took me a while to (at least think I) understand this, but based on each project's documentation, I believe Keylime is primarily for attestation, whereas Tang can complement the TPM: Clevis uses Shamir's Secret Sharing (SSS) to combine the two when binding LUKS encryption.
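The idea behind SSS, that a secret is only recoverable when enough shares come together, can be illustrated with a toy implementation. This is a sketch for intuition only, not how Clevis implements its SSS pin; the prime field and the names `tpm_share` and `tang_share` are assumptions chosen for the example.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, used as the field modulus


def split(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# 2-of-2: think of one share as held behind the TPM and one behind Tang.
disk_key = secrets.randbelow(PRIME)
tpm_share, tang_share = split(disk_key, threshold=2, shares=2)
print("both shares recover the key:", combine([tpm_share, tang_share]) == disk_key)
```

With a threshold of 2, a single share on its own reveals nothing useful about the key, which is why both the TPM check and the Tang connection are needed.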

I think Tang and Keylime can work well together. Use the Clevis SSS pin so that, in order to decrypt, the PCR7 TPM check must pass and the node must reach the Tang server (I understand the Tang server simply as "if you can connect to me, then you may pass"). The node boots and decrypts, and the Keylime agent (running as a container) then attests the node's integrity using the TPM (it could check whether any PCRs beyond PCR7 have been tampered with). If attestation fails, Keylime can quarantine the node by kicking it off the network or blocking any further connection to the Tang server.

This isn't bulletproof, since the node is already booted and decrypted by the time Keylime attestation fails (say PCR7 passes, but other PCRs are tampered with). Still, I would say it's pretty good security: if the node restarts, it can't decrypt again, because Keylime has severed its connection to the Tang server.
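The "TPM and Tang must both pass" policy could be expressed as a Clevis SSS pin configuration along these lines. This is a hedged sketch: the key names (`t`, `pins`, `pcr_bank`, `pcr_ids`, `url`) follow the Clevis documentation as I understand it, the Tang URL is a placeholder, and the helper `build_sss_config` is hypothetical. Check the Clevis and Flatcar docs before relying on it.

```python
import json


def build_sss_config(tang_url: str, pcr_ids: str = "7", pcr_bank: str = "sha256") -> dict:
    """Build an SSS pin config that requires both the TPM2 pin and the Tang pin."""
    return {
        "t": 2,  # threshold: both pins below must succeed
        "pins": {
            "tpm2": {"pcr_bank": pcr_bank, "pcr_ids": pcr_ids},
            "tang": {"url": tang_url},
        },
    }


# Placeholder URL; replace with the address of your own Tang server.
config = build_sss_config("http://tang.example.internal")
print(json.dumps(config, indent=2))
```

JSON of this shape is what the `sss` pin takes as its configuration argument, for example in `clevis luks bind`. On Flatcar, the disk encryption docs linked above describe how to provision it.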

As an aside, I was at first leaning towards Fedora CoreOS because its full disk encryption documentation seemed simpler, but Flatcar has everything I desire and more. Its documentation is just more verbose, covering systemd-cryptenroll PCR8+9 pinning/unpinning in addition to the standard Clevis PCR7 pinning. That confused me initially, but it's great that the documentation covers it: I've recently been into hardware security modules and really want to play around with PKCS#11 (which systemd-cryptenroll supports). As far as I can tell, TPM 2.0 implements FIPS 140-2 Level 2 tamper protection, or seemingly FIPS 140-3 Level 1 (https://trustedcomputinggroup.org/wp-content/uploads/TCG-FIPS-140-3-Guidance-for-TPM-2.0-Version-1.0-Revision-1_14Feb24.pdf), whereas more robust HSMs that are FIPS 140-2 Level 4 or FIPS 140-3 Level 3 are available to purchase today (I haven't seen a FIPS 140-3 Level 4 unit available yet).

In summary, my opinion is that Flatcar is good right now. It supports Clevis FDE, and Keylime can run as a container for attestation and network quarantining. Furthermore, I appreciate the documentation for more advanced systemd-cryptenroll protection. I'm really excited to hear any updates regarding "systemd-boot for signed TPM policies" and "systemd-pcrlock for additional control".