flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

[Tracking] TPM 2.0/Secure Boot workstream #630

Open jepio opened 2 years ago

jepio commented 2 years ago

Rough first draft, currently not ordered:

jepio commented 2 years ago

Mantle now publishes GCP images with UEFI features, including vTPM, enabled. With Azure CommunityGalleries we can also enable vTPM support on our images. AWS has GA'd their vTPM, but it requires switching the AMI to UEFI boot mode (dropping compatibility with Xen HVM instances). We will need to consider publishing separate AMIs for UEFI and BIOS boot.

Secure Boot has not been tackled yet, but we have started thinking about signing procedures and key handling.

tormath1 commented 2 years ago

As a side note, Clevis (and friends) are now available on ::guru: https://github.com/gentoo/guru/tree/master/app-crypt/clevis. Including Clevis should then follow the same process as including a package from ::gentoo overlay (https://www.flatcar.org/docs/latest/reference/developer-guides/sdk-modifying-flatcar/#add-or-update-a-package).

Butane clevis restriction should also be dropped once done: https://github.com/coreos/butane/blob/87a7a686aff511dd13a6b873baf9a7d005170737/config/flatcar/v1_0/translate.go#L37-L41

krishjainx commented 1 year ago

@tormath1 Trying to add Clevis takes me into dependency hell. I'm essentially doing https://www.flatcar.org/docs/latest/reference/developer-guides/sdk-modifying-flatcar/#add-or-update-a-package but with gentoo/guru instead of gentoo/gentoo

tormath1 commented 1 year ago

@krishjainx yes, there are some dependencies like jose and luksmeta that need to be pulled in too. A while back @jepio started a PoC for Clevis: https://github.com/flatcar/coreos-overlay/commits/jepio/wip-clevis. You can base your work on this (or rebase his branch).

krishjainx commented 1 year ago

The preliminary support for clevis can be found at https://github.com/flatcar/scripts/pull/909. I will return to working on this shortly. I encountered some issues after working on it with an alpha tag and then transitioning it to another branch based on master. So, I will need to rebase and test it with the new SDK container locally. If someone is interested in taking over my work, please feel free to have a look. @tormath1

jepio commented 7 months ago

On EFI we are now missing TPM event log access from Linux, because that depends on the linuxefi boot protocol, which is a downstream Red Hat GRUB patch. I think we'll need to switch to using https://github.com/rhboot/grub2

jepio commented 3 months ago

Gael from Matrix shared this link to GRUB patches for booting a UKI on BIOS systems, which could be interesting: https://github.com/osteffenrh/grub2-blscfg/commits/blscfg-unified-kernel-f38

chewi commented 2 months ago

I've looked into the nature of Red Hat's fork. It's based around Fedora releases with 40 still using 2.06 and 41 moving to 2.12. The patch set is very heavy with 368 commits currently applied on top of 2.12. For some reason, this doesn't quite align with the order or number of patches in their RPM package. There's a lot of noise here, as it includes commits that do things like rename grub to grub2. However, if we need the Linux EFI support, trying to cherry-pick individual commits doesn't seem like a good idea as it's a big change, and it's hard to tell exactly which commits are applicable.

chewi commented 2 months ago

I now realise why I was confused about Fedora 41 during our recent meeting. The RPM package has patches against 2.12, but the branch in the rhboot/grub2 repo is still based on 2.06. :dizzy_face:

We did talk about taking a tarball from GitHub, but I'd also like to use Gentoo's ebuilds if we can because I think little to no changes are needed. That would mean applying Red Hat's changes using a patch, but even as a single file, that patch would be nearly 2MB. :sweat_smile: That's not necessarily an issue, but…

jepio commented 2 months ago

Wouldn't a single patch be a nightmare to maintain and rebase? The other thing that is important: we need to stay in control of the sbat.csv as we are responsible for the security fixes to grub.

chewi commented 2 months ago

I did ask about the weird repo situation in Fedora's Matrix, but they say the repos are for different purposes. Things are still in motion though, so maybe this will change soon. Until then, a single patch is the best option. It's actually not that hard, I'll write up the steps.

I've got sbat.csv in hand.

I have now successfully built Red Hat's fork using Gentoo's ebuild. The Gentoo patches conflict, but none of them are applicable to Flatcar, so I've simply unset PATCHES. I've also rebased our own two patches, which took a fair amount of conflict wrangling. The GPT changes are currently failing to build. More soon.

chewi commented 2 months ago

qemu_update and arm64's qemu_uefi passed, but the rest didn't. I tried the image locally, and it does seem like the verity hash isn't getting passed through, at least on amd64. No idea why yet, that patch is quite straightforward.

chewi commented 2 months ago

Still having trouble with this, despite trying a few things. I can rebuild GRUB and write the new grubx64.efi to an existing image, so the turnaround time isn't too bad, but it's still tricky. What's strange is that BIOS is also failing in the same way even though both amd64 and arm64 EFI now use grub-core/loader/efi/linux.c instead. Seemingly it worked on arm64? The result is that systemd waits forever for the usr device, but I can see from using rd.break=pre-mount that the verity argument is missing.
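
As a rough illustration of the check described above, here is a minimal Python sketch that inspects a captured kernel command line (for example the contents of /proc/cmdline from the rd.break=pre-mount shell) for the verity argument. The helper names are hypothetical, and the assumption that a valid verity.usrhash value is 64 lowercase hex characters is taken from the command line of a working boot, not from the GRUB patch itself.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to ''."""
    args = {}
    for token in shlex.split(cmdline):
        # Only split on the first '=', so verity.usr=PARTUUID=... stays intact.
        key, _, value = token.partition("=")
        args[key] = value
    return args


def has_verity_hash(cmdline):
    """True if verity.usrhash is present and looks like a 64-char hex digest.

    Assumption: the hash is a lowercase hex SHA-256 digest.
    """
    value = parse_cmdline(cmdline).get("verity.usrhash", "")
    return len(value) == 64 and all(c in "0123456789abcdef" for c in value)
```

Calling `has_verity_hash(open("/proc/cmdline").read())` on the failing image would be expected to return False if the argument really is missing.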

jepio commented 2 months ago

> Still having trouble with this, despite trying a few things. I can rebuild GRUB and write the new grubx64.efi to an existing image, so the turnaround time isn't too bad, but it's still tricky. What's strange is that BIOS is also failing in the same way even though both amd64 and arm64 EFI now use grub-core/loader/efi/linux.c instead. Seemingly it worked on arm64? The result is that systemd waits forever for the usr device, but I can see from using rd.break=pre-mount that the verity argument is missing

Can you share your latest code/branch? amd64 and arm64 use different offsets where the verity hash is stored in the vmlinuz file (64 vs 512). How about we get it working with 2.06 + patches first (since that is closer to what we currently have), and switch to 2.12 in a second step.
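
To make the offset difference concrete, here is a minimal Python sketch that reads the embedded hash back out of a vmlinuz file. The offsets (64 for amd64, 512 for arm64) are the ones mentioned above; the 64-byte length and the ASCII hex encoding are assumptions, and `read_verity_hash` is a hypothetical helper, not something from the GRUB patch.

```python
import string

# Offsets as stated in this thread.
VERITY_HASH_OFFSET = {"amd64": 64, "arm64": 512}
# Assumption: the hash is stored as 64 ASCII hex characters (a SHA-256 digest).
VERITY_HASH_LENGTH = 64


def read_verity_hash(vmlinuz_path, arch):
    """Return the hex string stored at the per-architecture offset."""
    offset = VERITY_HASH_OFFSET[arch]
    with open(vmlinuz_path, "rb") as f:
        f.seek(offset)
        raw = f.read(VERITY_HASH_LENGTH)
    if len(raw) != VERITY_HASH_LENGTH:
        raise ValueError("file too short to contain a verity hash")
    text = raw.decode("ascii", errors="replace")
    if not all(c in string.hexdigits for c in text):
        raise ValueError("no hex digest found at offset %d" % offset)
    return text.lower()
```

Reading an image with the wrong architecture's offset should raise ValueError rather than return garbage, which makes it a quick sanity check on a built image.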

jepio commented 2 months ago

Updated the tracker with currently open PRs that are ready to review & merge:

Ready for an early review:

chewi commented 2 months ago

See flatcar/scripts#2301. The offsets are still respected. I might try 2.06, but I'll try adding some debugging to 2.12 first.

chewi commented 2 months ago

I learnt how to debug GRUB with gdb today. I've fixed this for BIOS, the code needed some adjustment and movement from grub-core/loader/i386/linux.c to grub-core/loader/i386/pc/linux.c. The EFI issue seems to be different.

jepio commented 2 months ago

Now that you mention it: the original verity hash commit did touch grub-core/loader/i386/efi/linux.c but we dropped this when we dropped the patch that introduced linuxefi. This is the first commit (without subsequent fixups): https://github.com/flatcar-hub/grub/commit/03b547c21ec3475980a54b71e909034ed5ed5254. So this will need to be brought back.

grub-core/loader/i386/pc/linux.c seems to only be relevant to the linux16 command?

chewi commented 2 months ago

Yeah, I was about to say I think this code now needs to go into grub-core/loader/i386/efi/linux.c, which I only noticed a short time ago. My first attempt at this failed, will have another go using your link tomorrow.

chewi commented 2 months ago

I think I've got it now. Time for another Jenkins run.

chewi commented 2 months ago

> grub-core/loader/i386/pc/linux.c seems to only be relevant to the linux16 command?

I think linux effectively calls linux16 when the EFI module cannot be loaded.

  cmd_linux =
    grub_register_command ("linux", grub_cmd_linux,
                           0, N_("Load Linux."));
  cmd_linux16 =
    grub_register_command ("linux16", grub_cmd_linux,
                           0, N_("Load Linux."));

chewi commented 2 months ago

I can't see any mention of the TPM event log in Red Hat's patches, only in the vanilla 2.12 sources. Are you sure the patches were needed? :sweat_smile:

jepio commented 2 months ago

> > grub-core/loader/i386/pc/linux.c seems to only be relevant to the linux16 command?
>
> I think linux effectively calls linux16 when the EFI module cannot be loaded.
>
>   cmd_linux =
>     grub_register_command ("linux", grub_cmd_linux,
>                            0, N_("Load Linux."));
>   cmd_linux16 =
>     grub_register_command ("linux16", grub_cmd_linux,
>                            0, N_("Load Linux."));

I think I see what's going on. Mainline grub 2.12 has this in grub-core/Makefile.core.def:

module = {
  name = linux;
  x86 = loader/i386/linux.c;
  (...)
  x86_64_efi = loader/efi/linux.c;

and the fedora-41 branch of rhboot/grub has:

module = {
  name = linux;
  i386_pc = loader/i386/pc/linux.c;
  x86_64_efi = loader/i386/efi/linux.c;
  i386_xen_pvh = loader/i386/linux.c;

That would explain why the patch needs to be adapted for rhboot/grub: vanilla grub compiles in loader/i386/linux.c on both x86 BIOS and EFI, while rhboot grub completely separates them. It would also seem that on rhboot/grub the commands linuxefi and linux are the exact same, so we don't need to change grub.cfg to use a different command.
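
The difference can be modelled with a small Python sketch. The two dictionaries only contain the entries quoted above (the elided lines are not modelled), and the assumption that the `x86` group key covers both `i386_pc` and `x86_64_efi` is my reading of the explanation above, not something checked against GRUB's build system. All names here are hypothetical.

```python
# Entries copied from the two Makefile.core.def excerpts; elided lines omitted.
VANILLA_2_12 = {
    "x86": ["loader/i386/linux.c"],
    "x86_64_efi": ["loader/efi/linux.c"],
}
RHBOOT_FEDORA_41 = {
    "i386_pc": ["loader/i386/pc/linux.c"],
    "x86_64_efi": ["loader/i386/efi/linux.c"],
    "i386_xen_pvh": ["loader/i386/linux.c"],
}

# Assumption: the "x86" group key applies to both of these platforms.
PLATFORM_GROUPS = {"i386_pc": ["x86"], "x86_64_efi": ["x86"]}


def linux_module_sources(module_def, platform):
    """List the source files the linux module would get on a platform."""
    sources = []
    for key in PLATFORM_GROUPS.get(platform, []) + [platform]:
        sources.extend(module_def.get(key, []))
    return sources
```

Under these assumptions, vanilla 2.12 gives `loader/i386/linux.c` on both BIOS and EFI, while the rhboot definition gives each platform its own file, which is why a patch against `loader/i386/linux.c` alone no longer takes effect there.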

> I can't see any mention of the TPM event log in Red Hat's patches, only in the vanilla 2.12 sources. Are you sure the patches were needed? 😅

It's not so much the patches as the way grub enters the kernel (the boot protocol), which differs between the two branches. I haven't investigated this in detail, but there may be a way to get vanilla grub to enter the x86 kernel on EFI such that the event log is exposed. Arm64 EFI is simpler, and this works in vanilla grub. Compare the lines from arm64 and amd64:

arm64:

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x413fd0c1]
[    0.000000] Linux version 6.6.50-flatcar (build@pony-truck.infra.kinvolk.io) (aarch64-cros-linux-gnu-gcc (Gentoo Hardened 13.3.1_p20240614 p17) 13.3.1 20240614, GNU ld (Gentoo 2.42 p3) 2.42.0) #1 SMP PREEMPT Wed Sep 11 12:18:14 -00 2024
[    0.000000] KASLR enabled
[    0.000000] efi: EFI v2.7 by EDK II
[    0.000000] efi: SMBIOS 3.0=0xdced0000 TPMFinalLog=0xd9760000 MEMATTR=0xdba62198 ACPI 2.0=0xd9700018 TPMEventLog=0xd9673018 RNG=0xd970e698 MEMRESERVE=0xd9b43e18 

amd64:

[    0.000000] Linux version 6.6.50-flatcar (build@pony-truck.infra.kinvolk.io) (x86_64-cros-linux-gnu-gcc (Gentoo Hardened 13.3.1_p20240614 p17) 13.3.1 20240614, GNU ld (Gentoo 2.42 p3) 2.42.0) #1 SMP PREEMPT_DYNAMIC Wed Sep 11 12:15:49 -00 2024
[    0.000000] Command line: BOOT_IMAGE=/flatcar/vmlinuz-a mount.usr=/dev/mapper/usr verity.usr=PARTUUID=7130c94a-213a-4e5a-8e26-6cce9662f132 rootflags=rw mount.usrflags=ro consoleblank=0 root=LABEL=ROOT console=ttyS0,115200 flatcar.first_boot=detected verity.usrhash=45a627c8a86cad518157501cec1c7096df60ba0f4835732718f05affdeecf110
[    0.000000] BIOS-provided physical RAM map:
(...)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] APIC: Static calls initialized
[    0.000000] efi: EFI v2.7 by EDK II
[    0.000000] efi: TPMFinalLog=0x9cbf7000 SMBIOS=0x9c9ab000 ACPI=0x9cb7e000 ACPI 2.0=0x9cb7e014 MEMATTR=0x9b678298 
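
The comparison can be automated with a minimal Python sketch that pulls the EFI configuration table entries out of a kernel log and checks for TPMEventLog. The function names are hypothetical and the regular expression is only an assumption about the shape of the `efi:` lines shown above.

```python
import re


def efi_config_tables(kernel_log):
    """Map EFI config table names (e.g. 'ACPI 2.0') to their addresses."""
    tables = {}
    for line in kernel_log.splitlines():
        match = re.search(r"\befi:\s+(.*)", line)
        if not match:
            continue
        # Names may contain spaces and dots, as in "SMBIOS 3.0=0x...".
        pairs = re.findall(r"([A-Za-z][A-Za-z0-9 .]*?)=(0x[0-9a-fA-F]+)",
                           match.group(1))
        for name, address in pairs:
            tables[name.strip()] = int(address, 16)
    return tables


def has_tpm_event_log(kernel_log):
    """True if the kernel reported a TPMEventLog table from the firmware."""
    return "TPMEventLog" in efi_config_tables(kernel_log)
```

Fed the two logs above, this would report the table for arm64 and not for amd64.
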

jepio commented 2 months ago

> I think I see what's going on. Mainline grub 2.12 has this in grub-core/Makefile.core.def:

Ha, fun stuff: grub-2.06 doesn't have grub-core/loader/efi/linux.c, while vanilla 2.12 does. So it may be that the redhat patch is unnecessary (for what we want to achieve) with vanilla grub-2.12. Would you try it? If you add set debug="linux" to your OEM grub.cfg file then grub prints debug messages for the loader code paths.

chewi commented 2 months ago

Can I tell more directly by looking at the kernel output? Does efi: TPMEventLog= mean that it worked? If so, it's working on my laptop with no effort at all. I guess systemd-boot already supports this!

chewi commented 2 months ago

I've compared my own local amd64 QEMU kernel log between our last release and my recent changes. TPMEventLog appears in the latter but not the former. I'll try without the patches.

jepio commented 2 months ago

> Can I tell more directly by looking at the kernel output? Does efi: TPMEventLog= mean that it worked? If so, it's working on my laptop with no effort at all. I guess systemd-boot already supports this!

Yes, that's the line we're after. The full test case we need to add to kola is: tpm2_eventlog /sys/kernel/security/tpm0/binary_bios_measurements successfully reads and parses the event log. In phase 2 we'll need to make sure that the PCRs match what we expect, and that we can precompute the ones that the distro is responsible for.
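
As a rough sketch of that test case (in Python for illustration, not the actual kola test), the check boils down to running tpm2_eventlog on the measurements file and confirming it exits successfully with some output. `event_log_parses` and its `run` parameter are hypothetical; the `run` hook only exists so the logic can be exercised without a TPM.

```python
import subprocess

EVENT_LOG = "/sys/kernel/security/tpm0/binary_bios_measurements"


def event_log_parses(path=EVENT_LOG, run=subprocess.run):
    """True if tpm2_eventlog exits 0 and prints a non-empty parse of `path`."""
    try:
        result = run(["tpm2_eventlog", path],
                     capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # tpm2_eventlog is not installed on this machine.
        return False
    return result.returncode == 0 and bool(result.stdout.strip())
```

On a real machine this would most likely need to run as root, since the securityfs file is usually not world-readable.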

chewi commented 2 months ago

I can confirm that efi: TPMEventLog= appears with vanilla(ish) 2.12 and also that sudo tpm2_eventlog /sys/kernel/security/tpm0/binary_bios_measurements works! I guess we'll proceed on that track if you're sure we don't need the Red Hat patches for anything else.

jepio commented 2 months ago

I can't think of anything else we need right now. There may be differences in how grub measures things into PCRs, but we can address that later.
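
For the later PCR work, the precomputation is a replay of the extend operation: each new PCR value is the hash of the old value concatenated with the event digest. Below is a minimal Python sketch, assuming the SHA-256 bank and a PCR that starts at all zeros; the function names are hypothetical and the digests would have to come from the parsed event log.

```python
import hashlib


def extend(pcr, digest):
    """One PCR extend: new_value = SHA-256(old_value || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def replay(digests, initial=bytes(32)):
    """Replay a sequence of event digests into a single expected PCR value.

    Assumption: SHA-256 bank, PCR reset value of 32 zero bytes.
    """
    pcr = initial
    for digest in digests:
        pcr = extend(pcr, digest)
    return pcr
```

Comparing `replay(...)` for one PCR index against the value the TPM reports would show whether grub's measurements match what we expect.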

chewi commented 2 months ago

New Kola test is at flatcar/mantle#558.