TrenchBoot / trenchboot-issues

This repository is to centralize issues and development progress tracking for the TrenchBoot project.

Test TPM 2.0 support on Intel hardware with legacy boot mode and Update Qubes OS AEM documentation #16

Closed BeataZdunczyk closed 1 year ago

BeataZdunczyk commented 1 year ago

Is your feature request related to a problem? Please describe.

The current Qubes OS AEM documentation does not provide information on TPM 2.0 support or how to use it with legacy boot mode on Intel hardware. Additionally, it is necessary to test the solution on Intel hardware with TPM 1.2 and 2.0 using legacy boot mode to ensure proper functionality.

Is your feature request related to a new idea or technology that would benefit the project? Please describe.

This task is required to ensure that the Qubes OS AEM documentation is up to date and provides accurate information on how to use TPM 2.0 with legacy boot mode on Intel hardware. Proper testing is crucial to ensure the implementation works as expected on Intel hardware configurations.

Describe the solution you'd like

Test the solution on Intel hardware with TPM 1.2 and 2.0 using legacy boot mode to ensure proper functionality. Update the Qubes OS AEM documentation to include information on TPM 2.0 support and how to use it with legacy boot mode on Intel hardware.

Describe alternatives you've considered

N/A

Additional context

This feature request is part of Phase 2 of the TrenchBoot as Anti Evil Maid project, as outlined in the documentation: https://docs.dasharo.com/projects/trenchboot-aem-v2/.

Relevant documentation you've consulted

N/A

krystian-hebel commented 1 year ago

We should also update TrenchBoot documentation in addition to Qubes OS.

krystian-hebel commented 1 year ago

Current status:

Testing is barely possible on the board we are currently using (Supermicro X11SSH). Serial output goes through the BMC and gets modified for who knows what reason. Depending on redirection settings, I can get all characters squashed into a single line, have each character on a separate line, or get a lot of ANSI escape codes with output jumping all over the screen, where lines are printed out of order and some of them are even dropped.

I found that the only reasonable option is to change the Xen output port to 0x2f8 (I changed the code and didn't try to do this through the command line, because modifying it through the unreliable output earlier had already taken all my patience) and gather the log through SoL (ipmitool -I lanplus -H <<BMC IP>> -U ADMIN -P <<password>> sol activate | tee test.log).
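For anyone post-processing a log captured this way, here is a minimal sketch of a cleanup step. It assumes the capture is a text file that still contains the ANSI escape codes and mixed line endings described above. The function name clean_sol_log is hypothetical, and the script cannot restore lines that the BMC dropped or reordered.

```python
import re

# Matches CSI sequences (ESC [ ... final byte) and two-character ESC sequences.
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b[@-Z\\-_]")


def clean_sol_log(raw: str) -> str:
    """Strip ANSI escape sequences and normalize line endings.

    Assumption: a bare carriage return is treated as a line break,
    which may not match how the terminal would have rendered it.
    """
    text = ANSI_RE.sub("", raw)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    # Drop empty lines left behind by removed cursor-movement codes.
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "\x1b[2J\x1b[1;1H(XEN) first line\r\n\x1b[0m(XEN) second line\r\r\n"
    print(clean_sol_log(sample))
```

It could be applied to the file written by tee, for example by reading test.log with errors="replace" and passing the contents to clean_sol_log.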

Test results are very inconsistent; they change between boot attempts. Without any modifications, the platform hung before Xen produced any output. This could point to an issue in the early extend code, so I added manual writes to 0x2f8 around that function. Attaching the log from that boot: test2.log.

In there, 86\r\n is printed just after entry to Xen; E, X and G are printed after .Lslaunch_proto, before tpm_extend_mbi and after tpm_extend_mbi, respectively. As you can see, there are lots of errors later on about USB devices not responding. The OS is installed on a USB stick (don't ask me why...), so of course it fails to boot past its initramfs. It eventually boots to a recovery shell, but that happened after I stopped recording. New sets of errors are printed after a key is pressed on the virtual keyboard exposed by KVM.

My first assumption was that the code I added changed the outcome, so I double-checked whether any of the registers used for printing to serial (%dx and %al) were used for something else. %eax was used to convey the MBI address, but it had already been copied to a safe place. %edx should be zeroed in case there is no MULTIBOOT2_TAG_TYPE_BASIC_MEMINFO entry, so I added xor %edx,%edx after printing, even though I'm 99% sure that GRUB always produces this tag. After that, the platform hung after printing 86\r\nEX, so somewhere during tpm_extend_mbi, which I assume was also the case before I added any debug output.
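To make the marker scheme easier to apply to several captures, here is a small sketch that maps the debug markers to the last stage reached. The marker meanings are taken from this comment. It assumes that the log is the raw capture and that the markers appear back to back right after 86\r\n. The names STAGES and last_stage are hypothetical.

```python
# Marker printed right after entry to Xen, as described in this comment.
ENTRY = "86\r\n"

# Single-character markers in the order they are expected to appear.
STAGES = [
    ("E", "after .Lslaunch_proto"),
    ("X", "before tpm_extend_mbi"),
    ("G", "after tpm_extend_mbi"),
]


def last_stage(log: str) -> str:
    """Return a description of the last debug marker found in the log."""
    pos = log.find(ENTRY)
    if pos < 0:
        return "no output from Xen entry"
    reached = "Xen entry"
    pos += len(ENTRY)
    for marker, meaning in STAGES:
        # Assumption: markers follow each other with nothing in between.
        if not log.startswith(marker, pos):
            break
        reached = meaning
        pos += len(marker)
    return reached


if __name__ == "__main__":
    print(last_stage("86\r\nEX"))
```

A result of "before tpm_extend_mbi" means that X was printed but G was not, which corresponds to a hang somewhere inside tpm_extend_mbi.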

I wasn't recording the output at that time, so I started doing so and rebooted the platform, and surprisingly it booted (up to the USB issues). I tried rebooting the same Xen image a few more times, with mixed results. I haven't noticed any pattern; it seems to be completely random, but it hangs in early code more often than not. Xen without the added debug output never booted past the early code (at least I haven't seen it, which on the other hand could be caused by wrong SoL redirection settings), although I tried it only two or three times before adding debug.

There is (XEN) [VT-D] RMRR [78800000,7affffff] not in reserved memory; need "iommu_inclusive_mapping=1"? in the output, and a similar error is printed by the Debian kernel. This shouldn't matter for the early hang, but it may be connected with the USB issues. I haven't tried following this suggestion yet, as this isn't really part of this task.
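As a side note for comparing this region against the firmware memory map later, here is a sketch that extracts the RMRR range from such a line and computes its size. It assumes that both addresses are hexadecimal and that the end address is inclusive. The helper name rmrr_range is hypothetical.

```python
import re

# Matches the "RMRR [start,end]" part of the Xen message quoted above.
RMRR_RE = re.compile(r"RMRR \[([0-9a-fA-F]+),([0-9a-fA-F]+)\]")


def rmrr_range(line: str):
    """Return (start, end, size_in_bytes), or None if the line has no RMRR range."""
    match = RMRR_RE.search(line)
    if match is None:
        return None
    start = int(match.group(1), 16)
    end = int(match.group(2), 16)
    # Assumption: the end address is inclusive, hence the +1.
    return start, end, end - start + 1


if __name__ == "__main__":
    msg = "(XEN) [VT-D] RMRR [78800000,7affffff] not in reserved memory"
    start, end, size = rmrr_range(msg)
    print(hex(start), hex(end), size // (1024 * 1024), "MiB")
```

Under the inclusive-end assumption, the range from the message above spans 40 MiB.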


TL;DR: something is broken, but debugging on X11 is hard. We're waiting for another platform to arrive; hopefully it will be easier to play with.

macpijan commented 1 year ago

So in the next step we should start by simply testing the latest code on a new platform, to see if the results are consistent, @krystian-hebel?

> We should also update TrenchBoot documentation in addition to Qubes OS.

Do you have something specific in mind for the documentation? For the Qubes OS documentation, last time I checked with @SergiiDmytruk, there was really nothing we needed to do.

krystian-hebel commented 1 year ago

> So in the next step we should start with simply testing latest code on a new platform, to see if the results are consistent?

Yes, and debugging it further if possible.

> Do you have something specific in mind for documentation?

Not sure why I added it here; we will probably need to update it in the next phase, when we actually implement the tables according to the specification. There are some minor issues here and there, like the FAQ mentioning an intermediate loader that is long gone, or the use of pointers in tables that the specification says should be architecture-independent. We can fix all those problems at once later, I think.

BeataZdunczyk commented 1 year ago

Tested almost all issues in milestone 2 on TPM 2.0. One remaining: Integrate TPM 2.0 software stack into Qubes OS Dom0 - https://github.com/TrenchBoot/trenchboot-issues/issues/12. Currently testing on TPM 1.2 platform to confirm no regressions from implemented changes.

krystian-hebel commented 1 year ago

I've manually tested the solution on TPM 1.2 and noticed no regression, although I'm not sure we can talk about regressions when full AEM (including scripts, systemd services etc.) was never tested with TrenchBoot before. @miczyg1 will be testing it on TPM 2.0, after which this task can be considered done. Screenshots from AEM booting:

[Screenshots: 00_grub, 01_srk, 02_secret, 03_disk_password, 04_sealed]

Link to PR with updated documentation: https://github.com/TrenchBoot/qubes-antievilmaid/pull/6

BeataZdunczyk commented 1 year ago

To summarize, we have:

Closing this issue. Thanks to everyone involved! We are currently working on the next phase, see https://github.com/TrenchBoot/trenchboot-issues/milestone/3, where we are tracking our progress.