Rohde-Schwarz / TrustedGRUB2

DEPRECATED TPM enabled GRUB2 Bootloader
GNU General Public License v3.0
191 stars 78 forks source link

Avoid TCG_hashLogExtendEvent #55

Closed johnwallace123 closed 2 years ago

johnwallace123 commented 7 years ago

Avoid the problematic TCG_hashLogExtendEvent call by replacing it with a TPM_Extend followed by a TCG_hashLogEvent.

I have not confirmed that this resolves the various related issues (#53, #45, possibly #52), but I wanted to submit the code for review as soon as possible. I will check an end-to-end solution tomorrow and report back.

johnwallace123 commented 7 years ago

I have confirmed that this resolves our issue, with the PCRs logged and calculated as expected. Additionally, the log description is now "IPL", remaining unchanged.

Zepmann commented 7 years ago

This does not solve the issue #53/#54 on an HP EliteBook 840 G3 notebook with the latest BIOS installed (01.10).

Boot text without this patch (debug enabled, of course), conform #53/#54:

Attempting to decrypt master key...
Enter passphrase for hd0,msdos1 (<insert uuid here>)
Slot 0 opened
measured module: terminal
SHA1 of buffer: 1348d7ba74a4fd50d45cec8e2c1708223afc1093
TCG_HashLogExtendEvent failed: 0x2
Aborted. Press any key to exit.

Boot text with this patch (again, with debug enabled):

Attempting to decrypt master key...
Enter passphrase for hd0,msdos1 (<insert uuid here>)
Slot 0 opened
measured module: terminal
SHA1 of buffer: 1348d7ba74a4fd50d45cec8e2c1708223afc1093
TPM_Extend on PCR 13
New PCR[13]=e348e1df37f2508132e4ac36f781651f00000000

TCG_HashLogEvent failed: 0x2
Aborted. Press any key to exit.
johnwallace123 commented 7 years ago

If you change the offending grub_fatal() to a grub_printf(), are you able to boot successfully, and if so, is the measurement log working as expected (/sys/kernel/security/tpm/ascii_bios_measurements)?

On Wed, Oct 19, 2016 at 10:49 AM, Zepmann notifications@github.com wrote:

This does not solve the issue #53 https://github.com/Rohde-Schwarz-Cybersecurity/TrustedGRUB2/issues/53/

54 https://github.com/Rohde-Schwarz-Cybersecurity/TrustedGRUB2/pull/54

on an HP EliteBook 840 G3.

Boot text without this patch (debug enabled, of course), conform #53 https://github.com/Rohde-Schwarz-Cybersecurity/TrustedGRUB2/issues/53/

54 https://github.com/Rohde-Schwarz-Cybersecurity/TrustedGRUB2/pull/54:

Attempting to decrypt master key... Enter passphrase for hd0,msdos1 () Slot 0 opened measured module: terminal SHA1 of buffer: 1348d7ba74a4fd50d45cec8e2c1708223afc1093 TCG_HashLogExtendEvent failed: 0x2 Aborted. Press any key to exit.

Boot text with this patch (again, with debug enabled):

Attempting to decrypt master key... Enter passphrase for hd0,msdos1 () Slot 0 opened measured module: terminal SHA1 of buffer: 1348d7ba74a4fd50d45cec8e2c1708223afc1093 TPM_Extend on PCR 13 New PCR[13]=e348e1df37f2508132e4ac36f781651f00000000

TCG_HashLogEvent failed: 0x2 Aborted. Press any key to exit.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Rohde-Schwarz-Cybersecurity/TrustedGRUB2/pull/55#issuecomment-254836226, or mute the thread https://github.com/notifications/unsubscribe-auth/ABCaUBLSCSlqQmEQ68uU_aiAtEqU0Gu4ks5q1i32gaJpZM4KZOxK .

Zepmann commented 7 years ago

I changed grub_fatal to grub_printf on line 180 in tpm_kern.c, recompiled and re-installed.

I get a TCG_HashLogEvent failed: 0x2 for each measurement. The system fully boots.

The last entries in /sys/kernel/security/tpm0/ascii_bios_measurements relate to measurements of PCRs 8 and 9 ('Compact Hash'). These are the largest PCR numbers seen in the log, and are only measured once. My guess is that these measurements are logged since they are performed by the BIOS (PCR 8, concerning the MBR containing GRUB's stage 1 before it is loaded) and GRUB's stage 1 (PCR 9, concerning GRUB's core.img before it is loaded).

During boot I saw measurements for higher PCRs. The current values for PCRs 10, 11 and 13 are available in /sys/devices/pnp0/00:0b/pcrs. Strangely enough, the PCR value for PCR 12 (the LUKS header as processed by TrustedGRUB 2, according to the documentation) is all zeroes, but this might be because I do not (yet) unseal the LUKS volume using the TPM. I will test that soon.

Edit: What is weird that PCRs seem to change twice for each measurement. For every "SHA1 of buffer" line I get two "New PCR[##]" lines. I guess this is expected behavior because of this. If this workaround is kept, it should definitely be added to the documentation to avoid confusion when debugging.

Edit 2: Unsealing a sealed binary blob containing the key file for a LUKS volume used by TrustedGRUB 2 extends PCR 12 in the same way as the other PCRs. The final measurement is visible in /sys/devices/pnp0/00:0b/pcrs, but not in /sys/kernel/security/tpm0/ascii_bios_measurements.

Edit 3: I have verified using pre-computation (actually, post-computation in this case) that the final PCR values as reported by the TPM are valid.

johnwallace123 commented 7 years ago

@Zepmann, I was afraid that your experience would be what we saw. I was genuinely surprised when it actually worked for me.

If you change the grub_fatal, the only error would be coming from a hashLogEvent, which doesn't modify any PCRs. The PCRs should still be extended as expected.

As for the dual "New PCR[##]" messages that you see, you should see the same PCR and hash value listed twice. The first time is an output of the TPM_Extend command, and the second is only in the dedug output and is a result of the TPM_ReadPCR command. Essentially, we're performing an extra read of the PCR, even though we already have that value.

Now, in my use case, we need the measurement log to precompute the PCRs (for the command string). What seems to be happening is that there's a bug in extending the measurement log for "format 2" inputs. We need to find a way to update the measurement log where we simply give the SHA1 hash added to the PCR rather than the buffer to be hashed.

I'll dig into the TPM main spec. as well as the TCG BIOS spec to see if there's a workable alternative here.

EDIT1: I just realized that the test machine that I used to confirm the fix was unaffected by the bug. I was premature in declaring success.

Zepmann commented 7 years ago

@johnwallace123 Thanks for taking the time for this.

As for the dual "New PCR[##]" messages that you see, you should see the same PCR and hash value listed twice.

I only see one hash value. The two shown PCR values for each measurement differ. It is the second shown value that is the final value of the measurement. This value is also the correct/as expected value, since I am able to precompute these values. I have no idea what the first reported PCR value stands for, but it is not part of the values that are registered by the TPM. If it would, my precomputation would fail.

A more generic question: Do you know why measurements are not made in the same way GRUB stage 1 (the MBR) measures core.img? That measurement for PCR 9 seems to be logged without any issue.

Now, in my use case, we need the measurement log to precompute the PCRs (for the command string).

You are not precomputing if you rely on measurements made during a previous boot. Precomputing is measuring what will be measured during the next boot before you actually reboot, without relying on measurements from the previous boot (since those might be unavailable or incorrect due to changes in the to be measured data). I can easily do this for PCRs 8, 9, 10 and 12. PCR 11 is difficult if you do not have a linear GRUB configuration, but not impossible if you simplify the configuration somewhat. PCR 13 is also tough if you do not know the module load order, but this order should be pretty much static if you do not change the load order in the configuration.

Not that I want to discourage you to solve this issue. Please continue to work on it. Let me know if you need more information or some testing.

johnwallace123 commented 7 years ago

I'll try to take a look today with debug enabled to see what you see wrt the PCR messages.

As for why we don't use the same method as grub stage 1, the call used there (TCG_compactLogHashEvent) takes a buffer and hashes it in the tpm. This is what PR #54 implemented. However, at the time of the function call, we only have the hash to extend the PCR with, not the buffer itself. So, what happens in this case is we were actually getting SHA1 (SHA1 (buf)) instead of SHA1 (buf).

As for needing the measurement log, I don't actually use anything there to precompute. Instead, I use the hashes from the previous boot to help guide me in what I expect the path of the next boot to be. I use this in precomputing PCR 11 with a stock grub.cfg. It's only a heuristic, and if the grub.cfg changes radically, it will probably fail horribly. For the source, see my tpm-luks github repo (https://github.com/GeisingerBTI/tpm-luks)

johnwallace123 commented 7 years ago

@Zepmann: I've taken your advice and revised the PR to use the TCG_compactLogHashEvent whenever possible. Since it involves measuring a buffer, I don't think it's a good idea to measure files using this method, so I have preserved the TPM_Extend and TCG_hashLogEvent method for measuring files. However, I have changed it so that appending to the measurement log is not a fatal error, as we have a guarantee that the PCR has been properly extended through the TPM_Extend functionality.

Again, I haven't fully tested, but this should theoretically solve everyone's use case.

Thanks!

Zepmann commented 7 years ago

@johnwallace123 Indeed, PCR 12 is one of the easiest PCRs to precompute. Your patch seems to be the best of both worlds for modern computers until someone finds a way to offer a SHA-1 hash to the TPM to be added to a PCRs measurements directly with logging and without failing (or maybe until UEFI is supported one day...).

Thanks for your efforts. I will test the new patch soon-ish (which should be this weekend).

[edit] In my initial text I meant PCR 10, of course. PCR 12 is for the LUKS header.

I tested your patch with debugging enabled. Everything seems to be working fine. It is a workaround, but one that works. Something I noticed is that I also see the measurements of PCR 10 in /sys/kernel/security/tpm0/ascii_bios_measurements. In my case, they are the last measurements since I put a static, minimalistic, linear grub.cfg inside core.img and therefore PCR 11 is not used at all. In addition, all of GRUB's modules are included in core.img as well, so PCR 13 is also not used by me. Therefore, I can confirm that PCRs 8, 9, 10 and 12 are correctly logged.

[edit2] Scratch that. My precomputations for PCRs 10 and 12 are off. I have to look into this.

Zepmann commented 7 years ago

If I take your old patch and apply the modification we described earlier (change grub_fatal to grub_printf) my precomputations are correct. Your latest patch seems to make incorrect measurements for PCRs 10 and 12.

(made a new post for clarity)

johnwallace123 commented 7 years ago

That's a bummer. A couple of clarification questions if you don't mind.

Do you ever see the warning about not extending the log? Does the measurement log match the SHA1 hashes printed when running in debug mode? These should correspond to the "SHA1 of buffer" debug lines. When running the precomputation, can you find the SHA1 of the individual files (kernel, ramdisk) in the measurement log?

Thanks so much for taking so much of your time testing! You've been really helpful and understanding of my quick hacks.

johnwallace123 commented 7 years ago

@Zepmann, I think I see the bug. On line 209 and 210, it looks like I'm taking the address of a pointer instead of the pointer itself. If you remove the & in front of the buffer variable on those 2 lines (where we're preparing for the compactLogHashEvent call), do your precomputation match?

Also, without this change, do you get different PCR values on successive boots?

Zepmann commented 7 years ago

I made the mentioned changes in tpm_kern.c. The result is that the data for PCR 12 is now correctly measured. However, PCR 10 is still not valid. In addition, the PCR value changes after every reboot. Therefore I guess (not tested) that previously the invalid value of PCR 12 was also different after every reboot.

Of course, the data that is/should be measured for PCR 10 (in my case the kernel and initramfs) stays the same across reboots.

johnwallace123 commented 7 years ago

@Zepmann: I concur that the proposed fix is insufficient and that the PCRs are inconsistent across boots.

PCR 12 may be correct if it is using the TPM_Extend function rather than the TCG_compactLogHashEvent. I suspect that the issue lies in the fact that we're converting a pointer into a "segment:offset" representation, and I'm probably messing it up.

@neusdan: Sending a pointer to the TPM via assembly is a bit out of my wheelhouse. Do you know how to convert these pointers correctly? Or can you point me in a good direction?

johnwallace123 commented 7 years ago

I think I may have a solution that should work for everyone. I hope to have a test of this updated PR tomorrow morning, but the basic premise is:

  1. Hash a buffer
  2. Attempt to log the event (format 2)
  3. If failed, attempt to log the event (format 1, hashing in TPM)
  4. If still failed, and logging isn't 100% necessary, just do a TPM_Extend and ignore logging
  5. If still failed and logging IS 100% necessary, fall back to the compactLogHashEvent, enforcing a maximum buffer size of 4096 bytes.

I don't like the maximum buffer size, but when I attempted to use a dynamically sized array on the stack, I got a compile error (-Werror=vla). If there are any better ways of using an arbitarily sized array on the stack, I'd love to know.

johnwallace123 commented 7 years ago

@Zepmann, @neusdan: Thanks for your infinite patience. I believe that I have a working solution in this PR, finally! I can confirm that all of the hashes are computed correctly for me (I precompute PCR 8,9,10,11 and take 13 as gospel).

Event logging for commands are in place via TCG_compactLogHashEvent if TCG_hashLogEvent fails, but events are not logged for buffers/files if logging the event fails. For buffers/files, the PCR is always extended using the TPM_Extend regardless of the status of event logging.

neusdan commented 7 years ago

Thanks for all your work here. I need some time to review and test everything.

geedo0 commented 7 years ago

Hello, just got back on this and noticed this PR and tried out this patch on our Lenovo server and it unfortunately does not address #53. I'm seeing the same behavior with the PC getting stuck inside the BIOS interrupt call. I haven't looked at the patch in detail yet, but I like the idea of falling back on compactHashLogExtend. Would it be possible to add a compiler define to force this fallback as a workaround? There's really no way to fail over in my case because I basically lose the CPU to the BIOS.

Here's where my server gets stuck (No GRUB fatal or anything): image

johnwallace123 commented 7 years ago

@geedo0 The most recent patch now includes as an option the compiler flag TGRUB_NOEVENTLOG, which if enabled, will completely avoid any TCG_hashLogEvent calls. Note that you'll completely lose any logging for everything except strings, as the TCG_compactLogHashEvent seems to require that the buffer be defined on the stack, which could be a tall order for an arbitrarily sized buffer.

Let me know if this works for you.

geedo0 commented 7 years ago

I just tried it out with the TGRUB_NOEVENTLOG flag and was able to successfully boot into CentOS with measurements in the TrustedGRUB2 PCRs. It is noticeably slower than before, but at least it works. I was surprised that the 4k limit on buffers didn't kill my GRUB; hopefully, this'll hold us over until we hit that though.

Thanks again @johnwallace123!

johnwallace123 commented 7 years ago

@geedo0 I'm glad to hear that this meets your use case! I'm not entirely surprised that it's substantially slower than before for a couple reasons. The main reason is that for strings, we're hashing twice, once in the GRUB code (to attempt the TCG_hashLogEvent), and once again in the TCG_compactLogHashEvent. It's the latter that's likely contributing to the time, because the TPM is a fairly slow resource, especially compared to the CPU. To avoid the slowness, my suggestion would be to not reboot too often :).

As to the 4K limit, that really only applies to strings. For general buffers, I've not made logging mandatory, because I know that they can get quite large, especially when logging the kernel. I would be concerned about attempting to put those potentially large buffers on the stack in order to be logged. However, for the command line strings, the logging is absolutely essential (at least for my use case), but I can be fairly confident that they're fairly small.

Thanks again for the feedback!

fchiacchiaretta commented 7 years ago

I can confirm this fixes boot on Lenovo X3650 M5. I built with TGRUB_NOEVENTLOG flag.

Thank you all for your great work!

securitykernel commented 2 years ago

Thanks for your contribution! Unfortunately, we decided to deprecate and no longer maintain this project. I will be closing this issue.