OP-TEE / optee_os

Trusted side of the TEE

Attestation RFC/Discussion #3189

Closed dmcilvaney closed 4 years ago

dmcilvaney commented 5 years ago

Hey everyone,

Sorry for the long post (I've tried to keep it interesting).

Feel free to comment/ask questions about anything here (some specific questions at the bottom).


I've been going through the process of updating our OP-TEE to 3.6.0 in preparation for some pull requests with features we have been working on. While I'm working on them, I figured I would touch base and get some feedback before polishing everything up.

This is sort of an RFC, but mostly I want to start a bit of dialogue with you and some of our security experts.

The Goal (Attestation)

We are currently implementing attestation (something I've seen a few questions about here before, like #3057) and firmware resiliency. Specifically, allowing a TA to provide a certificate chain rooted all the way down to a hardware root of trust.

i.e. SoC/ROM <- SPL <- OP-TEE <- User TA

This work is in service of a Trusted Computing Group specification: Hardware Requirements for a Device Identifier Composition Engine

See also: Cyber Resilient Technologies group

Some use cases

Basic Attestation

Obviously knowing if a device has been compromised with unknown firmware is very useful. If there is malicious firmware on the device the certificates will not match any known good versions and external systems can refuse to communicate with the TA.

TA Policy

A future feature we are very interested in: TA policy. In a large scale deployment of IoT devices it is important to have control of which TAs are allowed to run on a given device:

The computer controlling a robot arm should only run the robot arm TA, nothing else... but the factory has hundreds of identical devices. Safety critical systems should be locked down to only expected TAs, even if they all have the same owning entity as other devices on site.

The policy can also be included in the attestation, allowing the device to attest to its own current policy.

Secure Communications

The certificate chains can be used to set up secure communication channels with external devices or the cloud.

Boot Flow

The 30,000-foot view of the process is:

Note: Each loading stage is secured either with NXP's HAB, or a signing mechanism built into the previous firmware.

SPL Runs

  1. The ROM code loads an immutable boot loader (currently SPL; it would be nice to offload some of this in the future so we can patch SPL as well)
  2. SPL acquires a secure, unique HW ID (We are working on NXP devices with a CAAM, so the OTPMK is our choice).
  3. SPL hides that ID, obscuring it from all future firmware (requires hardware support)
  4. SPL generates an identity based on this ID + the measurement of itself: --> Compound Device ID (CDI)
  5. SPL generates a key pair based on the CDI: --> SPLPub/Pri
  6. SPL starts a certificate chain somewhere in memory and signs its own certificate with SPLPri
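To make steps 2 to 5 a bit more concrete, here is a minimal sketch of the derivation in Python. It is an illustration only, not our SPL code: the function names, the HMAC-SHA256 construction, the label string and the all-zero placeholder secret are assumptions made for the example. A real implementation would feed the derived seed into a deterministic ECC key generation routine, which is not shown here.

```python
import hashlib
import hmac


def measure(image: bytes) -> bytes:
    """Return the SHA-256 digest of a firmware image (its 'measurement')."""
    return hashlib.sha256(image).digest()


def derive_cdi(hw_secret: bytes, spl_image: bytes) -> bytes:
    """Assumed construction: CDI = HMAC-SHA256(hardware secret, measurement of SPL)."""
    return hmac.new(hw_secret, measure(spl_image), hashlib.sha256).digest()


def derive_key_seed(cdi: bytes, label: bytes) -> bytes:
    """Deterministic seed for key-pair generation, domain-separated by a label."""
    return hmac.new(cdi, label, hashlib.sha256).digest()


# Placeholder values: on real hardware the secret would come from the OTPMK
# and would be hidden from later firmware once SPL is done with it.
hw_secret = bytes(32)
spl_image = b"example SPL image"

cdi = derive_cdi(hw_secret, spl_image)
spl_key_seed = derive_key_seed(cdi, b"SPL identity key")
```

The important property is that the same hardware secret and the same SPL image always give the same CDI, while any change to the SPL image gives a different one.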

OP-TEE Loading

  1. SPL verifies and loads the OP-TEE binary, measuring it as it does so.
  2. SPL creates a certificate describing the OP-TEE binary and signs it with SPLPri
  3. SPL takes its private identity (CDI), and hashes it together with the OP-TEE measurement. It then generates a new key-pair for OP-TEE: --> OP-TEEPub/Pri
  4. SPL destroys the CDI and SPLPri
  5. SPL boots OP-TEE, passing OP-TEEPub/Pri in a secure manner (would like some feedback here, see below)
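Steps 3 and 4 can be sketched roughly as follows. This is a hedged illustration in Python: `next_layer_secret` and `wipe` are hypothetical names, the HMAC-SHA256 construction is an assumption, and the input secret is a placeholder. Python cannot guarantee that temporary copies of a secret are erased, so the wipe here only shows the intent; firmware written in C would use a proper zeroization routine.

```python
import hashlib
import hmac


def next_layer_secret(current_secret: bytes, next_image: bytes) -> bytes:
    """Mix the current layer's secret with the measurement of the next image."""
    measurement = hashlib.sha256(next_image).digest()
    return hmac.new(current_secret, measurement, hashlib.sha256).digest()


def wipe(buf: bytearray) -> None:
    """Overwrite a mutable buffer with zeros (intent only, see the note above)."""
    for i in range(len(buf)):
        buf[i] = 0


# Placeholder for the CDI that SPL derived for itself.
spl_cdi = bytearray(hashlib.sha256(b"placeholder SPL CDI").digest())

# The secret handed to OP-TEE depends on exactly which OP-TEE image was measured.
optee_secret = next_layer_secret(bytes(spl_cdi), b"OP-TEE image v1")
other_secret = next_layer_secret(bytes(spl_cdi), b"OP-TEE image v2")

# SPL destroys its own secret before handing over control.
wipe(spl_cdi)
```

A different OP-TEE binary ends up with a different secret, and therefore a different key pair and certificate, without SPL having to keep anything around after boot.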

OP-TEE Runs

  1. OP-TEE now has its own key-pair OP-TEEPub/Pri. Each time a TA is loaded a hash is generated by hashing the TA binary and each of its dependencies
  2. A PTA is made available which can take that measurement, and sign it with OP-TEEPri (OP-TEE is unique, it must keep its private key to allow it to sign additional TA certificates)
  3. The PTA can also provide the entire certificate chain back to the rich OS for general purpose attestation of the system state
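As a rough picture of the measurement in step 1, here is a small Python sketch. It is not the code from the commit linked later in this post: `measure_ta` is a hypothetical name and the choice to hash per-binary digests (rather than the concatenated raw binaries) is an assumption made so that the boundaries between binaries are unambiguous.

```python
import hashlib
from typing import Sequence


def measure_ta(ta_binary: bytes, dependencies: Sequence[bytes]) -> bytes:
    """Fingerprint a TA together with each of its dependencies.

    Each binary is hashed on its own first, then the fixed-size digests are
    hashed together, TA first and dependencies in the order given.
    """
    outer = hashlib.sha256()
    outer.update(hashlib.sha256(ta_binary).digest())
    for dep in dependencies:
        outer.update(hashlib.sha256(dep).digest())
    return outer.digest()


fingerprint = measure_ta(b"example TA", [b"example lib A", b"example lib B"])
```

The resulting digest is what the PTA would sign with OP-TEEPri.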

Manufacturing

For production devices the root of the certificate chain needs to be recorded by a trustworthy entity in a secure environment (e.g. a manufacturer such as NXP), and then cross-signed. This allows a third party to determine whether a certificate chain is valid.

More reading

Trusted Cyber-Physical Systems (TCPS) - High level goals

Cyber Resilient Platforms/Systems (CyReP/CyRes). See especially Device Identity with DICE and RIoT - Technical details on identity derivation etc.

NIST 800-193 - Guidelines we are trying to meet for resiliency

Our changes / Questions

Our initial implementation was based on 3.4.0, and there have been some significant updates to the TA loading processes since then. As I had to re-work the flow a bit, I figured now was a good time to get some input.

Measurement

I put together a commit with just the measurement portion here:

https://github.com/dmcilvaney/optee_os/commit/6d5168ae169818901af8b3013b8f67e4ac724f60

Currently it only targets user TAs loaded from the REE FS since that was our primary use case. I had to re-work it to mesh with the new ldelf changes, but it looks like it's running fine with QEMU for both buffered and normal loads now.

  1. Does this approach seem scalable/maintainable going forward? We would like to upstream as much as possible.
  2. @jforissier I noticed https://github.com/OP-TEE/optee_os/pull/3181 is in the works, I think ideally we would like the fingerprint of the TA to include any shared libraries it's using. Do you see any issues with that?

Certificate Chain

We also have a certificate chain management PTA which is responsible for consuming the measurements and providing the attestation information when requested. You can see the old 3.4.0 version here:

https://github.com/ms-iot/optee_os/blob/ms-iot-security/core/arch/arm/pta/pta_cyres.c

It requires an external dependency (the RIoT identity derivation and crypto package) and I'm not sure how well received that would be. I haven't gone through to clean it up yet, but I can try to answer any questions about it.

  1. How should we pass secret data to OP-TEE from SPL? Currently we set an address with a CFG flag. We were thinking of having the device tree point to a memory address which OP-TEE could check for its keys, then clear.
  2. Is there a strong pushback against external dependencies in OP-TEE? The RIoT repo does the ECC crypto and x509 certificate handling needed to generate the attestation. We use it across multiple firmwares to avoid code duplication.
  3. With a supplicant available in U-Boot it might be interesting to use OP-TEE to do the heavy lifting for attestation across all firmware layers (with the exception of SPL), thoughts?
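For question 1, the "check for its keys, then clear" idea looks roughly like this when simulated in Python. The shared memory region is modelled as a `bytearray`, and `take_secret`, the offset and the length are all made up for the example; how the address would actually be described in the device tree is exactly the open question.

```python
def take_secret(shared: bytearray, offset: int, length: int) -> bytes:
    """Copy a secret out of a shared region, then zero the region."""
    secret = bytes(shared[offset:offset + length])
    shared[offset:offset + length] = bytes(length)
    return secret


# Simulated hand-off region: 8 bytes of unrelated data around a 16-byte secret.
region = bytearray(b"\xaa" * 8 + b"0123456789abcdef" + b"\xbb" * 8)
secret = take_secret(region, offset=8, length=16)
```

After the call the secret only exists in the consumer's private copy, and a later reader of the region sees zeros where the key material used to be.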

RPC

We have some additional RPC features, but I'll leave that for another day.

jenswi-linaro commented 5 years ago

Hi @dmcilvaney

(First 1.) The patch with the hash could be reworked and merged with the already present "tag" feature in a way that it can suit both use cases.

(First 2.) A difference with dlopen() and friends is that the loading and linking is done late, after the TA has started to execute. The libraries could be loaded in a different order each time since it's the TA itself that is in control of when a certain library is to be loaded.
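One way to picture the load-order concern is the Python sketch below. It is only an illustration under assumptions: `extend`, `measure_in_load_order` and `measure_as_set` are hypothetical names, and neither function reflects how the tag feature or the proposed patch actually works.

```python
import hashlib
from typing import Iterable


def extend(measurement: bytes, binary: bytes) -> bytes:
    """Fold one more binary into a running measurement (order-sensitive)."""
    return hashlib.sha256(measurement + hashlib.sha256(binary).digest()).digest()


def measure_in_load_order(binaries: Iterable[bytes]) -> bytes:
    """Running measurement that depends on the order libraries were loaded in."""
    measurement = bytes(32)
    for binary in binaries:
        measurement = extend(measurement, binary)
    return measurement


def measure_as_set(binaries: Iterable[bytes]) -> bytes:
    """Order-independent alternative: hash the sorted per-binary digests."""
    digests = sorted(hashlib.sha256(binary).digest() for binary in binaries)
    return hashlib.sha256(b"".join(digests)).digest()


lib_a, lib_b = b"example lib A", b"example lib B"
```

With a running measurement, loading lib A before lib B gives a different result than loading lib B first, so a verifier would need to know the order. Sorting the digests removes that dependency, at the cost of only being able to produce the value once the full set is known.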

1: Passing the secret data via an address in device tree scales well

2: Yes, it (or relevant parts) would have to be imported into the optee_os git

3: Makes sense

dmcilvaney commented 5 years ago

Thanks for the feedback @jenswi-linaro,

The tag field looks like it will work well. This was originally written back in OP-TEE 3.2.0 - 3.3.0 era, and I missed that addition when updating to 3.6.0. I reworked the code to use the tag field instead of an additional hash field.

Apologies for the basic questions about the binary loading; it's not an area I've spent much time on. Where are the binaries used by dlopen() etc. coming from? It looks like they are loaded roughly the same way as a TA (i.e. as long as the UUID matches and it's signed, load it from disk via sys_open_ta_bin()). This obviously makes the idea of attestation much more complex. The obvious solution is to make the features mutually exclusive, but I see that being a massive headache going forward.

Would submodules still be considered too external? There are some concerns from the maintainers about code fragmentation of the identity derivation code. Obviously this isn't a show stopper, just a pain point if we have to maintain duplicate code across 3 or 4 different firmware repos.

vchong commented 5 years ago

  4. SPL destroys the CDI and SPLPri

@dmcilvaney Can you explain this a bit more please? Especially SPLPri, is it not needed at all anymore down the line?

  2. Is there a strong pushback against external dependencies in OP-TEE? The RIoT repo does the ECC crypto and x509 certificate handling needed to generate the attestation. We use it across multiple firmwares to avoid code duplication.

2: Yes, it (or relevant parts) would have to be imported into the optee_os git

@jenswi-linaro Don't we have mbedtls for this? Also, aren't we removing something similar to this in the work to refactor the KMGK PTA to handle certificates in the user mode TA using user mode mbedtls instead of in the PTA itself?

jenswi-linaro commented 5 years ago

Where are the binaries used by dlopen() etc. coming from?

They are signed as TAs and matched by UUID, just as you noted. I agree, it seems tricky with attestation combined with dlopen() and friends.

Would submodules still be considered too external?

Yes, I'm afraid so.

jforissier commented 5 years ago

Where are the binaries used by dlopen() etc. coming from?

They are signed as TAs and matched by UUID, just as you noted. I agree, it seems tricky with attestation combined with dlopen() and friends.

What's the problem? As long as all the binaries are fingerprinted, trustworthiness can still be attested, no?

jenswi-linaro commented 5 years ago

Don't we have mbedtls for this? Also, aren't we removing something similar to this in the work to refactor the KMGK PTA to handle certificates in the user mode TA using user mode mbedtls instead of in the PTA itself?

@vchong, the policy is to put stuff in user mode rather than kernel mode if possible. This is different, for one thing stuff is done during early boot if I understand it correctly. If it turns out that it just as well could be done in user mode, then I guess that's where it will end up.

dmcilvaney commented 5 years ago

@jforissier

Where are the binaries used by dlopen() etc. coming from?

They are signed as TAs and matched by UUID, just as you noted. I agree, it seems tricky with attestation combined with dlopen() and friends.

What's the problem? As long as all the binaries are fingerprinted, trustworthiness can still be attested, no?

I feel like the major issue is rollback attacks and the like.

Say you have a TA which dynamically loads a library. You establish a secure connection to it, and decide you trust it via attestation, and pass it sensitive data.

Now that TA attempts to load a library via dlopen(). Which binary is it going to load? From an external point of view we only know that the library has a valid signature according to OP-TEE, nothing more. What if we discovered a major vulnerability in an earlier version and replaced the binary on the devices, but someone manages to roll it back? Or maybe one of the devices fails to update.

Even if you update the attestation certificate as each library is loaded, how do you inform everyone who used the old certificate to establish trust that it is no longer valid?

Obviously you can design your TA to force all the libraries to load before you establish a connection, but that doesn't seem reliable.

The other option is to limit OP-TEE to only load a specific, known set of libraries. A hash of each library would need to be stored as part of the OP-TEE kernel. That way the attestation of OP-TEE would capture the identity of these libraries.
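A rough sketch of that last option, in Python for brevity: the allow-list, its contents and `may_load` are hypothetical, and in practice the list would be compiled into the OP-TEE binary so that measuring OP-TEE also covers the list.

```python
import hashlib

# Hypothetical allow-list baked into the OP-TEE image at build time.
# Only the patched version of the example library is listed.
ALLOWED_LIBRARY_DIGESTS = frozenset({
    hashlib.sha256(b"libexample v2 (patched)").hexdigest(),
})


def may_load(library_binary: bytes) -> bool:
    """Allow a library only if its digest is on the built-in allow-list."""
    return hashlib.sha256(library_binary).hexdigest() in ALLOWED_LIBRARY_DIGESTS
```

A rolled-back `libexample v1` would still carry a valid signature, but its digest is not on the list, so the load is refused. Updating a library then means shipping a new OP-TEE image, which shows up as a new OP-TEE certificate.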


@vchong

SPL destroys the CDI and SPLPri

@dmcilvaney Can you explain this a bit more please? Especially SPLPri, is it not needed at all anymore down the line?

The idea is to only pass a signed certificate along. In the future we would like to offload this to an external system (TPM for example) so that even the current binary never knows its private key. If no one knows the private key, no one can spoof the attestation certificate.

The attestation is a chain of certificates, each certificate signed by the previous stage of firmware. So SPLPri is used by SPL to sign the certificate attesting to OP-TEE. We don't want anyone else to be able to fake the OP-TEE certificate, so the safest thing to do is destroy the private key as soon as we are done measuring OP-TEE.

Since the key derivation is deterministic, based on a hidden hardware seed and code measurements, we should always get the same signing keys if we are using the same firmware. If the keys change, the attestation will change, and external systems will know not to trust the device.


@jenswi-linaro @vchong

Don't we have mbedtls for this? Also, aren't we removing something similar to this in the work to refactor the KMGK PTA to handle certificates in the user mode TA using user mode mbedtls instead of in the PTA itself?

@vchong, the policy is to put stuff in user mode rather than kernel mode if possible. This is different, for one thing stuff is done during early boot if I understand it correctly. If it turns out that it just as well could be done in user mode, then I guess that's where it will end up.

I would have to look through mbedtls and see if anything is missing, but I have no ideological issues if that is the way we need to go. Microsoft Research had the self-contained DICE/RIoT libraries already running so we went with that across all of our firmware layers so we could make sure all the certificate code was the same.

As for user mode vs kernel mode PTA, I think a PTA would be a better pick:

  1. We need to pick up the secrets from previous firmware layers, likely stashed in SRAM somewhere. I believe this is most easily done from the kernel/PTA?
  2. In the future we see this being most secure if handled by a hardware TPM, but that shouldn't be a requirement to get value. So we need a system that can either handle it internally, or talk to a TPM via SPI if one is available. I think it would be best to limit access to the TPM to kernel mode?
  3. Not sure how easy it would be to pass both the measurements of a newly loaded TA, and OP-TEE's private key, to a user TA.

jforissier commented 5 years ago

@dmcilvaney

@jforissier

Where are the binaries used by dlopen() etc. coming from?

They are signed as TAs and matched by UUID, just as you noted. I agree, it seems tricky with attestation combined with dlopen() and friends.

What's the problem? As long as all the binaries are fingerprinted, trustworthiness can still be attested, no?

I feel like the major issue is roll back attacks and the like. [...]

OK, I see your point.

The other option is to limit OP-TEE to only load a specific, known set of libraries. A hash of each library would need to be stored as part of the OP-TEE kernel. That way the attestation of OP-TEE would capture the identity of these libraries.

Yes, and regardless of rollback, some kind of library list would be useful for another reason. With the introduction of the user-space loader, we opened a possibility for any TA to load the binary of any other TA or library. dlopen() makes it even easier. This could pose problems regarding code confidentiality, or code usage.

vchong commented 5 years ago

@dmcilvaney Ok. Thanks for the detailed explanation!

dmcilvaney commented 5 years ago

@jenswi-linaro, @jforissier (and anyone else with an interest)

I'm starting to spec out and plan this work, and I've got a few questions. (Ideally, I would like this to be upstreamable and valuable for everyone.)

What is the decision-making process for splitting kernel behaviour between the kernel itself and a PTA? At a glance it seems like the kernel closely matches the GlobalPlatform spec, while additional functionality is added to the PTAs. Is that the only factor?

I'm looking at implementing some more advanced management functionality (see TA Policy from my initial message for one) with an eye to making them available upstream.

Enhanced Storage PTA

While storage which is tied to a TA's UUID is useful, sometimes it is desirable to lock the storage to a specific TA binary. Also useful is the ability to roll encrypted storage forward between TA versions.
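The difference between the two bindings can be sketched like this in Python. Everything here is an assumption for illustration: the function names, the HMAC-SHA256 construction, the prefix strings and the placeholder device secret are made up, and this is not a description of how OP-TEE derives its existing storage keys.

```python
import hashlib
import hmac


def key_bound_to_uuid(device_secret: bytes, ta_uuid: bytes) -> bytes:
    """Storage key that stays the same across updates of the same TA."""
    return hmac.new(device_secret, b"uuid" + ta_uuid, hashlib.sha256).digest()


def key_bound_to_binary(device_secret: bytes, ta_uuid: bytes, ta_binary: bytes) -> bytes:
    """Storage key that changes whenever the TA binary changes."""
    measurement = hashlib.sha256(ta_binary).digest()
    return hmac.new(
        device_secret, b"binary" + ta_uuid + measurement, hashlib.sha256
    ).digest()


device_secret = bytes(32)       # placeholder for a hardware-derived secret
ta_uuid = bytes(range(16))      # placeholder 16-byte UUID
old_ta, new_ta = b"example TA v1", b"example TA v2"
```

With the binary-bound key, data sealed by v1 is unreadable to v2 unless something explicitly rolls it forward, for example by having the trusted side re-encrypt the data under the new key during an approved update.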

My current thought is to implement a PTA which can take a buffer and either encrypt or decrypt it based on the above. Would this be better received as a modification to the kernel (i.e. expand syscall_storage_obj_open() with new storage types), or as a PTA? In the case of the PTA, I assume storing the resulting encrypted blob would be the responsibility of the existing persistent storage APIs. The storage PTA might offer:

Policy PTA

We want to hook into things like a discrete TPM (see #3219), or pull certificate chains from earlier firmware stages (SPL, ATF, etc.), and allow OP-TEE to create attestation certificates for its TAs.

Obviously some of this doesn't belong in upstream OP-TEE (for example, the TPM TSS stack alone is about a third the size of all of optee_os), but I want to lay out a stable interface which TAs can hook into to retrieve attestation information. Upstream would carry weak default implementations, and a platform could then override those with more advanced versions as needed.

For example, the Policy PTA might have a weak implementation of a function ta_verify(), which calls shdr_verify_signature() etc., replacing the calls in ree_fs_ta.c. But if the device has a discrete TPM, it might be desirable to add the TSS library and utilize the TPM to decide which TAs are allowed to run (TPMs have a very full-featured policy management story).
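The override idea, modelled in Python since there is no linker-level weak symbol to show here: a default ta_verify() path that a platform can replace. In C this would more likely be a weak function definition; `default_ta_verify`, `register_platform_ta_verify` and the placeholder checks are hypothetical and only stand in for the real signature verification.

```python
from typing import Callable, Optional

TaVerifyFn = Callable[[bytes], bool]

# Set by a platform that wants to replace the default behaviour.
_platform_ta_verify: Optional[TaVerifyFn] = None


def default_ta_verify(ta_binary: bytes) -> bool:
    """Stand-in for the default path (signature verification in real code)."""
    return len(ta_binary) > 0  # placeholder check, not a real verification


def register_platform_ta_verify(fn: TaVerifyFn) -> None:
    """Let a platform install its own policy, e.g. one backed by a TPM."""
    global _platform_ta_verify
    _platform_ta_verify = fn


def ta_verify(ta_binary: bytes) -> bool:
    """Single entry point used by the loader, whichever implementation is active."""
    impl = _platform_ta_verify if _platform_ta_verify is not None else default_ta_verify
    return impl(ta_binary)
```

A platform with stricter requirements would call `register_platform_ta_verify()` once during init, and the loading code would keep calling `ta_verify()` without knowing which policy is behind it.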

Would moving TA validation to a PTA, along with new attestation functionality, be acceptable? The PTA would likely offer the following functionality:

jenswi-linaro commented 5 years ago

Adding/extending syscall vs. PTA: We try to avoid adding new syscalls and instead use a PTA. This is a bit like syscall vs. ioctl in the Linux kernel.

A PTA is foremost an API. PTAs have a tendency to become a module where a feature or set of features is implemented. It's perfectly fine to have the implementation outside the PTA (if a PTA is needed at all).

Enhanced Storage PTA: sounds interesting, adding new storage types seems natural.

Policy PTA: I'm not sure a PTA is needed for this, it depends on if some new interface is needed or not. Some nice hooks and perhaps some not too complicated reference code for this sounds interesting.

github-actions[bot] commented 4 years ago

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.