confidential-containers / trustee

Attestation and Secret Delivery Components
Apache License 2.0

[RFC] General Attestation Service Design Proposal #215

Closed liangzhou121 closed 2 years ago

liangzhou121 commented 2 years ago

Design

According to the "[RFC] Generic Key Broker Service (KBS) & Attestation Service high level architecture proposal", CC's general attestation system relies on the Attestation Service and the KBS to verify the Attestation-Agent's running environment and to distribute critical messages (such as the KEK). The KBS plays the RATS Relying Party role; it is hardware-agnostic and focuses on vendor-specific requirements. The Attestation Service implements the RATS Verifier role; it focuses on verifying the Evidence's identity and TCB status.

The CC Attestation Service is a general infrastructure service that mainly provides an Evidence verification service to the KBS. It therefore needs to be compatible with all CC-supported HW-TEEs and must scale well to support new HW-TEEs in the future. Thanks to its modular design, it can also be compatible with existing third-party attestation services.

To verify TCB status, the Attestation Service parses the received Evidence and extracts the corresponding TCB status. It also includes a Policy Engine component, which implements the RATS Verifier Owner role. The Policy Engine relies on reference data from the Reference Value Provider and on the information extracted from the Evidence to match the TCB status of the Attestation-Agent's running environment dynamically, and then generates the corresponding Attestation Results.

The Attestation Service supports two deployment modes: as a service or as a library. When deployed as an infrastructure service, one Attestation Service may need to support several KBS instances simultaneously. This makes deployment more flexible.

Goals

Non-Goals

Architecture

The following diagram demonstrates the overall architecture and internal modules: image

Service Module

This module is used to communicate with external modules such as the KBS. Its main features are:

Evidence

Evidence formats differ between HW-TEE types, for example:

Attestation Results

The Attestation Results for the Evidence are compliant with the Entity Attestation Token (EAT) format. The format should be:

{
    "alg": "RS256",
    "jku": "https://xxxxx.xxx/certs",
    "kid": "<reference to the self-signed certificate used to verify the attestation token's signature>",
    "typ": "JWT"
}.{
    "exp": 1568187398,
    "iat": 1568158598,
    "iss": "https://xxxxxx",
    "attestation_results":{}
}.[Signature]
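
A relying party such as the KBS would typically check the token's validity window and issuer before trusting `attestation_results`. The following is a minimal sketch of that check, assuming the signature has already been verified and the payload decoded. The `Claims` struct and the `is_within_validity` function are hypothetical names used for illustration; they are not part of the proposal.

```rust
/// Decoded payload of the attestation token (field names follow the format above).
#[derive(Debug, Clone)]
pub struct Claims {
    pub exp: u64,                    // expiry, seconds since the Unix epoch
    pub iat: u64,                    // issued-at, seconds since the Unix epoch
    pub iss: String,                 // issuer URL
    pub attestation_results: String, // TEE-specific results, kept as raw JSON here
}

/// Returns true when the issuer matches and `now` falls inside [iat, exp).
/// Assumption: signature verification happened before this call.
pub fn is_within_validity(claims: &Claims, expected_iss: &str, now: u64) -> bool {
    claims.iss == expected_iss && claims.iat <= now && now < claims.exp
}
```

A real deployment would probably also allow a small clock skew; that is left out here to keep the sketch short.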

The attestation_results differ between HW-TEE types, for example:

As a Library

The Attestation Service will provide the following interface:

pub fn attestation(evidence: String) -> Result<String, String> {}

attestation()

As a Service

The Attestation Service can be extended with a separate service application that is built on the library above. An example of its Cargo.toml file:

[[bin]]
name = "attestation_service"
path = "app/attestation_service/src/main.rs"

Note: In the POC stage, the service can only be reached at localhost:port.

The POC will use gRPC, so its proto file should be:

syntax = "proto3";

package attestation;

message AttestationRequest {
    bytes evidence = 1;
}
message AttestationResponse {
    bytes results = 1;
}

service AttestationService {
    rpc Attestation(AttestationRequest) returns (AttestationResponse) {};
}

Attestation Module

Its responsibility is to verify the various kinds of received Evidence. It includes the following two internal modules:

It also defines a trait that needs to be implemented by its internal modules.

trait Attestation {
    fn attestation(&self, evidence: String) -> Result<String, String>;
}
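
As a rough illustration of how the internal modules could plug into this trait, here is a minimal sketch. It assumes the caller already knows the TEE type of the Evidence (the proposal does not say how this is determined). `DummyTdxDriver` and `AttestationModule` are hypothetical names, and the driver body is placeholder logic rather than real verification.

```rust
use std::collections::HashMap;

/// Same shape as the trait proposed above.
pub trait Attestation {
    fn attestation(&self, evidence: String) -> Result<String, String>;
}

/// Hypothetical driver: accepts any non-empty evidence (placeholder logic only).
pub struct DummyTdxDriver;

impl Attestation for DummyTdxDriver {
    fn attestation(&self, evidence: String) -> Result<String, String> {
        if evidence.is_empty() {
            return Err("empty evidence".to_string());
        }
        // A real driver would verify the quote and extract the TCB status here.
        Ok(format!("{{\"tee\":\"tdx\",\"evidence_len\":{}}}", evidence.len()))
    }
}

/// Registry that routes evidence to the module registered for a TEE type.
#[derive(Default)]
pub struct AttestationModule {
    drivers: HashMap<String, Box<dyn Attestation>>,
}

impl AttestationModule {
    /// Registers (or replaces) the module responsible for `tee`.
    pub fn register(&mut self, tee: &str, driver: Box<dyn Attestation>) {
        self.drivers.insert(tee.to_string(), driver);
    }

    /// Forwards the evidence to the matching module, or fails if none is registered.
    pub fn verify(&self, tee: &str, evidence: String) -> Result<String, String> {
        match self.drivers.get(tee) {
            Some(driver) => driver.attestation(evidence),
            None => Err(format!("no verification module for TEE type '{}'", tee)),
        }
    }
}
```

Using trait objects keeps the set of supported HW-TEEs open: a new verification driver or proxy only needs to implement `Attestation` and be registered.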

Verification Drivers

It includes different HW-TEE specific pluggable verification modules and some common modules to verify the Evidence's identity and TCB status:

TDX/SGX/SEV(-ES)/SEV-SNP

Its features:

Eventlog-rs

To verify the measurements of all programs running inside a TEE-based VM, a TCG-compliant event log is included in the Evidence. eventlog-rs parses that event log and extracts the measurements of the following components, which are sent to the Policy Engine for final verification (as an example):

Note: Only the TDX supports eventlog currently.
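
The hand-off from the parsed event log to the Policy Engine could look roughly like the sketch below. It does not use the eventlog-rs API: `ParsedEvent` and `policy_input` are hypothetical names, and the component names ("bootloader", "kernel", "parameters") are taken from the example policy in this proposal.

```rust
use std::collections::BTreeMap;

/// One parsed event-log entry (hypothetical, simplified shape).
#[derive(Debug, Clone)]
pub struct ParsedEvent {
    pub component: String,  // e.g. "bootloader", "kernel", "parameters"
    pub digest_hex: String, // measurement of that component, hex-encoded
}

/// Collects the digest of each component of interest into a policy input map.
/// If a component appears more than once, the last entry wins.
pub fn policy_input(events: &[ParsedEvent], wanted: &[&str]) -> BTreeMap<String, String> {
    let mut input = BTreeMap::new();
    for ev in events {
        if wanted.contains(&ev.component.as_str()) {
            input.insert(ev.component.clone(), ev.digest_hex.clone());
        }
    }
    input
}
```

The resulting map corresponds to the `input` document that the policy evaluates.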

Policy Engine

It uses Open Policy Agent (OPA) to implement policy as code. OPA relies on the following files to perform verification:

Policy.rego

The TCB status verification policy differs between HW-TEE types, for example:

import future.keywords.in

default allow = false

allow {
    bootloader_is_granted
    kernel_is_granted
    kernelparameters_is_granted
}

bootloader_is_granted {
    count(data.bootloader.hashes) == 0
}
bootloader_is_granted {
    input.bootloader == data.bootloader.hashes[_]
}

kernel_is_granted {
    count(data.kernel.hashes) == 0
}
kernel_is_granted {
    input.kernel == data.kernel.hashes[_]
}

kernelparameters_is_granted {
    count(data.parameters.hashes) == 0
}
kernelparameters_is_granted {
    input.parameters == data.parameters.hashes[_]
}

- `Policy.rego` used to check SGX Evidence's TCB information:

package policy

# By default, deny requests.
default allow = false

allow {
    mrEnclave_is_grant
    mrSigner_is_grant
    input.productId >= data.productId
    input.svn >= data.svn
}

mrEnclave_is_grant {
    count(data.mrEnclave) == 0
}
mrEnclave_is_grant {
    count(data.mrEnclave) > 0
    input.mrEnclave == data.mrEnclave[_]
}

mrSigner_is_grant {
    count(data.mrSigner) == 0
}
mrSigner_is_grant {
    count(data.mrSigner) > 0
    input.mrSigner == data.mrSigner[_]
}
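
To make the decision logic of the SGX policy above easier to follow, here is the same rule expressed as a small Rust sketch. This is only an illustration of what the Rego rules evaluate; the proposal performs this check in OPA, and the struct and function names here are hypothetical.

```rust
/// Reference data, mirroring `data.*` in the policy above.
pub struct SgxReference {
    pub mr_enclave: Vec<String>,
    pub mr_signer: Vec<String>,
    pub product_id: u16,
    pub svn: u16,
}

/// Values extracted from the Evidence, mirroring `input.*`.
pub struct SgxInput {
    pub mr_enclave: String,
    pub mr_signer: String,
    pub product_id: u16,
    pub svn: u16,
}

/// An empty reference list means "no restriction"; otherwise the value must be listed.
fn is_granted(allowed: &[String], value: &str) -> bool {
    allowed.is_empty() || allowed.iter().any(|a| a == value)
}

/// Equivalent of the `allow` rule: all four conditions must hold.
pub fn allow(input: &SgxInput, reference: &SgxReference) -> bool {
    is_granted(&reference.mr_enclave, &input.mr_enclave)
        && is_granted(&reference.mr_signer, &input.mr_signer)
        && input.product_id >= reference.product_id
        && input.svn >= reference.svn
}
```

Note that an empty reference list grants every value, so a deployment that wants strict matching has to populate the reference data.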


### Proxy
It includes pluggable proxy sub-modules that are used to support third-party Attestation Services, such as:
- Microsoft Azure Attestation Service
- ISecL Attestation Service

## Management API Module
`Attestation Service` also enables a management API to provide the following functionalities:
- Set Policy Engine's reference data by `Reference Value Provider`.
- Set proxy sub-modules configurations.

So this module provides the following interfaces:
```rust
fn set_policy(request: &SetPolicyEnginePolicyRequest) -> Result<(), String>;
fn export_policy(request: &ExportPolicyEnginePolicyRequest) -> Result<PolicyEnginePolicy, String>;
fn set_reference(request: &SetPolicyEngineReferenceRequest) -> Result<(), String>;
fn export_reference(request: &ExportPolicyEngineReferenceRequest) -> Result<PolicyEngineReference, String>;
fn test(request: &TestPolicyEngineRequest) -> Result<TestPolicyEngineResponse, String>;
```

As a Service

If the Attestation Service is built as a service, it will provide these functions as gRPC services. So its proto file should be:

syntax = "proto3";

package managementapi;

message SetPolicyEnginePolicyRequest {
    bytes name = 1;
    bytes content = 2;
}
message SetPolicyEnginePolicyResponse {
    bytes status = 1;
}

message SetPolicyEngineReferenceRequest {
    bytes name = 1;
    bytes content = 2;
}
message SetPolicyEngineReferenceResponse {
    bytes status = 1;
}

message ExportPolicyEnginePolicyRequest {
    bytes name = 1;
}
message ExportPolicyEnginePolicyResponse {
    bytes status = 1;
    bytes content = 2;
}

message ExportPolicyEngineReferenceRequest {
    bytes name = 1;
}
message ExportPolicyEngineReferenceResponse {
    bytes status = 1;
    bytes content = 2;
}

message TestPolicyEngineRequest {
    bytes policyname = 1;
    bytes policycontent = 2;
    bool policylocal = 3;
    bytes referencename = 4;
    bytes referencecontent = 5;
    bool referencelocal = 6;
    bytes input = 7;
}
message TestPolicyEngineResponse {
    bytes status = 1;
}

service PolicyEngineService {
    rpc SetPolicy(SetPolicyEnginePolicyRequest) returns (SetPolicyEnginePolicyResponse) {};
    rpc ExportPolicy(ExportPolicyEnginePolicyRequest) returns (ExportPolicyEnginePolicyResponse) {};
    rpc SetReference(SetPolicyEngineReferenceRequest) returns (SetPolicyEngineReferenceResponse) {};
    rpc ExportReference(ExportPolicyEngineReferenceRequest) returns (ExportPolicyEngineReferenceResponse) {};
    rpc Test(TestPolicyEngineRequest) returns (TestPolicyEngineResponse) {};
}
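
The set and export operations of the management API could be backed by something as simple as the in-memory store sketched below. This is an assumption made for illustration: the proposal does not specify how policies and reference data are stored, and `PolicyStore` is a hypothetical name. The `name` and `content` parameters correspond to the fields of the request messages.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory backing store for policies and reference data.
#[derive(Default)]
pub struct PolicyStore {
    policies: HashMap<String, Vec<u8>>,
    references: HashMap<String, Vec<u8>>,
}

impl PolicyStore {
    /// Stores (or overwrites) the policy file `name`.
    pub fn set_policy(&mut self, name: &str, content: &[u8]) -> Result<(), String> {
        if name.is_empty() {
            return Err("policy name must not be empty".to_string());
        }
        self.policies.insert(name.to_string(), content.to_vec());
        Ok(())
    }

    /// Returns a copy of the policy file `name`, if present.
    pub fn export_policy(&self, name: &str) -> Result<Vec<u8>, String> {
        self.policies
            .get(name)
            .cloned()
            .ok_or_else(|| format!("policy '{}' not found", name))
    }

    /// Stores (or overwrites) the reference data file `name`.
    pub fn set_reference(&mut self, name: &str, content: &[u8]) -> Result<(), String> {
        if name.is_empty() {
            return Err("reference name must not be empty".to_string());
        }
        self.references.insert(name.to_string(), content.to_vec());
        Ok(())
    }

    /// Returns a copy of the reference data file `name`, if present.
    pub fn export_reference(&self, name: &str) -> Result<Vec<u8>, String> {
        self.references
            .get(name)
            .cloned()
            .ok_or_else(|| format!("reference '{}' not found", name))
    }
}
```

A persistent implementation would replace the two maps with files or a database, while keeping the same method signatures.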

Opens

Reference

fitzthum commented 2 years ago

Looks good. A few notes, mainly for future reference.

First, eventlog-rs won't be used by the SNP module because the SNP evidence doesn't include any kind of event log (although at some point we might be using vTPMs with SNP, which could provide something like that). Instead the evidence will include some other reference to the firmware, kernel, etc. that the VM booted with. It's not yet clear exactly what this will look like.

I don't think we want to have a boolean verification state. Will rego allow us to return a verification result that can be set to more than just two values? Similarly, can the allow field in the verification result be changed to something like verification_result, which could be set to more than just two different values? This would allow us to create more expressive key policies. For example, we could make a policy that only releases keys when a new version of firmware is used in the guest. It would also help reconcile different architectures where approved evidence might come with different security properties.

Finally, maybe we should think about having a Keylime attestation proxy?

liangzhou121 commented 2 years ago

@fitzthum Thank you for your comments, the following are my answers.

> First, eventlog-rs won't be used by the SNP module because the SNP evidence doesn't include any kind of event log (although at some point we might be using vTPMs with SNP, which could provide something like that). Instead the evidence will include some other reference to the firmware, kernel, etc. that the VM booted with. It's not yet clear exactly what this will look like.

I think it's not a problem here. Maybe a new, general, separate library that is compatible with both the TDX eventlog and SEV-SNP should be added in the future, but I am not sure about this currently.

> I don't think we want to have a boolean verification state. Will rego allow us to return a verification result that can be set to more than just two values?

Yes, OPA can return multiple evaluation values simultaneously. It all depends on the policy.rego file. You can try this OPA example; it will output the following:

{
    "allow": true,
    "user_is_admin": true,
    "user_is_granted": []
}
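
On the Rust side, a non-boolean result could be modelled along the lines of the sketch below. The `VerificationResult` enum, its variants, and `may_release_key` are hypothetical and only show how a key policy could branch on more than two outcomes, such as the firmware-version example raised above.

```rust
/// Hypothetical multi-valued verification outcome, ordered from weakest to strongest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum VerificationResult {
    Rejected,
    AcceptedOutdatedFirmware,
    AcceptedCurrentFirmware,
}

/// A key policy can require a minimum outcome instead of a plain yes/no.
/// The derived ordering follows the declaration order of the variants.
pub fn may_release_key(result: VerificationResult, minimum: VerificationResult) -> bool {
    result >= minimum
}
```

A linear ordering is a simplification; outcomes from different architectures may not be directly comparable, in which case an explicit per-key allow-list of outcomes would be a better fit.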

> Finally, maybe we should think about having a Keylime attestation proxy?

Yes, the design is modular, so it's easy to extend with other proxy sub-modules. I think that in the POC stage, probably no proxy sub-module will actually be supported.

vbatts commented 2 years ago

Wouldn't the message format for SEV(-ES) and SEV-SNP need to be considered in the goals now, for it to be accommodated later?

fitzthum commented 2 years ago

I think the assumption is basically that SEV-SNP is going to be similar enough to TDX (with some of the differences I mentioned above) that it should work with this proposal. On the other hand SEV-ES is different enough that it is never going to be a great fit so we will either just try to shoe-horn it in later or leave it to simple-kbs to support pre-attestation.

I know there was some interest in discussing SNP support more generally. We might want a separate issue for that.

One thing that I alluded to above is that SNP does not have the same kind of event log that TDX does. I am assuming that for SNP we are going to continue to use the measured-direct-boot approach that we are currently using for SEV(-ES). To simplify validation of the measurement we're going to need to provide some hints to the KBS/AS as part of the evidence. For example, we will probably want to send the hash of the kernel and initrd as part of the evidence. We will also probably need to include the CPU count of the guest so that the AS can construct an appropriate VMSA. These values can be evaluated using a rego policy. If the policy is valid we will construct an expected measurement from that and compare it to the measurement in the attestation report.

One interesting thing is that the AS will likely need access to the full firmware binary so that it can calculate the launch measurement (unless we try to pre-compute launch measurements offline). This is a bit annoying. The proposal above has some tooling to provide reference data to the AS. It looks like this is more aimed at providing hashes, but we would probably use it to provide the full firmware binary as well.

So like I said, I believe that this proposal will work with SNP, but if someone wants to try filling in more details that would definitely be welcome.

liangzhou121 commented 2 years ago

Hi @vbatts Thank you for your comments.

> Wouldn't the message format for SEV(-ES) and SEV-SNP need to be considered in the goals now, for it to be accommodated later?

I think SEV-SNP should be considered now; we should support these runtime attestations simultaneously. As for SEV(-ES), as I mentioned in the Opens section, maybe we can consider it later, because it is already supported by simple-kbs.

liangzhou121 commented 2 years ago

Hi @fitzthum ,

> One thing that I alluded to above is that SNP does not have the same kind of event log that TDX does. I am assuming that for SNP we are going to continue to use the measured-direct-boot approach that we are currently using for SEV(-ES)

OK. According to the design, each type of HW-TEE has its own module, so it can implement its own separate .rego file (such as policy-snp.rego) and verification logic. So I think it's not a problem here.

> One interesting thing is that the AS will likely need access to the full firmware binary so that it can calculate the launch measurement (unless we try to pre-compute launch measurements offline). This is a bit annoying.

Do you know how to calculate the SNP launch measurement? For example, if the launch measurement covers the kernel, bootloader, and firmware hashes, is the measurement simply calculated as: kernel hash + bootloader hash + firmware hash?

fitzthum commented 2 years ago

> Do you know how to calculate the SNP launch measurement?

Calculating the launch measurement for SNP is actually a bit more complicated than calculating the launch measurement for SEV(-ES). There is a reference implementation here and @dubek can probably answer any specific questions.

The basics are that instead of just hashing flash0 + VMSA, we actually hash every memory page individually, with each page extending the hash of the previous page. There are also some new mechanisms with different types of pages. The underlying memory that we load into flash0 and the VMSA are roughly the same as with SEV(-ES) though. So just like for SEV(-ES) we will need to know the hash of the initrd, kernel, and cmdline so that we can inject them into the firmware (this is how measured-direct-boot works). We will also need to generate the VMSAs (initial register state of guest) and we'll need to know a few things to do this like the number of CPUs the guest is running.
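
The page-by-page chaining described above can be sketched as follows. This is a deliberately simplified illustration, not the actual SNP algorithm: it ignores page types and other per-page metadata, and `toy_hash` is a non-cryptographic stand-in so that the example has no external dependencies. All names are hypothetical; a real implementation would use a cryptographic hash and follow the reference implementation mentioned above.

```rust
/// Folds guest pages into a launch digest: each step hashes the previous
/// digest together with the hash of the current page, so every page
/// extends the result of the page before it.
pub fn chained_measurement<H>(pages: &[&[u8]], hash: H, initial: Vec<u8>) -> Vec<u8>
where
    H: Fn(&[u8]) -> Vec<u8>,
{
    let mut digest = initial;
    for &page in pages {
        let mut buf = digest.clone();
        buf.extend_from_slice(&hash(page));
        digest = hash(&buf);
    }
    digest
}

/// Toy stand-in for a real cryptographic hash (FNV-1a, NOT secure, illustration only).
pub fn toy_hash(data: &[u8]) -> Vec<u8> {
    let mut acc: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for b in data {
        acc ^= *b as u64;
        acc = acc.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    acc.to_be_bytes().to_vec()
}
```

Because each step depends on the previous digest, the result depends on the order of the pages as well as their contents, which is why the verifier has to reconstruct the guest memory layout rather than just combine component hashes.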

liangzhou121 commented 2 years ago

> Calculating the launch measurement for SNP is actually a bit more complicated than calculating the launch measurement for SEV(-ES). There is a reference implementation here and @dubek can probably answer any specific questions.

@fitzthum Thank you for this information. I also updated the proposal to mention that only TDX supports the eventlog mechanism currently.

knrt10 commented 2 years ago

@liangzhou121 thank you for the proposal, it looks great. I have a question regarding the Attestation Results: as proposed, we are going with the Entity Attestation Token (EAT) format. Is there any particular reason for using JWT as the typ rather than CBOR Web Token (CWT)? JWT is not inherently secure. It does not concern itself with encryption; it only cares about validation.

liangzhou121 commented 2 years ago

@knrt10 Thank you for your comments.

> Is there any particular reason for using JWT as the typ rather than CBOR Web Token (CWT)? JWT is not inherently secure. It does not concern itself with encryption; it only cares about validation.

Actually, both JWT and CWT are recommended by RATS for conveying Attestation Results. The Attestation Results only need integrity protection, so we think JWT should be enough. By the way, neither JWT nor CWT will be used during the POC stage, because in this stage the attestation service will be used as a local service or as a crate integrated directly by the KBS.

jodh-intel commented 2 years ago

@liangzhou121 - can I confirm that "MAA" refers to "Microsoft Azure Attestation" in the diagram?

liangzhou121 commented 2 years ago

> @liangzhou121 - can I confirm that "MAA" refers to "Microsoft Azure Attestation" in the diagram?

Yes, it means "Microsoft Azure Attestation". I mention it here as an example showing that the AS can be extended to support third-party Attestation Services.

jodh-intel commented 2 years ago

Thanks @liangzhou121.