Workload Identity in Attestation Results

thomas-fossati commented 1 year ago

@gkostal 09/12/2023 SIG meeting:

How is "identity" represented for an attested environment? Can it be generalized?

Especially if the attested environment is a confidential compute attested environment.
And in a way that's tractable for a relying party to authorize against.

@gkostal 10/10/2023 additional details:

I am looking for an abstraction (similar to what AR4SI does for trustworthiness of an attested environment) for the "code identity" in an attested environment that:

can be expressed in attestation results
can be referenced simply by relying parties
is independent of underlying TEE technology
is stable over time (i.e., OS updates, new builds of application executable/binary/container, etc. do not change "code identity")

In essence, I'd like to look at this from the relying party perspective and figure out what's the ideal model for them, and then work backwards to see if/how it's implementable. For example, I could envision a relying party wanting to express a "code identity" as something like "The secret formula application authored by Coca-Cola" versus "The secret formula application authored by Pepsi".

thomas-fossati commented 1 year ago

How is "identity" represented for an attested environment?

@gkostal there may be a couple of different (entangled) identities involved:

Workload identity (e.g., a public key bound to the static and possibly dynamic state of the workload)
Platform identity (e.g., DICE chain, IDevID, EAT's UEID/SUEID and associated public key; see Appendix C of EAT for a partial comparison)

Both need to be endorsed by the relevant (typically different) supply chain entit{y,ies}.

Are you thinking of one of these two identities in particular? Or their combination? Or something else?

gkostal commented 1 year ago

I believe the former.

I'm thinking of the "identity" from the perspective of a relying party that needs to express a local "appraisal policy for attestation results" that says something like "it's OK to release my secret to the attested environment that's running workload X" where X is a well-known software component.

As you point out there can be a distinction between "identity" being just the code that's running, or "identity" being the code that's running plus its dynamic state (e.g. running on behalf of Coca-Cola, running on behalf of Pepsi, etc.).

I believe EAT suggests the software manifests claim ("manifests") for this purpose. Off the cuff, this seems extremely unwieldy to use for authorization policy in a relying party. Ideally, the relying party captures a policy like "I trust component X running on behalf of party Y" or some such. As the software changes within expectation (e.g., new builds of X are allowed, new dependent libraries brought in by X are OK, etc.), the relying party policy doesn't need to change. IOW, the hash of all the binaries in a protected environment may be an "identity" per definition but it's potentially not an "identity" that works very well for real world remote attestation policy in a relying party.
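
To make the desired policy shape concrete, here is a minimal sketch of a relying-party check of the form "I trust component X running on behalf of party Y". The claim name `workload-identity` and its fields `component` and `on-behalf-of` are assumptions made up for illustration; they are not defined by EAT or AR4SI. The attestation result is assumed to have already been signature-checked against the verifier's key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadIdentity:
    """Hypothetical identity claim a verifier might place in an attestation result."""
    component: str      # e.g. "secret-formula-app"
    on_behalf_of: str   # e.g. "coca-cola"


# Relying-party policy: (component, party) pairs allowed to receive the secret.
# No hashes appear here, so new builds of the component need no policy change.
TRUSTED = {WorkloadIdentity("secret-formula-app", "coca-cola")}


def may_release_secret(attestation_result: dict) -> bool:
    """Return True only if the result names a trusted (component, party) pair."""
    claim = attestation_result.get("workload-identity")  # hypothetical claim name
    if not isinstance(claim, dict):
        return False
    identity = WorkloadIdentity(
        component=str(claim.get("component", "")),
        on_behalf_of=str(claim.get("on-behalf-of", "")),
    )
    return identity in TRUSTED
```

With this shape, a result naming `secret-formula-app` on behalf of `coca-cola` is accepted, while the same component on behalf of `pepsi` is rejected, regardless of which build is running.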

thomas-fossati commented 1 year ago

I believe the former.

I'm thinking of the "identity" from the perspective of a relying party that needs to express a local "appraisal policy for attestation results" that says something like "it's OK to release my secret to the attested environment that's running workload X" where X is a well-known software component.

OK, thanks for the clarification. Maybe we should rename the issue to "Workload Identity"?

[...] Ideally, the relying party captures a policy like "I trust component X running on behalf of party Y" or some such. As the software changes within expectation (e.g., new builds of X are allowed, new dependent libraries brought in by X are OK, etc.), the relying party policy doesn't need to change. IOW, the hash of all the binaries in a protected environment may be an "identity" per definition but it's potentially not an "identity" that works very well for real world remote attestation policy in a relying party.

Yes, pure hashes are probably too low-level to be generally usable. A more abstract "version identification" claim (e.g., SVN) is easier to build policies against. FWIW we have that in CoRIM (see here), and I guess it'd be easy to back-port it to EAT.
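
As a rough illustration of why a version-style claim is easier to write policy against than a digest, the sketch below contrasts the two checks. The function names and the digest strings are placeholders invented for this example, not taken from CoRIM or EAT.

```python
def svn_acceptable(reported_svn: int, minimum_svn: int) -> bool:
    """Accept any build whose security version number is at or above the floor."""
    return reported_svn >= minimum_svn


def hash_acceptable(reported_digest: str, allowed_digests: set[str]) -> bool:
    """Accept only builds whose exact digest was listed in advance."""
    return reported_digest in allowed_digests


# A policy written once: "SVN must be at least 3" / "digest must be one of these".
MINIMUM_SVN = 3
ALLOWED_DIGESTS = {"sha256:build-41"}  # placeholder digest

# A new build ships with a higher SVN and, necessarily, a new digest.
new_build = {"svn": 4, "digest": "sha256:build-42"}

svn_ok = svn_acceptable(new_build["svn"], MINIMUM_SVN)
hash_ok = hash_acceptable(new_build["digest"], ALLOWED_DIGESTS)
```

The SVN policy keeps accepting the new build without modification, whereas the digest policy rejects it until the allow-list is updated.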

SimonFrost-Arm commented 1 year ago

The RP definitely needs a rolled-up view, but the verifier role can take on the complexity of checking hashes and resolving (probably) a set of them to an app identity. The question is whether there should be some standardised expression of app identity which appraisal policy could produce and RPs expect. If so, what granularity can be expected: can the 'app' portion of the workload be reliably identified distinctly from the OS portion? Traditional VM models make that complicated, but there is the potential for future FaaS-like deployments to be more distinct. As noted above, the other part of the workload to be identified is any data bundle made available pre-attestation. CCA realm state includes a personalization-value claim intended to fill this role (without mandated implementation). There are also proposals to keep external definitions of workloads, with a proof delivered into the environment to be presented alongside evidence, e.g., https://queue.acm.org/detail.cfm?id=3623460
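
One way to picture the verifier "rolling up" a set of hashes into an app identity is sketched below. The reference-value store, the app name, and the digests are all hypothetical; a real verifier would populate such a store from endorsements or reference values rather than a hard-coded table.

```python
from typing import Optional

# Hypothetical reference-value store: app name -> acceptable builds,
# where each build is the set of component digests expected for it.
REFERENCE_VALUES: dict[str, list[frozenset[str]]] = {
    "web-service": [
        frozenset({"sha256:aaa", "sha256:bbb"}),  # build 1
        frozenset({"sha256:aaa", "sha256:ccc"}),  # build 2 (updated library)
    ],
}


def resolve_app_identity(measured: set[str]) -> Optional[str]:
    """Return the app name whose reference build matches the evidence exactly."""
    for app, builds in REFERENCE_VALUES.items():
        if frozenset(measured) in builds:
            return app
    return None  # no known build matches; the verifier asserts no app identity
```

Both builds resolve to the same name, so the relying party sees a single stable identity while the verifier absorbs the churn in digests.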

thomas-fossati commented 1 year ago

@gkostal, specifically on this point:

is stable over time (i.e., OS updates, new builds of application executable/binary/container, etc. do not change "code identity")

is it a signed statement from the software author (in a general sense) over a bunch of metadata associated with the software that you have in mind here?

gkostal commented 1 year ago

@gkostal, specifically on this point:

is stable over time (i.e., OS updates, new builds of application executable/binary/container, etc. do not change "code identity")

is it a signed statement from the software author (in a general sense) over a bunch of metadata associated with the software that you have in mind here?

@thomas-fossati, yes, maybe and/or no. :-)

From the relying party perspective (which is where I'm starting), what does the model look like? Ideally the only signed statement the relying party consumes is the attestation result, and the only signer they need to validate is the verification service's signing key.

From a design perspective, the only immediately obvious way I see to implement this is via something like a software endorsement as you describe. Ideally the mechanics of the software endorsement are not necessary for the relying party (e.g., they really don't need to be aware of and verify the signing key for the endorsement).

So, I foresee that there might be two levels to reach consensus on:

the logical model/schema for describing a workload identity in attestation results
the mechanism(s) that enable a verifier to populate this logical model/schema (e.g., CoRIM based endorsement?)

thomas-fossati commented 1 year ago

So, I foresee that there might be two levels to reach consensus on:

the logical model/schema for describing a workload identity in attestation results

there is an abundance of SBOM formats (SPDX, SWID/CoSWID, CycloneDX, in-toto, SLSA) which are worth a look as they seem to address exactly this point.

the mechanism(s) that enable a verifier to populate this logical model/schema (e.g., CoRIM based endorsement?)

(not surprisingly) +1 :-)

thomas-fossati commented 11 months ago

@gkostal a possibly related talk at LPC's CC micro-conference

thomas-fossati commented 11 months ago

It'd be good to break down "identity shapes" by attestation scheme, i.e., show how identity is/can be represented in CCA, vTPM, DICE, TDX, SEV, etc. and see if there are common patterns that can be extracted.

gkostal commented 10 months ago

there is an abundance of SBOM formats (SPDX, SWID/CoSWID, CycloneDX, in-toto, SLSA) which are worth a look as they seem to address exactly this point.

I don't believe this is what I'm looking to discuss.

These describe a detailed single physical manifestation that will change over time as the details of the app binaries change (e.g., code changes, dependent library changes, build tooling changes, etc.). These definitions don't satisfy the needs of a relying party, specifically the requirements I mentioned earlier in the discussion (copy/pasted again here):

(met) can be expressed in attestation results
(not met) can be referenced simply by relying parties
(not met) is independent of underlying TEE technology
(not met) is stable over time (i.e., OS updates, new builds of application executable/binary/container, etc. do not change "code identity")

dcmiddle commented 10 months ago

The ACON project (https://github.com/intel/acon) takes the identity of an application to be the measurement of its container. That measurement is independent of the underlying OS and middleware (though a chain of measurements is provided in the attestation for security), so the application identity is stable when the layers underneath it change. However, changes to the application itself will produce a different measurement. To address volatility in dependencies, those artifacts can be identified indirectly by their signer, e.g., "I will always trust a storage library from my CSP." In that sense, dependency updates (optionally) won't change the app ID.

Without relying on a specific hash of the application, an alternative is to set a policy based on the signer and a vendor supplied identifier for the application. So for an SGX attestation you would look at fields like ISVPRODID and MRSIGNER.
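
A signer-plus-product-ID policy of this kind could look roughly like the sketch below. It assumes the SGX report fields have already been extracted and verified into a plain dictionary with lowercase keys; that dictionary layout, the class name, and the hex values are placeholders for illustration, not a real SDK interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignerPolicy:
    """Identify an enclave by who signed it and which product it is."""
    mrsigner: str   # hex digest of the enclave signer's key (placeholder below)
    isvprodid: int  # vendor-assigned product ID


def matches(policy: SignerPolicy, report_fields: dict) -> bool:
    """True if the report's signer and product ID match; MRENCLAVE is ignored."""
    return (
        report_fields.get("mrsigner") == policy.mrsigner
        and report_fields.get("isvprodid") == policy.isvprodid
    )


POLICY = SignerPolicy(mrsigner="ab" * 32, isvprodid=7)

# Two builds of the same product: different MRENCLAVE, same signer and product ID.
build_1 = {"mrenclave": "11" * 32, "mrsigner": "ab" * 32, "isvprodid": 7}
build_2 = {"mrenclave": "22" * 32, "mrsigner": "ab" * 32, "isvprodid": 7}
other_vendor = {"mrenclave": "11" * 32, "mrsigner": "cd" * 32, "isvprodid": 7}
```

Here `build_1` and `build_2` both satisfy the policy even though their code measurements differ, while `other_vendor` does not.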

OR13 commented 8 months ago

Regarding "workload identity", you may find this proposed charter interesting:

https://datatracker.ietf.org/doc/charter-ietf-wimse/

gkostal commented 8 months ago

Regarding "workload identity", you may find this proposed charter interesting:

https://datatracker.ietf.org/doc/charter-ietf-wimse/

Thanks for pointing this out, @OR13!

bobbiec commented 1 month ago

Here are a couple of different perspectives from other worlds that I think are more closely aligned with @gkostal's concerns:

In the Backstage developer platform framework, there is a concept of a Component:

A component is a piece of software, for example a mobile feature, web site, backend service or data pipeline (list not exhaustive). A component can be tracked in source control, or use some existing open source or commercial software.

In SPIFFE/SPIRE (and related to WIMSE, above), there is a concept of Workload:

A workload is a single piece of software, deployed with a particular configuration for a single purpose; it may comprise multiple running instances of software, all of which perform the same task. The term “workload” may encompass a range of different definitions of a software system, including:

A web server running a Python web application, running on a cluster of virtual machines with a load-balancer in front of it.

An instance of a MySQL database.

A worker program processing items on a queue.

I want to be able to make decisions like, WebService is allowed to talk to AccountService and Database. And generally, the identity of WebService should be stable even if:

WebService adds some new API endpoints
WebService moves from Python 3.11 to 3.12 (or even, gets rewritten in Go)
WebService upgrades a dependency

And for that reason, I think it's very difficult to do this with a pure software measurement of any component. In my opinion, a useful identity is something like:

A human-readable name associated with a component/workload
Accompanying metadata about the rest (measurements, etc.)

But this does require something like an endorsement from the author.
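
As a sketch of what "a name plus accompanying metadata" might look like in practice, the example below keys authorization on the name alone and carries the measurements and runtime details along without letting them affect the decision. The class, the metadata keys, and the allow-list are hypothetical, and the binding of name to measurements is assumed to come from an author endorsement checked elsewhere.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NamedIdentity:
    """A human-readable workload name with informational metadata attached."""
    name: str
    # Excluded from equality and hashing: metadata may change between builds.
    metadata: dict = field(default_factory=dict, compare=False)


# Authorization policy expressed purely in terms of names.
ALLOWED_CALLS = {"WebService": {"AccountService", "Database"}}


def may_call(caller: NamedIdentity, callee_name: str) -> bool:
    """True if the caller's name is allowed to talk to the named callee."""
    return callee_name in ALLOWED_CALLS.get(caller.name, set())


# The same workload before and after a rewrite: metadata differs, name does not.
v1 = NamedIdentity("WebService", {"runtime": "python3.11", "digest": "sha256:aaa"})
v2 = NamedIdentity("WebService", {"runtime": "go", "digest": "sha256:bbb"})
```

Both `v1` and `v2` compare equal and receive the same authorization decisions, which is the stability property described above.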

CCC-Attestation / meetings

Workload Identity in Attestation Results #17