AA: Measurement Event Log Format

binxing commented 8 months ago

PR #467 introduces code to measure container images into a TDX RTMR during loading. However, the absence of an event log poses challenges to verifiers when images are loaded out of order. An event log is crucial for attesting to runtime events like additional process execution in existing containers and security policy changes.

To institute an event log, a consensus on the log format is imperative. We have the option to adopt an established standard log format, like the TPM2 log format outlined in the TCG PC Client Profile or the Canonical Event Log Format (CEL) defined by the TCG Infrastructure Working Group. Alternatively, we can formulate a bespoke format. Generally, opting for a standardized format is preferred to facilitate wider acceptance. It's worth noting that the TPM2 log format predominantly caters to platform firmware-oriented events, making CEL a more reasonable choice for our purpose.

CEL introduces the concepts of NELR (Native Event Log Record) and CELR (Canonical Event Log Record), and allows applications to keep their event logs as NELRs and convert them to CELRs at attestation time. In essence, we are free to define the NELR format according to our requirements while remaining compliant with the CEL standard.

NELR (Native Event Log Record) Format

An event record consists of an event type and associated parameters. One encoding approach is TLV (Type-Length-Value), akin to a tagged union in C, where a tag, usually an integer, specifies the record's structure. Alternatively, the event record can be serialized as JSON. Formats relying on integer event types are generally suboptimal because they require a central authority to assign event types. While not a problem today, this may prevent user containers from generating custom measurement events in the future. JSON avoids that problem but introduces the challenge of "structural equivalence": the same JSON object has many valid encodings that differ in field order and insignificant whitespace, so a verifier cannot be certain which encoding to hash. Furthermore, JSON lacks support for comments, which complicates conveying Event Log Informative Data (ELID, in CEL terminology).
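
To make the "structural equivalence" concern concrete, here is a minimal sketch that hashes two encodings of the same JSON object. The object and its field names are made up for illustration; only `sha384sum` from coreutils is assumed.

```bash
#!/usr/bin/env bash
# Two encodings of the same (made-up) JSON object: field order and whitespace
# differ, but any JSON parser yields the same value for both.
enc_a='{"event":"ImageLoad","digest":"sha256/0123"}'
enc_b='{ "digest": "sha256/0123", "event": "ImageLoad" }'

# Hash the exact bytes of each encoding, as an extend operation would.
hash_a=$(printf '%s' "$enc_a" | sha384sum | cut -d' ' -f1)
hash_b=$(printf '%s' "$enc_b" | sha384sum | cut -d' ' -f1)

echo "encoding A: $hash_a"
echo "encoding B: $hash_b"
if [ "$hash_a" != "$hash_b" ]; then
    echo "same JSON value, different digests"
fi
```

A verifier that only sees the parsed object cannot tell which of the two byte strings was extended, which is the motivation for hashing a fixed line of text instead.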

Hence, a proposed text-based NELR format is presented below. Text is preferred over binary in this design due to its human readability, ease of encoding in JSON strings for transmission via RESTful APIs, and its capacity to accommodate descriptive event types/tags rather than numerical ones.

The format is succinctly described as follows:

Each NELR is a non-empty line of text terminated by a newline character. The entire line (including leading/trailing spaces/tabs but excluding the trailing newline) is hashed/extended as is.
A log file is a textual document composed of NELRs.
A log file may include empty lines (solely a newline character); these lines are excluded from hashing/extension.
Additionally, a log file may incorporate comments, identified as lines starting with # (no leading spaces/tabs allowed). Comments, serving purposes such as conveying ELID or enhancing readability, are treated in the same manner as empty lines.

NOTE: The choice of a plain text line over JSON is deliberate to eliminate variability in JSON encoding, ensuring verifiers have certainty regarding what to hash. Nevertheless, a JSON object can be encapsulated inside a NELR but all newline characters must be omitted.
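
As a rough illustration of the rules above, the sketch below filters a log down to the lines that would be hashed/extended. The function name `nelr_records` and the `example.org/demo` domain are hypothetical. It also assumes a literal reading of the rules: a line with leading whitespace before `#` is not a comment and therefore counts as a record.

```bash
#!/usr/bin/env bash
# nelr_records: print only the lines of a log (read from stdin) that would be
# hashed/extended under the rules described above.
nelr_records() {
    # IFS= and -r keep leading/trailing whitespace and backslashes intact,
    # since every character of a NELR is significant.
    while IFS= read -r line; do
        case $line in
            '')   continue ;;               # empty line: ignored
            '#'*) continue ;;               # comment: '#' is the first character
            *)    printf '%s\n' "$line" ;;  # everything else is a NELR
        esac
    done
}

sample_log='# a comment, e.g. carrying ELID

 # leading space, so this is treated as a record
example.org/demo Hello world'

printf '%s\n' "$sample_log" | nelr_records
```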

The following exemplifies a log file containing NELRs (which can be locally maintained by the Attestation Agent), formatted in accordance with the previously described specifications.

#
# Lines starting from '#' (no leading space allowed) are comments and are
# treated in the same manner as empty lines.
#
# All empty lines are ignored (not hashed/extended).
#

#
# We have the option of either having one log file per Measurement Register
# (MR) or a global log for all MRs. The format definition currently assumes one
# log per MR. In the event of an agreement on an aggregated log for all MRs, we
# can prefix each line with an MR index.
#

#
# A NELR is an entire line of text without the trailing newline. All
# characters, including leading and trailing spaces and tabs, are significant
# and hashed as is.
#
# In establishing a convention, the initial NELR must explicitly serve as an
# INIT record, delineating the hash algorithm and MR value before the log
# initiation. This is imperative, as the virtual Firmware may have previously
# extended this MR and maintained a log in an alternative format before
# initiating the Kernel boot.
#
# The following INIT record indicates that the MR has not been extended by the
# virtual Firmware, and the hash algorithm employed is SHA384.
#
INIT sha384/000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

#
# The following represents an ImageLoad event, and the SHA256 digest of the
# loaded image manifest is
# 0123456789abcdef1123456789abcdef2123456789abcdef3123456789abcdef.
#
# As another convention, the first column should be a "domain identifier",
# offering insight into the entity generating the event. The use of uppercase
# "domain identifiers", such as INIT, is reserved exclusively for log keepers
# and verifiers.
#
github.com/confidential-containers ImageLoad sha256/0123456789abcdef1123456789abcdef2123456789abcdef3123456789abcdef

#
# The content of a record is indeed free-form. Therefore, if needed, JSON
# objects can be inline within the structure.
#
github.com/confidential-containers EventWithJSONParams { "key1":"value1\tmore values and 'quotes'\n", "key2": [ "value2", 2, true, null ] }
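
The "domain identifier" convention from the sample log can be checked mechanically. The following is only a sketch: the helper names are hypothetical, "uppercase" is assumed to mean the ASCII letters A-Z only, and the INIT record is shortened for readability.

```bash
#!/usr/bin/env bash
# nelr_domain RECORD: print the first space-delimited column of a record.
nelr_domain() {
    printf '%s\n' "${1%% *}"
}

# is_reserved_domain NAME: succeed if NAME is non-empty and consists solely of
# ASCII uppercase letters (an assumption about what "uppercase" means here).
is_reserved_domain() {
    case $1 in
        ''|*[!ABCDEFGHIJKLMNOPQRSTUVWXYZ]*) return 1 ;;
        *) return 0 ;;
    esac
}

for record in \
    'INIT sha384/00' \
    'github.com/confidential-containers ImageLoad sha256/0123'
do
    domain=$(nelr_domain "$record")
    if is_reserved_domain "$domain"; then
        echo "$domain: reserved for log keepers and verifiers"
    else
        echo "$domain: application-defined"
    fi
done
```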

Converting NELR to CELR (Canonical Event Log Record)

The table below outlines the essential fields of a CELR per the CEL information model, with default values and instructions on how to determine their specific values. Converting a NELR to a CELR involves filling in these fields, encoding the NELR as a JSON string, and assigning the resulting string to the `content` field.

| Field | Value | Description |
| --- | --- | --- |
| `recnum` | Assigned sequentially by the conversion tool | An integer starting from `0` for the initial CELR and incremented for each subsequent CELR. |
| `pcr` | `2` | On TDX, the index of the RTMR to which the event is extended; defaults to `2`. |
| `digests` | SHA-384 digest of the NELR | An array of hash algorithm and digest pairs corresponding to the `content` field. TDX, like most other HW TEEs, supports only one algorithm, so this array has a single entry. |
| `content_type` | `"CC-CEL"` | Dictates the type or structure of the `content` field. Currently "CC-CEL" is used; suggestions for alternatives are welcome. Note that, per the CEL spec, `content_type` must also have an integer value to enable binary encodings such as TLV or CBOR. Without an integer value, CELR is restricted to JSON encoding in our scenario. |
| `content` | NELR | The NELR line encoded as a JSON string. |

Below is a Bash script for converting a log file containing NELR entries into a JSON array of CELRs.

#!/usr/bin/env bash

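NELR_to_CELR_json() {
    local _RECNO=0
    local _HALG _MR _J _D
    # Without a variable name, `read` stores the whole line in $REPLY and keeps
    # leading/trailing whitespace, which is significant in a NELR.
    while read -r; do
        # Empty lines and comments are never hashed/extended.
        [[ -z $REPLY || ${REPLY:0:1} == "#" ]] && continue

        case $(cut -f1 -d' ' <<< "$REPLY") in
            INIT)
                if test -z "$_HALG"; then
                    # INIT <hash algorithm>/<MR value before the log started>
                    _MR=$(cut -f2 -d' ' <<< "$REPLY")
                    _HALG=${_MR%%/*}
                    _MR=${_MR#"$_HALG"/}
                else
                    echo "error: Multiple INIT found" >&2
                    return 1
                fi
                ;;
            *)
                if test -z "$_HALG"; then
                    echo "error: First record is not INIT" >&2
                    return 2
                fi
                ;;
        esac

        # Digest of the exact NELR bytes; printf avoids echo's option parsing.
        _D=$(printf '%s' "$REPLY" | "${_HALG}sum" | cut -f1 -d' ')
        # Escape backslashes and double quotes for embedding in a JSON string
        # (control characters such as tabs are not handled here).
        _J+="$(printf \
            '{"recnum":%d,"pcr":%d,"digests":[{"hashAlg":"%s","digest":"%s"}],"content_type":"CC-CEL","content":"%s"},' \
            "$_RECNO" "$1" "$_HALG" "$_D" "$(sed 's/[\\"]/\\&/g' <<< "$REPLY")")"
        echo "mr$_RECNO: $_MR" >&2
        # Extend: new MR = hash(old MR bytes || record digest bytes).
        _MR=$(echo "$_MR$_D" | xxd -r -p | "${_HALG}sum" | cut -f1 -d' ')
        _RECNO=$((_RECNO+1))
    done

    echo "{\"mr\":\"$_MR\",\"log\":[${_J%,}]}"
}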

# AA stores measurements in RTMR[2]
NELR_to_CELR_json 2

If the previous log file is saved as mr.log, and the script file is named n2celr.sh, then executing the following command

./n2celr.sh < mr.log | jq .

will yield the following result.

{
  "mr": "4693cf54a9c8b386c67290abfa981696fe07d985d06e7f06c6929370db90bbb647faff2bc0375d270aa2f0dd42635568",
  "log": [
    {
      "recnum": 0,
      "pcr": 2,
      "digests": [
        {
          "hashAlg": "sha384",
          "digest": "5d97ae953f7b03a1e93f58f9d621f84d1bdf053d1ec5ad2461262a938ecd4ed4d42d3c5c42084828da1c8051a6c08e6a"
        }
      ],
      "content_type": "CC-CEL",
      "content": "INIT sha384/000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
    },
    {
      "recnum": 1,
      "pcr": 2,
      "digests": [
        {
          "hashAlg": "sha384",
          "digest": "98cfd27eaba5b57ee1b9bbb2da9886ec9ebc40ef428ffd5eda1a4024039beed7d0f4b46b2e63bb419780f619add308ef"
        }
      ],
      "content_type": "CC-CEL",
      "content": "github.com/confidential-containers ImageLoad sha256/0123456789abcdef1123456789abcdef2123456789abcdef3123456789abcdef"
    },
    {
      "recnum": 2,
      "pcr": 2,
      "digests": [
        {
          "hashAlg": "sha384",
          "digest": "dfcf2f7ac9a8e19cbe4233837a1cd34a4c6517a8d6852aa8a566e1f43b39dc1f32bcb51fa4e6c7a139c844ab5c766361"
        }
      ],
      "content_type": "CC-CEL",
      "content": "github.com/confidential-containers EventWithJSONParams { \"key1\":\"value1\\tmore values and 'quotes'\\n\", \"key2\": [ \"value2\", 2, true, null ] }"
    }
  ]
}

Please note in the JSON output provided above, the key mr contains the ultimate value of the MR, while the key log represents the JSON array of CELRs. Therefore,

./n2celr.sh < mr.log | jq -r .mr

will yield the final value of the MR; and

./n2celr.sh < mr.log | jq .log

will yield the CEL log as a JSON array; while

./n2celr.sh < mr.log | jq -r '.log[].content'

will convert CELRs back to NELRs (with comments stripped).
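
On the verifier side, the same extend rule can be replayed over the digests in the CEL log. This sketch assumes the extend operation used by the script above, i.e. new MR = hash(old MR bytes || record digest bytes). The helpers `hex_to_bin` and `extend_mr` are hypothetical names, and hex decoding is done with `printf` so that `xxd` is not required.

```bash
#!/usr/bin/env bash
# hex_to_bin HEX: write the bytes encoded by an even-length hex string to stdout.
hex_to_bin() {
    [ $(( ${#1} % 2 )) -eq 0 ] || return 1
    hex=$1
    while [ -n "$hex" ]; do
        rest=${hex#??}        # everything after the first two hex digits
        byte=${hex%"$rest"}   # the first two hex digits
        # Emit one raw byte through an octal escape.
        printf '%b' "\\0$(printf '%03o' "0x$byte")"
        hex=$rest
    done
}

# extend_mr HALG MR_HEX DIGEST_HEX: print hash(MR bytes || digest bytes) in hex.
extend_mr() {
    { hex_to_bin "$2"; hex_to_bin "$3"; } | "${1}sum" | cut -d' ' -f1
}

# Start from the MR value in the INIT record (all zeros in the sample log) and
# replay the three record digests taken from the JSON output above.
mr=$(printf '%096d' 0)
for digest in \
    5d97ae953f7b03a1e93f58f9d621f84d1bdf053d1ec5ad2461262a938ecd4ed4d42d3c5c42084828da1c8051a6c08e6a \
    98cfd27eaba5b57ee1b9bbb2da9886ec9ebc40ef428ffd5eda1a4024039beed7d0f4b46b2e63bb419780f619add308ef \
    dfcf2f7ac9a8e19cbe4233837a1cd34a4c6517a8d6852aa8a566e1f43b39dc1f32bcb51fa4e6c7a139c844ab5c766361
do
    mr=$(extend_mr sha384 "$mr" "$digest")
done
echo "replayed mr: $mr"
```

The replayed value can then be compared with the `mr` field reported by the script and, in a real deployment, with the register value in the TEE quote.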

Xynnn007 commented 8 months ago

Hi @binxing Do you have any plan to share this in the weekly community meeting? I think @sameo @fitzthum can help.

cc @jiazhang0 @arronwy @jialez0

binxing commented 8 months ago

@Xynnn007, unfortunately I'll be OOO starting Tuesday for nearly the whole of March. But what I put down here is pretty much what we have discussed offline. So please lead the discussion if you feel it is urgent. 😄

@sameo and @dcmiddle and @mythi, I think this log format has the potential to be adopted by the Linux kernel community. Thoughts?

jiazhang0 commented 8 months ago

At this moment, it has not been determined where the producer of the eventlog will be. As @binxing mentioned, the kernel is the best place to implement NELR (CELR may be too complex for the kernel). If so, the translation from NELR to CELR is necessary. If the kernel doesn't support NELR, we can choose to implement CELR directly in AA.

dcmiddle commented 8 months ago

Thanks for this detailed write-up @binxing! I believe this format

- satisfies the kernel's interest in having a standard format
- satisfies a CoCo (and general developer) interest in having an extensible and human-readable format
- can be automatically evaluated by a policy-driven verifier/relying party

In regard to policy, I would appreciate @jialez0's, @danmihai1's, and @bodzhang's opinions regarding their policy experience (in addition to the others who have already commented here).

fitzthum commented 8 months ago

The next few community meetings are booked up anyway, but this sounds like it could be a great topic for when @binxing returns.

binxing commented 8 months ago

The next few community meetings are booked up anyway, but this sounds like it could be a great topic for when @binxing returns.

I think I can do it in the first week of April. But @Xynnn007 and @dcmiddle and some other folks are very familiar with this proposal as well, and may drive a discussion without me if they see fit. 😄

mkulke commented 8 months ago

The systemd community is currently settling on something that's "close to TCG CEL-JSON". I suggest looking at it for prior art.

mkulke commented 8 months ago

AFAIU, in this proposal we would have to set a static register value ("2" in your TDX example) when normalizing a NELR to a CELR, since NELR lacks representation of this value.

This might make sense for TDX's registers, but can we generalize this to all TEEs? I expect runtime measurements via TPM (e.g. Coconut SVSM) or similar to become more common in TEEs, and there are many more registers.

Xynnn007 commented 8 months ago

AFAIU, in this proposal we would have to set a static register value ("2" in your TDX example) when normalizing a NELR to a CELR, since NELR lacks representation of this value.

I think each platform should have a mapping from the standard PCR index to its specific runtime measurement register.

CoCo will record some events on top of the kernel's eventlog, and we can use a fixed PCR to extend. As I understand it, the 2 here is not a PCR index but a TDX RTMR (runtime measurement register) index. So generally we could use, for example, PCR 19 on all platforms to record CoCo events based on the kernel eventlog.

mkulke commented 8 months ago

yes, but if we define the format is it safe to assume that we will always just consume a single "standard" PCR or will this turn out to be restrictive later?

I could imagine e.g. that we want to measure infrastructure components, container images, configuration into different registers.

Xynnn007 commented 8 months ago

I could imagine e.g. that we want to measure infrastructure components, container images, configuration into different registers.

I once thought they could share the same PCR to extend, as the usage of the high PCRs is still not defined. This IS NOT a mature idea; it mostly comes from a programmer's point of view, such that we can have a simple extend_runtime_measurement(event: CoCoEvent) without specifying a PCR, where CoCoEvent is a format defined in CoCo that fits well with the eventlog format the kernel provides.

mkulke commented 8 months ago

Maybe the syntax could be extend_runtime_measurement(record: CoCoEventRecord). An event record could still be created with sensible defaults (PCR number, hash algorithm), but it would remain extensible.

Xynnn007 commented 8 months ago

maybe the syntax could be extend_runtime_measurement(record: CoCoEventRecord) an event record could still be created with sensible defaults (pcr no, hash algo) but it would remain extensible.

Agreed. So either

CoCoEventRecord := KernelEventEntryHeader | CoCoEvent

This means what is given to AA will be directly delivered to the kernel. Or

CoCoEventRecord := CoCoEvent

This means AA will add an extra header and then deliver it to the kernel.

The former can set the PCR index via the API in KernelEventEntryHeader while the latter cannot.

Did I miss anything?

mythi commented 8 months ago

The systemd community is currently settling on something that's "close to TCG CEL-JSON". I suggest looking at it for prior art.

I had seen this too. Are you aware of any references consuming this?

mythi commented 8 months ago

This means what is given to AA will be directly delivered to the kernel.

@Xynnn007 what do you mean by 'deliver to kernel'? The RFC version of configfs-tsm based RTMR mechanism currently takes a hash as the input only. There's no agreement in place where the logging takes place but my understanding is this proposal suggests AA would maintain the log for CoCo.

mkulke commented 8 months ago

I had seen this too. Are you aware of any references consuming this?

afaik the systemd runtime measurement log aims to keep a degree of consistency with the firmware log, so it's supposed to be consumed with existing tooling and processors (like jq). there was a talk on this, in which Lennart goes into some detail about the purpose:

https://youtu.be/0RSH3JXqShE?list=PLWYdJViL9EioDNHn7xIqQJLyCayNPKeYf&t=1851

curiously the lkml thread circles back to this issue. and likewise, the argument for using a coco-bespoke log format in the lkml thread seems to be based on assumptions:

IMHO, we don't have to follow TCG2 format because TDX is never TPM, nor are any other TEEs that support runtime measurements.

Apart from Azure CVMs that allow runtime measurements via vTPM, SEV-SNP deployments would gain similar capabilities when used with a SVSM.

mkulke commented 8 months ago

Agreed. So either
CoCoEventRecord := KernelEventEntryHeader | CoCoEvent
This means what is given to AA will be directly delivered to the kernel. Or
CoCoEventRecord := CoCoEvent
This means AA will add an extra header and then deliver it to the kernel.

The former can set the PCR index via API in KernelEventEntryHeader while the latter cannot.

Did I ignore anything?

Not sure I grok your suggestion fully, what I had in mind was basically:

extend_runtime_measurement(record: CoCoEventRecord) while a CoCoEventRecord struct would still have to be able to keep a register index and a hash algo to keep information parity with the TCG log, possibly with a default constructor.

mythi commented 8 months ago

there was a talk on this, in which Lennart goes into some detail about the purpose:

thanks! A related talk looks interesting too "Unified TPM Event Log for Linux"

Xynnn007 commented 8 months ago

@Xynnn007 what do you mean by 'deliver to kernel'? The RFC version of configfs-tsm based RTMR mechanism currently takes a hash as the input only. There's no agreement in place where the logging takes place but my understanding is this proposal suggests AA would maintain the log for CoCo.

Oh. What I posted is based on an assumption that kernel will keep an eventlog.

Not sure I grok your suggestion fully, what I had in mind was basically:

extend_runtime_measurement(record: CoCoEventRecord) while a CoCoEventRecord struct would still have to be able to keep a register index and a hash algo to keep information parity with the TCG log, possibly with a default constructor.

Right. Basically I fully agree with you. What I meant is more of a coding detail, which I think we can leave to the review of a concrete PR.

wenhuizhang commented 8 months ago

Just one small correction.

systemd measures to PCRs 5 (boot-loader-config), 10 (application), 11 (kernel-boot), 12 (kernel-config), 13 (sysexts), 15 (system-identity).

pcr 10 is for rtmr 2 on TDX.

Also, I hope the format can (1) follow the format of the CCEL table (TDX eventlog) to stay aligned with Intel Authority's verify API, (2) align with quoting formats, and (3) align with protocols for remote attestation.

binxing commented 7 months ago

yes, but if we define the format is it safe to assume that we will always just consume a single "standard" PCR or will this turn out to be restrictive later?

My apologies for not being clear on this, but the proposed log format is per MR, i.e., one log would be maintained for each MR. That is to avoid the erroneous impression that 2 records extended to different MRs were always extended in the order in which they appear in the log. Moreover, various MRs may adopt different log formats, even though that's uncommon.

binxing commented 7 months ago

@mkulke,

curiously the lkml thread circles back to this issue. and likewise, the argument for using a coco-bespoke log format in the lkml thread seems to be based on assumptions:

IMHO, we don't have to follow TCG2 format because TDX is never TPM, nor are any other TEEs that support runtime measurements.

Apart from Azure CVMs that allow runtime measurements via vTPM, SEV-SNP deployments would gain similar capabilities when used with a SVSM.

Not sure I've understood your comment correctly, but I was trying to point out the fact that any TEE would generate a quote in a different format than a TPM quote, hence there must be an indirection - either an attestable vTPM implementation or a set of rules for converting a TEE quote into a TPM quote. In either case the verifier/appraiser must be aware of the TEE specific quote. With that said, we don't have to restrict ourselves to the TCG2 format for compatibility, as that wouldn't make any existing TPM verifiers work on TEE quotes anyway without TEE specific knowledge.

The systemd community is currently settling on something that's "close to TCG CEL-JSON". I suggest looking at it for prior art.

TCG2 format was designed specifically for one application - platform BIOS, hence is difficult to adapt to new applications, as evidenced by systemd. From the document that you referenced, EV_EVENT_TAG has to be used for everything non-BIOS, and that (EV_EVENT_TAG) means arbitrary binary data (i.e., no format) essentially. Therefore, systemd abandons TCG2 log and switches to the "close to TCG CEL-JSON" log (source code) after kernel is up and running. This proposal indeed follows the same approach as systemd except the string value of "content_type", where systemd uses "systemd" while we use "CC-CEL" (but please suggest should you have a better name for "content_type").

binxing commented 7 months ago

maybe the syntax could be extend_runtime_measurement(record: CoCoEventRecord) an event record could still be created with sensible defaults (pcr no, hash algo) but it would remain extensible.

Agreed. So either
CoCoEventRecord := KernelEventEntryHeader | CoCoEvent
This means what is given to AA will be directly delivered to the kernel. Or
CoCoEventRecord := CoCoEvent
This means AA will add an extra header and then deliver it to the kernel.

The former can set the PCR index via API in KernelEventEntryHeader while the latter cannot.

With the first column serving as the application/domain identifier, the proposed log format should be able to accommodate arbitrary applications simultaneously and could be used by the kernel directly - i.e., we probably don't need KernelEventEntryHeader except the "PCR index". Given the TSM patch organizes each MR in its own directory, I guess it might be easier to have per-MR logs, in which case "PCR index" would be implied by the path to the MR directory.

@sameo, thoughts?

mythi commented 7 months ago

yes, but if we define the format is it safe to assume that we will always just consume a single "standard" PCR or will this turn out to be restrictive later?

My apology for not being clear on this, but the proposed log format is per MR - i.e., one log would be maintained for each MR. That is to avoid the erroneous impression that 2 records extended to different MRs were always extended in the order as appearing in the log. Moreover, various MRs may adopt different log formats, even though that's uncommon.

I think you meant to say the proposed NELR format is per MR. In case of TDX, for example, we'd be using that for MR 2 but MRs 0/1 would be reserved for BIOS/boot and will be using the TCG2 log format (as defined by CCEL). The final CELR would have both and that would be included as part of the TEE evidence.

mythi commented 7 months ago

Given the TSM patch organizes each MR in its own directory, I guess it might be easier to have per-MR logs, in which case "PCR index" would be implied by the path to the MR directory.

a bit offtopic, but if a TEE vendor decides to enable an MR to userspace through configfs-tsm, is it still something the kernel could continue to write to ("ownership wise" basically).

binxing commented 7 months ago

Given the TSM patch organizes each MR in its own directory, I guess it might be easier to have per-MR logs, in which case "PCR index" would be implied by the path to the MR directory.

a bit offtopic, but if a TEE vendor decides to enable an MR to userspace through configfs-tsm, is it still something the kernel could continue to write to ("ownership wise" basically).

According to my limited understanding, files in configfs can also be opened and written to in kernel mode just like any other files. But I guess a more common practice is for configfs-tsm to expose a dedicated API to mirror the action that would be triggered by a user mode write. E.g., if the user mode interface is a write-only configfs file to which each line written would cause the line to be hashed and extended to the associated MR, then a kernel API, say hash_extend_and_append_log(int mr_index, const char *line), can be exposed to allow other kernel modules to log/extend the specified MR as if the line were written to the corresponding write-only configfs file from user mode. Again, my understanding of configfs is limited so my comments could be very wrong. Guess @sameo and/or other kernel experts can comment.

Xynnn007 commented 7 months ago

Overall, I think a JSON type of eventlog would be good for user reading. One thing still not clear to me:

Is the API exposed by the Attestation Agent (event, pcr_id), with the logic inside AA translating pcr_id to mr_id depending on the platform?

binxing commented 7 months ago

Overall, I think a JSON type of eventlog would be good for user reading. One thing still not clear to me:

I think CEL-JSON would be easier to work on with existing CEL tools, while both CEL and the proposed NELR format should be easy for human users to understand. However, my tendency is still to use this proposal as the native format and convert it to CEL-JSON on demand, because it clearly separates measured data (i.e., NELR lines) from unmeasured data (i.e., comments). Otherwise, unmeasured info would have to be mixed into measured info. E.g., systemd has to enclose both unmeasured ("bootId" and "timestamp") and measured ("description" and "eventType") info in "content", resulting in confusion about which part of "content" to hash when verifying the log.

Is the API opened by Attestation Agent (event, pcr_id), and the logic inside AA will translate pcr_id to mr_id due to platform?

For AA, assuming this API is consumed by CoCo components only, I wouldn't let callers specify pcr_id because I don't think we need more than one MR. I would define the API to take 2 parameters - (event, comment), both are optional - i.e., if event is omitted, AA only appends to comments, and vice versa. Internally, AA decides which MR to extend (only if event is non-null) depending on the hardware architecture.
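
A rough sketch of that (event, comment) interface, written against a log file. `aa_append` is a hypothetical helper, not AA's actual API. The real extend call is platform-specific and is not modelled, so the function only prints the digest that would be handed to it.

```bash
#!/usr/bin/env bash
# aa_append LOGFILE [EVENT] [COMMENT]  (hypothetical helper for illustration)
# Both EVENT and COMMENT are optional; pass "" to omit one of them.
aa_append() {
    _log=$1 _event=$2 _comment=$3
    _nl='
'
    if [ -n "$_comment" ]; then
        # Prefix every comment line with "# " so that none of it is measured.
        printf '%s\n' "$_comment" | sed 's/^/# /' >> "$_log"
    fi
    [ -n "$_event" ] || return 0   # comment-only call: nothing to extend
    case $_event in
        '#'*|*"$_nl"*)
            echo "error: event must be a single line not starting with '#'" >&2
            return 1 ;;
    esac
    printf '%s\n' "$_event" >> "$_log"
    # Digest of the exact NELR bytes (no trailing newline). A real agent would
    # pass this to the extend interface of whichever MR it selected.
    printf '%s' "$_event" | sha384sum | cut -d' ' -f1
}

demo_log=$(mktemp)
aa_append "$demo_log" "INIT sha384/$(printf '%096d' 0)" "log created by demo"
aa_append "$demo_log" "example.org/demo ImageLoad sha256/0123" "loaded by demo"
aa_append "$demo_log" "" "comment only, nothing is extended"
cat "$demo_log"
```

Keeping the comment in a separate parameter means callers never have to decide which part of their input is measured.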

Xynnn007 commented 7 months ago

I think CEL-JSON would be easier to work on with existing CEL tools, while both CEL and the proposed NELR format should be easy to understand by human users. However, my tendency is still to use this proposal as the native format and convert it to CEL-JSON on demand, because it clearly separates measured data (i.e., NELR lines) vs. unmeasured data (i.e., comments). Otherwise, unmeasured info would have to be mixed into measured info. E.g., systemd has to enclose both unmeasured ("bootId" and "timestamp") and measured ("description" and "eventType") info into "content", resulting in confusions on which part of "content" to hash when verifying the log.

Agreed.

For AA, assuming this API is consumed by CoCo components only, I wouldn't let callers specify pcr_id because I don't think we need more than one MR. I would define the API to take 2 parameters - (event, comment), both are optional - i.e., if event is omitted, AA only appends to comments, and vice versa. Internally, AA decides which MR to extend (only if event is non-null) depending on the hardware architecture.

Inspired by @mkulke, I prefer to have an optional parameter to specify the target PCR. If it is not given, a default one will be used. BTW, AA is not intended to serve only CoCo; potentially it could work for more confidential computing scenarios where more PCRs would be used.

Apart from the design of AA, a few other questions about the future are still unclear to me:

What impact does the kernel community have on this? @mythi said that the current RFC only extends RTMR via configfs-tsm. Do they/you have further plans to let the kernel maintain the eventlog? The kernel seems to be a better place to keep eventlogs. Otherwise any process with root privileges can maintain a log and call the RTMR extension interface, which will make it very difficult to match the log content and the RTMR hash, because the logs are distributed in multiple places and the order matters.
If the kernel community finally decides to support maintaining a log, we should deprecate AA's log and embrace the kernel's log. At that time, AA would just receive CoCo events and call the kernel API to record them. But how do we ensure the log format of AA aligns with the kernel's?

Overall, I think the NELR format proposed in this issue is enough to cover the current problems we are facing.

binxing commented 7 months ago

Inspired by @mkulke, I prefer to have an optional parameter to specify the target PCR. If it is not given, a default one will be used. BTW, AA is not intended to serve only CoCo; potentially it could work for more confidential computing scenarios where more PCRs would be used.

Agreed.

What impact does the kernel community have on this? @mythi said that the current RFC only extends RTMR via configfs-tsm. Do they/you have further plans to let the kernel to maintain eventlog? Kernel seems to be a better place to keep eventlogs. Otherwise any process with root privileges can maintain a log and call the rtmr extension interface, which will make it very difficult to match the log content and the rtmr hash, because the logs are distributed in multiple places and the order will affect.

I can't agree more with you that the kernel is in the best position to maintain the logs and truly hope the kernel can take this role. But it's uncertain whether or not the kernel will do it and when. So I brought the discussion here, simply to involve more CC experts. I hope everyone having an opinion on this topic can opine on the kernel thread to move it forward.

In case the kernel doesn't maintain the logs, multiple applications can still share accesses to the logs and use flock() to synchronize. But all applications must agree on a log format. AA will be the first among those applications to set the de facto standard.
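
As a sketch of the flock()-based sharing, the shell equivalent is the `flock` utility from util-linux (assumed to be installed). The helper name `locked_append` and the `example.org/demo` records are hypothetical.

```bash
#!/usr/bin/env bash
# locked_append LOGFILE LINE: append LINE to LOGFILE under an exclusive lock.
locked_append() {
    (
        flock -x 9 || exit 1      # blocks until the lock on fd 9 is acquired
        printf '%s\n' "$2" >&9    # fd 9 is open in append mode
        # A real writer would also extend the MR here, inside the critical
        # section, so that the log order matches the extend order.
    ) 9>>"$1"
}

demo_log=$(mktemp)
locked_append "$demo_log" "example.org/demo Hello from writer 1" &
locked_append "$demo_log" "example.org/demo Hello from writer 2" &
wait
cat "$demo_log"
```

The lock only helps if every application that extends the MR also goes through it, which is why agreement on a single log and format matters.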

Regarding the plan, I think it is a question for @sameo as he owns the current RFC patch.

If the kernel community finally decides to support maintaining a log, we should deprecate AA's log and embrace the kernel's log. At that time, AA would just receive CoCo events and call the kernel API to record them. But how do we ensure the log format of AA aligns with the kernel's?

Guess we cannot ensure the kernel will adopt the same format. But we can influence the kernel community. So please participate in the kernel discussion if you have an opinion. 😄

mythi commented 7 months ago

What impact does the kernel community have on this? @mythi said that the current RFC only extends RTMR via configfs-tsm. Do they/you have further plans to let the kernel to maintain eventlog? Kernel seems to be a better place to keep eventlogs. Otherwise any process with root privileges can maintain a log and call the rtmr extension interface, which will make it very difficult to match the log content and the rtmr hash, because the logs are distributed in multiple places and the order will affect.

I can't agree more with you that the kernel is in the best position to maintain the logs and truly hope the kernel can take this role. But it's uncertain whether or not the kernel will do it and when. So I brought the discussion here, simply to involve more CC experts. I hope everyone having an opinion on this topic can opine on the kernel thread to move it forward.

The early review feedback of the RTMR RFC patchset also suggests that the ABI would take the log entry and maintain the in-kernel log. That is also listed as a "TODO" item in v2 so I guess the question is more about when. I believe in this conversation we also thought it would be good for it to be a per-MR log? I wonder if it would be easy to experiment with the proposed log format (I have the code for TDX plumbing already...)

However, at the same time TPM suffers from the same synchronization issues and there's no in-kernel runtime log for it. The systemd case is such that it clearly defines the non-overlapping PCRs it "owns" and keeps a log of them. But if AA also needs to support TPM based measurements it'll need a log anyways?

wenhuizhang commented 7 months ago

It seems like this tsm patch focuses on interaction with both TPM and RTMR firmware, which bypasses IMA, and exposes the report directly to userspace, https://lore.kernel.org/lkml/20240114223532.290550-2-sameo@rivosinc.com/#Z31drivers:virt:coco:tsm.c . However, for interaction with RTMR, we could achieve this directly by mapping RTMR to a vTPM driver. IMHO, both (1) mapping RTMR to a vTPM driver in the IMA backend and (2) exposing TPM/RTMR to userspace through tsm are needed. Please correct me if I got something wrong.

dcmiddle commented 7 months ago

This issue proposes a format and asks whether that works for CoCo. There are some other topics intermixed in the discussion, but as I re-read the thread (and watched the April 4 community meeting) I don't see unresolved issues about the format. I'd venture that we have consent that NELR satisfies CoCo needs.

What we would like to do is take this agreement, if indeed we have reached it, back to LKML and assert that NELR is a good choice for advancing the RTMR patch set.

In parallel we can independently decide whether AA waits on the upstream kernel ABI or does a separate implementation of this logging feature until an upstream ABI gets merged.

Xynnn007 commented 7 months ago

What we would like to do is take this agreement, if indeed we have reached it, back to LKML and assert that NELR is a good choice for advancing the RTMR patch set.

In parallel we can independently decide whether AA waits on the upstream kernel ABI or does a separate implementation of this logging feature until an upstream ABI gets merged.

Sounds good. I agree with implementing a separate one in AA. wdyt? @sameo @fitzthum @jiazhang0 @jialez0 @mkulke

l1k commented 7 months ago

As discussed at the April 12 meeting, PCI device authentication has the need to expose a log of CMA-SPDM signatures to user space. It is currently using a custom log format, see l1k/linux@ca420b22af053b26eecf54aedf5da84eb9569c0c. That might be a potential use case for the log format discussed in the present issue. Just wanted to make sure it's captured in the conversation here as well.

mythi commented 7 months ago

What we would like to do is take this agreement, if indeed we have reached it, back to LKML and assert that NELR is a good choice for advancing the RTMR patch set. In parallel we can independently decide whether AA waits on the upstream kernel ABI or does a separate implementation of this logging feature until an upstream ABI gets merged.

Sounds good. I agree with implementing a separate one in AA. wdyt? @sameo @fitzthum @jiazhang0 @jialez0 @mkulke

Independently, CoCo-AS and attesters can work to enable the CEL log conversion and replay. E.g., the TDX CCEL can be converted into this.

confidential-containers / guest-components

AA: Measurement Event Log Format #495
