[Transient Execution] Non-predictor-based CWEs

scottconstable commented 1 year ago

w.r.t. https://github.com/CWE-CAPEC/hw-cwe-sig/pull/5

The current proposal has two CWEs that can describe transient execution vulnerabilities that do not arise from a microarchitectural predictor:

CWE-A: A processor event may allow subsequent operations to execute transiently (the operations execute without committing to architectural state).

and

CWE-B: A processor event (for example, a fault or microcode assist) may allow incorrect data to be forwarded from the operation that triggered the event to operations that execute transiently.

(Note that CWE-A is intended as a catch-all for transient execution vulnerabilities that do not fit into CWE-B)

Option 1: (Current Proposal) CWE-A is a catch-all and CWE-B is a subset for forwarding after fault/assist/abort

Pros:

Most existing transient execution CVEs would fall under CWE-B, and CWE-B's description precisely characterizes this plethora of vulnerabilities. For example, Meltdown is elegantly summarized using CWE-B: "A fault may allow data in the L1D to be forwarded from the operation that triggered the fault to operations that execute transiently."

Option 2: Merge CWE-A and CWE-B into a single non-predictor-based transient execution CWE

I think that the best way to do this would be to keep CWE-A and discard CWE-B. This would simplify the proposal, at the cost of losing the precision of CWE-B.

(This option retains CWE-C and CWE-D from the original proposal)

Option 3: Reframe CWE-A and CWE-B to delineate between cross-domain and same-domain exposure

The reframing might look something like this:

CWE-A: A processor event (for example, a fault or microcode assist) may expose data across a domain boundary during transient execution of subsequent operations.

CWE-B: A processor event may allow incorrect operations (or correct operations with incorrect data) to execute transiently, exposing data within a domain boundary.

(This option retains CWE-C and CWE-D from the original proposal)

Option 4: Domain-oriented CWEs

Three CWEs that delineate between the following cases: anything that can cause incorrect code/data to be used during transient execution within the same domain (e.g., Spectre v1, non-canonical, FPVI), predictor state shared across domains (e.g., BHI), program data in shared uarch structures that can be exposed across domains (e.g., Meltdown).

CWE-A: A processor event may allow incorrect operations (or correct operations with incorrect data) to execute transiently.

CWE-B: Shared microarchitectural predictor state may allow code in one hardware domain to influence predictions in another domain. This may cause incorrect operations (or correct operations with incorrect data) to execute transiently in the second domain.

CWE-C: A processor event may allow architecturally inaccessible data to be used by operations that execute transiently.

In some ways, I think that this option represents a "best-of" take on Options 1-3. The only downside, I think, is that it is a bit less focused on the root cause. For example, LVI and MDS have the same root cause (fault/assist forwards data from a fill/store buffer), but MDS would clearly fall under (Option 4).CWE-C and LVI would have to be in (Option 4).CWE-A. In Option 1, LVI and MDS would both fit unambiguously into (Option 1).CWE-B.
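As a rough illustration of how Option 4 would sort the examples named above, here is a minimal Python sketch. The two boolean features, the names `Weakness` and `classify_option4`, and the order in which the checks are applied are assumptions made for illustration; they are not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weakness:
    """Hypothetical feature summary of a transient execution weakness."""
    name: str
    # Transient operations consume data that is architecturally inaccessible.
    uses_inaccessible_data: bool
    # Predictor state shared across hardware domains steers the misprediction.
    shared_predictor_across_domains: bool


def classify_option4(w: Weakness) -> str:
    """Map a weakness to an Option 4 CWE (the check order is an assumption)."""
    if w.shared_predictor_across_domains:
        return "CWE-B"
    if w.uses_inaccessible_data:
        return "CWE-C"
    # Everything else: incorrect operations (or correct operations with
    # incorrect data) executing transiently within the same domain.
    return "CWE-A"


# Feature values below follow the examples given in the thread.
EXAMPLES = [
    Weakness("Spectre v1", False, False),
    Weakness("FPVI", False, False),
    Weakness("LVI", False, False),
    Weakness("BHI", False, True),
    Weakness("Meltdown", True, False),
    Weakness("MDS", True, False),
]

for w in EXAMPLES:
    print(f"{w.name}: {classify_option4(w)}")
```

Under these assumed features, LVI and MDS land in different CWEs even though they share a root cause, which is the downside described above.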

pbiyer commented 1 year ago

@scottconstable In the case of Option 3, can we not use CWE-B for LVI and CWE-A for MDS? Also, Option 3 has no CWE that describes BTI-type vulnerabilities.

scottconstable commented 1 year ago

> @scottconstable In the case of Option 3, can we not use CWE-B for LVI and CWE-A for MDS?

Yes we can.

> Also, Option 3 has no CWE that describes BTI-type vulnerabilities.

I should have stated explicitly that CWE-C and CWE-D would remain unchanged in Options 2 and 3. I have fixed this in the problem statement.

g-kini commented 1 year ago

Feedback from David:

I think currently yes, CVE-2020-12965 might fall under CWE-B, but I think the distinction between same-domain and cross-domain is important. Same-domain leakage is only relevant for code that attempts to maintain isolation between different software-managed domains, as is the case in sandboxing, OS kernels, etc. Also, there are generic mitigation strategies that can mitigate same-domain leakage. In particular, same-domain leakage can often be fixed by restricting the address space (e.g., Address Space Isolation in Linux or Google’s Site Isolation for Chrome), while cross-domain leakage cannot be mitigated with those techniques…that’s a key distinction in my mind.

scottconstable commented 1 year ago

Thanks David and @g-kini! I made some revisions to Option 4 and even did a mock-up of the revised CVEs. I also added the CVE that David cited to the bottom of the list. I notice that CVE-2020-12965 currently uses CWE-74, but the CVE description language does not resemble the CWE description language at all.

CVE	Current Description	New Description
CVE-2017-5754 (Rogue Data Cache Load, RDCL, Meltdown, Variant 3)	[No CWE] Systems with microprocessors utilizing speculative execution and indirect branch prediction may allow unauthorized disclosure of information to an attacker with local user access via a side-channel analysis of the data cache.	[CWE-C] A fault may allow architecturally inaccessible data in the L1D to be used by operations that execute transiently.
CVE-2017-5753 (Bounds Check Bypass, BCB, Spectre v1)	[No CWE] Systems with microprocessors utilizing speculative execution and indirect branch prediction may allow unauthorized disclosure of information to an attacker with local user access via a side-channel analysis of the data cache.	[CWE-D] A microarchitectural conditional branch misprediction may allow incorrect operations to execute transiently.
CVE-2017-5715 (Branch Target Injection, BTI, Spectre v2)	[No CWE] Systems with microprocessors utilizing speculative execution and indirect branch prediction may allow unauthorized disclosure of information to an attacker with local user access via a side-channel analysis of the data cache.	[CWE-C] Shared indirect branch predictor state may allow code in one hardware domain to influence indirect branch predictions in another domain. This may cause incorrect operations to execute transiently in the second domain.
CVE-2018-3639 (Speculative Store Bypass, SSB, Spectre v4)	[No CWE] Systems with microprocessors utilizing speculative execution and speculative execution of memory reads before the addresses of all prior memory writes are known may allow unauthorized disclosure of information to an attacker with local user access via a side-channel analysis.	[CWE-D] A microarchitectural memory disambiguation misprediction may allow operations to execute transiently with incorrect data.
CVE-2018-3640 (Rogue System Register Read, RSRE, Spectre v3a)	[No CWE] Systems with microprocessors utilizing speculative execution and that perform speculative reads of system registers may allow unauthorized disclosure of system parameters to an attacker with local user access via a side-channel analysis.	[CWE-B] A fault may allow architecturally inaccessible system register data to be used by operations that execute transiently.
CVE-2018-3615 (L1 Terminal Fault, L1TF – SGX, Foreshadow)	[No CWE] Systems with microprocessors utilizing speculative execution and Intel® software guard extensions (Intel® SGX) may allow unauthorized disclosure of information residing in the L1 data cache from an enclave to an attacker with local user access via a side-channel analysis.	[CWE-B] A fault may allow architecturally inaccessible SGX enclave data in the L1D to be used by operations that execute transiently.
CVE-2018-3620 (L1 Terminal Fault, L1TF – OS/SMM)	[No CWE] Systems with microprocessors utilizing speculative execution and address translations may allow unauthorized disclosure of information residing in the L1 data cache to an attacker with local user access via a terminal page fault and a side-channel analysis.	[CWE-B] A fault may allow architecturally inaccessible OS/SMM data in the L1D to be used by operations that execute transiently.
CVE-2018-3646 (L1 Terminal Fault, L1TF – VMM)	[No CWE] Systems with microprocessors utilizing speculative execution and address translations may allow unauthorized disclosure of information residing in the L1 data cache to an attacker with local user access with guest OS privilege via a terminal page fault and a side-channel analysis.	[CWE-B] A fault may allow architecturally inaccessible VMM data in the L1D to be used by operations that execute transiently.
CVE-2018-12126 (Microarchitectural Store Buffer Data Sampling, MSBDS, Fallout)	[No CWE] Store buffers on some microprocessors utilizing speculative execution may allow an authenticated user to potentially enable information disclosure via a side channel with local access.	[CWE-B] A fault or microcode assist may allow architecturally inaccessible data in a microarchitectural store buffer to be used by operations that execute transiently.
CVE-2018-12127 (Microarchitectural Load Port Data Sampling, MLPDS, RIDL)	[No CWE] Load ports on some microprocessors utilizing speculative execution may allow an authenticated user to potentially enable information disclosure via a side channel with local access.	[CWE-B] A fault or microcode assist may allow architecturally inaccessible data in a microarchitectural load port to be used by operations that execute transiently.
CVE-2018-12130 (Microarchitectural Fill Buffer Data Sampling, MFBDS, ZombieLoad)	[No CWE] Fill buffers on some microprocessors utilizing speculative execution may allow an authenticated user to potentially enable information disclosure via a side channel with local access.	[CWE-B] A fault or microcode assist may allow architecturally inaccessible data in a microarchitectural fill buffer to be used by operations that execute transiently.
CVE-2019-11091 (Microarchitectural Data Sampling from Uncacheable Memory, MDSUM)	[No CWE] Uncacheable memory on some microprocessors utilizing speculative execution may allow an authenticated user to potentially enable information disclosure via a side channel with local access.	[CWE-B] A fault or microcode assist triggered when accessing uncacheable memory may allow architecturally inaccessible data to be used by operations that execute transiently.
CVE-2019-11135 (TSX Asynchronous Abort, TAA)	[No CWE] TSX Asynchronous Abort condition on some CPUs utilizing speculative execution may allow an authenticated user to potentially enable information disclosure via a side channel with local access.	[CWE-B] A TSX Asynchronous Abort in some Intel® processors may allow architecturally inaccessible data to be used by operations that execute transiently.
CVE-2020-0543 (Special Register Buffer Data Sampling, SRBDS, Crosstalk)	[No CWE] Incomplete cleanup from specific special register read operations in some Intel(R) Processors may allow an authenticated user to potentially enable information disclosure via local access.	[CWE-B] A fault, microcode assist, or abort may allow architecturally inaccessible special register data to be used by operations that execute transiently.
CVE-2020-0548 (Vector Register Sampling)	[No CWE] Cleanup errors in some Intel® Processors may allow an authenticated user to potentially enable information disclosure via local access.	[CWE-B] A fault, microcode assist, or abort may allow architecturally inaccessible data in a microarchitectural vector register to be used by operations that execute transiently.
CVE-2020-0549 (L1D Eviction Sampling)	[No CWE] Cleanup errors in some data cache evictions for some Intel® Processors may allow an authenticated user to potentially enable information disclosure via local access.	[CWE-B] A fault, microcode assist, or abort may allow architecturally inaccessible data in the L1D to be used by operations that execute transiently.
CVE-2020-0550 (Snoop-assisted L1D)	[No CWE] Improper data forwarding in some data cache for some Intel® Processors may allow an authenticated user to potentially enable information disclosure via local access.	[CWE-B] A fault, microcode assist, or abort may allow architecturally inaccessible data in the L1D to be used by operations that execute transiently.
CVE-2020-0551 (Load Value Injection, LVI)	[No CWE] Load value injection in some Intel(R) Processors utilizing speculative execution may allow an authenticated user to potentially enable information disclosure via a side channel with local access.	[CWE-B] A fault, microcode assist, or abort may allow operations to execute transiently with incorrect data.
CVE-2021-0086 (Floating-Point Value Injection, FPVI)	[CWE-204] Observable response discrepancy in floating-point operations for some Intel® Processors may allow an authorized user to potentially enable information disclosure via local access.	[CWE-B] A floating-point microcode assist may allow operations to execute transiently with incorrect data.
CVE-2021-0089 (Speculative Code Store Bypass, SCSB)	[CWE-204] Observable response discrepancy in some Intel® Processors may allow an authorized user to potentially enable information disclosure via local access.	[CWE-A] A machine clear triggered by self-modifying code may allow incorrect operations to execute transiently.
CVE-2021-33149 (Speculative Load Disordering, SLD, Speculative Cross-Store Bypass)	[CWE-205] Observable behavioral discrepancy in some Intel® Processors may allow an authorized user to potentially enable information disclosure via local access.	[CWE-A] A machine clear triggered by a memory ordering violation may allow operations to execute transiently with incorrect data.
CVE-2022-0001 (Branch History Injection, BHI, Spectre-BHB)	[CWE-1303] Non-transparent sharing of branch predictor selectors between contexts in some Intel® Processors may allow an authorized user to potentially enable information disclosure via local access.	[CWE-C] Shared branch history state may allow user/guest code to influence indirect branch predictions for kernel/VMX-root code. This may cause incorrect operations to execute transiently in the kernel/VMX-root code.
CVE-2022-0002 (Intra-mode Branch Target Injection, IMBTI)	[CWE-1303] Non-transparent sharing of branch predictor within a context in some Intel® Processors may allow an authorized user to potentially enable information disclosure via local access.	[CWE-D] A microarchitectural indirect branch misprediction may allow incorrect operations to execute transiently.
CVE-2022-29901 (RSB underflow, Retbleed)	[CWE-1303] Non-transparent sharing of branch predictor targets between contexts in some Intel® Processors may allow an authorized user to potentially enable information disclosure via local access.	[CWE-C for processors w/o eIBRS] Shared return stack buffer state may allow user/guest code to influence indirect branch predictions for kernel/VMX-root code. This may cause incorrect operations to execute transiently in the kernel/VMX-root code; [CWE-D for processors w/ eIBRS] RSB alternate behavior may allow incorrect operations to execute transiently.
CVE-2022-26373 (Post-barrier RSB)	[CWE-1303] Non-transparent sharing of return predictor targets between contexts in some Intel® Processors may allow an authorized user to potentially enable information disclosure via local access.	[CWE-C] Shared return stack buffer state may allow code that executes before a prediction barrier to influence indirect branch predictions after the barrier. This may cause incorrect operations to execute transiently after the prediction barrier.
CVE-2020-12965 (non-canonical loads)	[CWE-74] When combined with specific software sequences, AMD CPUs may transiently execute non-canonical loads and store using only the lower 48 address bits potentially resulting in data leakage.	[CWE-C] A non-canonical load or store may allow dependent operations to execute transiently with incorrect data.

jasonkoberg commented 1 year ago

@scottconstable thanks for putting this together and for making these suggested revisions based on the feedback. Here are some thoughts from me:

Option 1: Now that I understand the thought process better, this seems like a good option. Having a "catch-all" via CWE-A is, I think, really useful.
Option 2: I think I had originally proposed something like this but after reviewing the other options I think losing the precision of CWE-B is not the right approach. The specificity of a fault/microcode assist causing data forwarding I think is an important distinction that needs its own CWE and those specifics would get lost by combining it with CWE-A.
Option 3: Out of the options, I think this is the best. It refines the existing CWE-A and CWE-B to be more specific. As a user of CWE, I could clearly determine if the issue was predictor based or not (CWE-C/D or CWE-A/B respectively). If not, I would then determine if the weakness was cross domain or not (CWE-A or CWE-B respectively). That "binary search" would allow a user to easily land on CWE-A or CWE-B without any ambiguity. I also think this would maintain good coverage without requiring a separate "catch-all"
Option 4: I think that elevating the messaging away from the root cause makes this option much more confusing.

With that said, my favorite is Option 3 with my second favorite being Option 1 (the original).

scottconstable commented 1 year ago

@jasonkoberg I agree with your assessment that Option 4 is "elevating the messaging away from the root cause." However, I also believe that Option 3 can be characterized similarly. The distinction between "cross-domain" and "same-domain" exposure is a distinction between exploits, not root causes. For example, Microarchitectural Data Sampling (MDS) and Load Value Injection (LVI) are cross-domain and same-domain exploits (respectively) of the same root cause. According to Option 3, MDS would be an instance of CWE-A and LVI would be an instance of CWE-B.

Options 1 and 2 are root-cause oriented. According to Option 1, MDS and LVI would both be instances of CWE-B. According to Option 2, both would be instances of CWE-A.
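To make the comparison concrete, the sketch below tabulates where MDS and LVI land under each option, using only the assignments stated in this thread. The dictionary layout and the helper name `keeps_root_cause_together` are illustrative assumptions, not part of the proposal.

```python
# CWE assigned to MDS and LVI under each option, as stated in this thread.
ASSIGNMENTS = {
    "Option 1": {"MDS": "CWE-B", "LVI": "CWE-B"},
    "Option 2": {"MDS": "CWE-A", "LVI": "CWE-A"},
    "Option 3": {"MDS": "CWE-A", "LVI": "CWE-B"},
    "Option 4": {"MDS": "CWE-C", "LVI": "CWE-A"},
}


def keeps_root_cause_together(option: str) -> bool:
    """Return True if MDS and LVI (same root cause) share one CWE."""
    cwes = ASSIGNMENTS[option]
    return cwes["MDS"] == cwes["LVI"]


for option in ASSIGNMENTS:
    label = "root-cause oriented" if keeps_root_cause_together(option) else "splits the root cause"
    print(f"{option}: {label}")
```

By this single-pair check, only Options 1 and 2 keep the shared root cause in one CWE; Options 3 and 4 separate the two exploits.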

jasonkoberg commented 1 year ago

@scottconstable thanks for the feedback. I don't disagree with you. I was really taking a step back from a user perspective and trying to think through how I would select a CWE from the various options. I'm assuming most users of CWE will not have the level of expertise you, David, and others involved in this classification exercise have. I'm trying to think through how we can help users ensure that they select the "right" CWE without defaulting to one that is the most general. With that said, it seems to me that Option 3 has the least amount of ambiguity between the 4 CWEs (they are all clearly distinct) so a user of CWE is most likely to pick the "right" CWE for their use case. The other options have CWEs that have more of an overlap and users may end up selecting the wrong one by mistake or because they do not understand the details. Or they may opt to just always select the most general one. My preference for Option 3 is that I think it can help prevent users from doing this.

scottconstable commented 1 year ago

Thanks Jason! The "user perspective" is one that I admit I am not in the best position to judge. I think we can discard Option 2 and focus on Options 1, 3, and 4.

That said, I also just noticed that Option3.CWE-B is worded very similarly to Option3.CWE-D:

Option3.CWE-B: A processor event may allow incorrect operations (or correct operations with incorrect data) to execute transiently, exposing data within a domain boundary.

Option3.CWE-D: Microarchitectural predictors may allow incorrect operations (or correct operations with incorrect data) to execute transiently after a misprediction.

In a sense, the latter is a refinement of the former.

jasonkoberg commented 1 year ago

@scottconstable I interpreted Option3.CWE-B as covering non-predictor-based events (like a fault) and Option3.CWE-D as covering predictor-based events (like a branch misprediction). I think that was the original intent, so maybe we can adjust the language to make it clearer?

To me, I interpreted Option 3 broken down as follows (in simple terms):

CWE-A: Non-predictor based event causes data forwarding across a domain boundary
CWE-B: Non-predictor based event causes data forwarding within the same domain boundary
CWE-C: A misprediction caused by cross domain sharing of a predictor causes data forwarding
CWE-D: A misprediction with a predictor in the same domain causes data forwarding
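The four-way breakdown above amounts to a two-question decision procedure (the "binary search" mentioned earlier in the thread). A minimal Python sketch follows; the function name `classify_option3` and its boolean parameters are hypothetical, and the example inputs reflect how MDS, LVI, and Spectre v1 are characterized in this discussion.

```python
def classify_option3(predictor_based: bool, cross_domain: bool) -> str:
    """Pick an Option 3 CWE from two yes/no questions.

    predictor_based: the transient execution follows a misprediction.
    cross_domain:    the exposure crosses a domain boundary.
    """
    if predictor_based:
        # CWE-C: cross-domain predictor sharing; CWE-D: same-domain predictor.
        return "CWE-C" if cross_domain else "CWE-D"
    # CWE-A: cross-domain exposure; CWE-B: same-domain exposure.
    return "CWE-A" if cross_domain else "CWE-B"


# Examples as characterized in the thread.
print("MDS:", classify_option3(predictor_based=False, cross_domain=True))
print("LVI:", classify_option3(predictor_based=False, cross_domain=False))
print("Spectre v1:", classify_option3(predictor_based=True, cross_domain=False))
```

Because each question has exactly two answers, every weakness maps to exactly one of the four CWEs, which is the lack of ambiguity argued for above.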

scottconstable commented 1 year ago

@jasonkoberg Your intuition matches my intent! As a general principle, I think that it is preferable to define a new concept according to what it is and not according to what it isn't. So I am reluctant to use phrasing like "non-predictor based event," even though that is what is intended. Having Option3.CWE-D as a refinement of Option3.CWE-B avoids this problem, though the "user perspective" likely suffers.

(I would like to hear others' thoughts as well!)

g-kini commented 1 year ago

For some reason, Option 3 was the easiest for a novice like me to understand when you first posted it, @scottconstable. Not sure if that helps or not. Regarding having CWEs as refinements, CWE can certainly support that with its hierarchy and types of weaknesses. I would not say the user perspective will necessarily suffer if we can organize it appropriately. I would like to hear from others as well on this.

scottconstable commented 1 year ago

The PR (https://github.com/CWE-CAPEC/hw-cwe-sig/pull/5) has been updated to incorporate CWE community feedback, including a complete draft of Option 3 (see above).

Here is the updated formatted proposal: https://github.com/CWE-CAPEC/hw-cwe-sig/blob/0adef88b8e8640562c11fc23b88884d7afaf71cb/working-docs/transient.md.

jasonkoberg commented 1 year ago

@scottconstable, I really like this new structure with the 4 CWEs aligning with the original Option 3 we had discussed. These 4 CWEs seem to capture the intent of our discussions very well.

What is the opinion of others?

BobH-MITRE commented 1 year ago

I spent some time looking at Spectre v1 (BCB), which you mapped to CWE-D.

A title that I am kicking around for this is "Improper Restriction of Transient Execution for Sensitive Code Paths".

I think this may be the underlying weakness because of references to identifying "speculation barriers" to know where to insert the LFENCE. In addition, SMEP as a mitigation also turns off speculative execution when OSes try to execute application code.

It seems that under certain circumstances speculative execution will violate data access policies, so it is a game of when we should turn it off to prevent that.

Thoughts?
