golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.74k stars 17.5k forks source link

proposal: crypto/subtle: add WithDataIndependentTiming #66450

Open rolandshoemaker opened 5 months ago

rolandshoemaker commented 5 months ago

Apple silicon implements "data memory-dependent prefetchers" (DMPs) which attempt to speed up irregular memory access patterns by doing strange things with pointers in memory. It turns out these optimizations can break cryptography code which is designed to operate in constant time (see https://gofetch.fail), by making certain operations non-constant. It is unclear if Apple intends to do anything about this.

Currently the only reasonable countermeasure to this is setting the ARM DIT MSR bit, which disables the DMP optimizations.

While not affected by this particular attack, Intel has a similar MSR bit, DOIT which similarly disables optimizations.

Given in these cases there are no reasonable blanket mitigations that can be applied to all constant-time code (i.e. blinding is effective for RSA, but there are a number of other implementations affected where there is not a clear solution), and since we are unsure about the performance impact of setting DIT/DOIT, it is unclear what we should do.

It is perhaps worthwhile to provide a opt-in DIT/DOIT mode for Go binaries (i.e. GOEXPERIMENT=dit) which causes them to attempt to set these MSRs where possible. This would provide users the highest level of protection, but with the understanding that they are trading some performance in order to prevent these (typically local only) side channels. It would then be up to users to make the determination if they need to take this extra precaution or not.

Note: the Linux kernel merged a patch which sets DIT by default in kernel space, but does not set it in user space, and user space programs can set it as they see fit (user space programs on darwin can also do this), but they currently have not implemented a DOIT version of this. As far as I can tell user space programs cannot set DOIT currently, at least on the systems that I have access to test on.

Related to #49702.

rolandshoemaker commented 5 months ago

Another, significantly more involved option, would be adding a function closure, say crypto/subtle.WithDIT(func(){}), which would set DIT for enclosed code. We would then be required to enclose all cryptographic code which relies on constant time properties in said closure. This is obviously not ideal, but seems to be the path ARM expects developers to eventually go down, per their own documentation.

FiloSottile commented 5 months ago

Apple's documentation also invites setting and restoring DIT around cryptographic code. I am somewhat skeptical of this, because cryptographic keys, plaintext messages, and other secrets are handled and loaded into memory all over a program, not just in the cryptographic code. The reason we see papers attack cryptographic code is that the rest of the program is application-specific (and that it usually makes simple operations like loading and copying, which we used to understand as safe, but apparently are not anymore).

https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Enable-DIT-for-constant-time-cryptographic-operations

/cc @golang/security

FiloSottile commented 5 months ago

An alternative would be to add a -spectre=dit mitigation. See https://go.dev/wiki/Spectre. It's not quite Spectre, but it's a binary-wide mitigation for a CPU side channel that can cause secret extraction from local attackers.

dottedmag commented 5 months ago

@marcan has found M2 DMP disable chicken bit: https://social.treehouse.systems/@marcan/112238385679496096, so at least whole-system workaround is possible.

rolandshoemaker commented 5 months ago

Note from that thread which I forgot to mention when initially writing this issue: Apple does appear to be enclosing their corecrypto code with calls to enable/disable DIT. It seems like this is, really, the "expected" convention moving forward (although I agree with @FiloSottile, that this seems like a problem that is likely to plague many things, not just the specifically "crypto" parts).

See e.g. https://github.com/apple-oss-distributions/xnu/blob/94d3b452840153a99b38a3a9659680b2a006908e/osfmk/corecrypto/cc_internal.h#L86.

ericlagergren commented 5 months ago

I'd appreciate if this could be added to subtle so that non-stdlib crypto code could also benefit from it.

ericlagergren commented 5 months ago

Another possible API, similar to runtime.LockOSThread:

// Tries to enable DIT, reporting
// whether it was enabled.
func EnableDIT() bool

// Disables DIT. If DIT is currently
// disabled, it is a no-op.
func DisableDIT()

The compiler could rewrite those as intrinsics.

rolandshoemaker commented 2 months ago

Apple now is also explicitly recommending enclosing cryptographic code that relies on constant time properties with calls to set/unset the DIT bit, see https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Enable-DIT-for-constant-time-cryptographic-operations.

I think we're leaning towards implementing a closure, similar to the one suggested in the secret.Do proposal, which simply sets and unsets the DIT(/DOIT) bit on platforms which support them, and are a no-op otherwise. We'll then enclose all of the standard library code we currently expect to be constant time in these closures, and make it a public API for people doing constant time stuff outside of the standard library.

We'd likely also add a GODEBUG that allows disabling this behavior.

I still need to benchmark how negatively this impacts cryptographic code, but I expect it will cause at least some slowdown (since we'll be inherently disabling some optimization somewhere, otherwise the bit wouldn't need to exist).

prattmic commented 2 months ago

Drive-by comment since I don't see it mentioned above: I assume the DIT state is thread-local (really CPU-local, but presumably migrated by the kernel), thus the Go scheduler will need to save/restore the DIT state on context switch.

(Alternatively, DIT-enabled regions could be non-preemptible. Note that assembly is non-preemptible.)

Edit: Or LockOSThread when enabling DIT.

ericlagergren commented 2 months ago

I still need to benchmark how negatively this impacts cryptographic code, but I expect it will cause at least some slowdown (since we'll be inherently disabling some optimization somewhere, otherwise the bit wouldn't need to exist).

For reference: having implemented this at work for Rust and C, we saw about a 5% performance regression. I can't imagine Go would be much worse.

DemiMarie commented 2 months ago

For what it is worth, Qubes OS always enables DOITM.

rolandshoemaker commented 1 month ago

Looks like we're leaning towards the closure solution, a formal proposal for that:

Akin to the API proposed in #21865 for secret memory tagging, add a function which takes a function which will be run with DIT enabled (if it is available). Nesting usage will leave DIT enabled, tracking when it needs to be unset. Using the function will inherently lock the goroutine from which it is called to the OS thread for the duration of the enclosed function.

Concretely:

func WithDIT(f func())
func DITEnabled() bool // may not be strictly necessary, but for some uses may be valuable

I think there are three open questions:

  1. Should we be referring to this using the term DIT? DIT is ARM specific, and Intel has it's own version, DOIT, which we'll likely want to also implement transparently, so perhaps there could be a better term for this. We could use the phrase "constant-time" (i.e. WithConstantTime) but we need to be extremely clear that this doesn't make explicitly non constant-time code constant (Apple has a big red warning around where they describe how to use DIT explaining that you still need to actually write constant time code for it to be useful).
  2. Where should this functionality live? The proposed runtime/secret package from #21865 could make sense, but the naming of the functions proposed in that package make it seem like the package should really be restricted to just memory tagging. Perhaps runtime/dit (in which case we'd remove the DIT suffix/prefix from the proposed functions)?
  3. With the goal of making it easy to wrap existing crypto code, it might be nice to actually make the signature of the enclosing function func WithDIT(f func() (any, error)) (any, error), allowing the caller to not have to refactor functions to assign results/errors to the parent scope and instead just wrap existing logic (e.g. see how this would work for crypto/rsa here https://go.dev/cl/598337/1) and return results as-is (this of course assumes all existing functions return a (result, error) tuple, which may not be strictly true).

cc @FiloSottile

FiloSottile commented 1 month ago
  1. Should we be referring to this using the term DIT? DIT is ARM specific, and Intel has it's own version, DOIT, which we'll likely want to also implement transparently, so perhaps there could be a better term for this. We could use the phrase "constant-time" (i.e. WithConstantTime) but we need to be extremely clear that this doesn't make explicitly non constant-time code constant (Apple has a big red warning around where they describe how to use DIT explaining that you still need to actually write constant time code for it to be useful).

WithDataIndependence? It's obscure but so is the feature.

  1. With the goal of making it easy to wrap existing crypto code, it might be nice to actually make the signature of the enclosing function func WithDIT(f func() (any, error)) (any, error), allowing the caller to not have to refactor functions to assign results/errors to the parent scope and instead just wrap existing logic (e.g. see how this would work for crypto/rsa here https://go.dev/cl/598337/1) and return results as-is (this of course assumes all existing functions return a (result, error) tuple, which may not be strictly true).

Mild preference against (any, error) unless we can use generics. The type assertion neutralizes the readability advantages.

ericlagergren commented 1 month ago

and Intel has it's own version, DOIT, which we'll likely want to also implement transparently

I don't think you can enable DOIT from userspace. You have to write to an MSR, which is privilege level 0.

ericlagergren commented 1 month ago

With the goal of making it easy to wrap existing crypto code, it might be nice to actually make the signature of the enclosing function func WithDIT(f func() (any, error)) (any, error),

I think this only makes things more difficult. The conversion to any probably allocates, and the result has to be type asserted.

ianlancetaylor commented 1 month ago

runtime.WithDataIndependentTiming(func())

rolandshoemaker commented 1 month ago

Mild preference against (any, error) unless we can use generics. The type assertion neutralizes the readability advantages.

Yeah I was trying to figure out if there was a good way to use generics for this, but 🤷.

I think this only makes things more difficult. The conversion to any probably allocates, and the result has to be type asserted.

Good point.

runtime.WithDataIndependentTiming(func())

Sounds good to me.

rolandshoemaker commented 1 month ago

I don't think you can enable DOIT from userspace. You have to write to an MSR, which is privilege level 0.

I haven't been following it super closely, but at some point there was a Linux kernel discussion about providing APIs to allow setting it from userspace. That might've fizzled out though.

gopherbot commented 1 month ago

Change https://go.dev/cl/598335 mentions this issue: internal/cpu: add DIT detection on arm64

rsc commented 1 month ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 1 month ago

It seems like maybe we should start with this in crypto/internal/something (which wouldn't need a proposal).

runtime.WithDataIndependentTiming seems wrong to me. It's too low-level. What if we need CodeIndependentTiming next, then people have to do

runtime.WithDataIndependentTiming(func() { 
    runtime.WithCodeIndependentTiming(func() {
       ...
    })
})

This also seems like the kind of thing that secret.Do could just do as part of its scope. Given that Keith has a secret.Do CL pending (CL 600635), should we think about rolling it into that?

DemiMarie commented 1 month ago

Could there also be a global function to set DIT/DOIT and never disable it? This would be for programs like OpenBao or age that do nothing but handle secrets.

rolandshoemaker commented 1 month ago

@rsc

It seems like maybe we should start with this in crypto/internal/something (which wouldn't need a proposal).

I'm not strictly opposed, but given this appears to be the expected way to write constant time code in the fast approaching Arm future I expect we will need to provide some way for users to trigger this in non-standard library code. Putting it in internal probably buys us one release cycle, which is maybe all we need to hammer out the right API?

runtime.WithDataIndependentTiming seems wrong to me. It's too low-level. What if we need CodeIndependentTiming next

While it seems hubristic to attempt to divine what processor designers will do next, I would be somewhat surprised if we see another bit that affects constant-time-ness.

We could name the API to make it clear it is just a closure around code that is expected to be constant-time, and then apply the fixes we desire to enforce that property behind the scenes, but I partly like this API because it is somewhat obscure and low-level.

The thing we need to be extremely careful about is making it clear what this does do, and more importantly what it doesn't do. Specifically we need to make it obvious that you still need to write constant-time code in the first place for this to have any affect whatsoever. I worry naming this something more user-friendly, like constanttime.Do, may make people reach for it under the expectation that it will magically make non-constant-time code constant-time. The subtleties here are a pain.

This also seems like the kind of thing that secret.Do could just do as part of its scope. Given that Keith has a secret.Do CL pending (CL 600635), should we think about rolling it into that?

I think, often, these two things will want to be mixed, but there are also cases where they may not need to be. It is not unreasonable to care about zero-ing sensitive memory allocated in a closure, but not necessarily care about constant-time-ness for all operations originating from said closure (due to performance implications), and vice-versa, you may not care about zero-ing all intermediate values, but do care about the constant-time-ness of the operations that produce them.

That said, depending on the benchmarking of both modes, if the overhead is low enough it is maybe reasonable to just lump them together and say you get both even if you only really care about one.

@DemiMarie

Could there also be a global function to set DIT/DOIT and never disable it? This would be for programs like OpenBao or age that do nothing but handle secrets.

This was part of the original proposal, and I'm not strictly opposed. It feels like this is probably something that the OS should really be doing, so that it can properly handle forking etc, but seems reasonable to implement in Go absent that functionality.

rsc commented 1 week ago

What about crypto/subtle.WithDataIndependentTiming?

rolandshoemaker commented 1 week ago

That works for me.

rsc commented 5 days ago

It sounds like the proposal is:

  1. Add crypto/subtle.WithDataIndependentTiming(f func()), which runs f with DIT/DOIT enabled. If the host system does not support a DIT/DOIT mode, f is just run normally; WithDataIndependentTiming doesn't fail/crash or anything like that. At the moment, this means that WithDataIndependentTiming would only do something besides call f on newish arm64, because x86-64 only lets you set the bit in kernel mode. If newer Linux or other operating systems make setting that bit possible from user space then the call would start setting that bit as possible.
  2. Separately, GODEBUG=dataindependenttiming=1 being set at startup (or by a //go:debug line in main) would enable it for the whole program.
  3. There is no way to ask whether DIT/DOIT is enabled in the current goroutine at the moment.

Do I have that right?