golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.12k stars 17.68k forks source link

proposal: math/big: support for constant-time arithmetic #20654

Closed bford closed 2 years ago

bford commented 7 years ago

Problem: Constant-Time Arithmetic for Cryptographic Uses

The math/big package naturally and inevitably gets used for cryptographic purposes, including in the standard Go crypto libraries. However, this usage is currently unsafe because math/big does not support constant-time operation and thus may well be leaking secret keys and other sensitive information via timing channels. This is a well-known problem already documented in math/big's godoc documentation.

A much more specific issue related to this was raised in 2011 (https://github.com/golang/go/issues/2445) but eventually closed for lack of attention for too long.

See the preliminary companion patch 45490 presenting a first-cut at an implementation of this proposal: https://go-review.googlesource.com/c/45490/ But the most important details and considerations are discussed here.

Alternative Approaches to Hardening Go's Cryptographic Code

There are several potential ways to address this weakness in Go's cryptographic packages, such as:

Proposed API Modifications for Constant-Time Support

The essence of this proposal, from an API perspective, is simply to add one new property, represented by a getter-setter pair, to the big.Int type, to enable and configure constant-time operation. The key requirement this API must address is that constant-time arithmetic inherently requires the math library to know the maximum possible/allowed size of the result to be produced in a given context, instead of just using the "smallest" internal buffer that can hold the particular numeric value of the result.

A completely tentative signature for the API addition is follows:

func (x Int) BitCap() int func (x Int) SetBitCap(cap int) *Int

A caller invokes SetBitCap on an Int instance to configure that instance to request constant-time operation for subsequent operations invoked on this instance (x), i.e., with x as the destination. The caller must specify via the 'cap' argument the maximum number of bits (i.e., "bit-capacity") of results that will need to be computed into this Int instance. Once configured in this way, the big.Int implementation's promise to the caller is that for big.Int operations invoked on this instance that support constant-time operation (and not all do or will necessarily be expected to), those operations will be performed such that (a) it will always produce an output result in a buffer sized to hold 'cap' bits regardless of the numeric value of the result, and (b) the arithmetic operation's timing will depend only on the lengths and not the numeric contents of all input operands.

For cryptographic use, it is the responsibility of the caller to ensure that all big.Int instances holding secret keys or other sensitive information, or values computed using such secrets, are configured properly to a fixed size via SetBitCap() before loading sensitive information into them (e.g., via SetBytes or arithmetic operations). When performing an arithmetic operation such as Add, Mul, or Exp on a destination big.Int configured for constant-time operation, the API does not require that all inputs be configured to be the same size, or necessarily to be configured via SetBitCap at all. For example, if crypto code calls z.Add(x,y) in a situation in which operand x is secret/sensitive but operand y is a well-known or otherwise insensitive public value, then operand x should have been likewise configured for constant-time via SetBitCap, but the insensitive operand y need not be. Thus, this API does not create a "separate and incompatible" kind of big.Int for constant-time operation, and allows constant-time and variable-time big.Int instances to be used together as appropriate depending on the sensitivity of particular values in a cryptographic computation. This of course places a burden on the calling crypto code to keep track of which values are sensitive and which aren't, but the developers of crypto code generally face that burden anyway.

If the caller uses x.SetBitCap to configure a particular bit-capacity, but then invokes an operation (e.g., x.Add) that ends up computing a result too large for the configured bit-capacity, then the big.Int arithmetic operation panics if it detects the too-large result. The modified big.Int API does not promise to detect all such too-large results, however, because it internally rounds the requested bit-capacity up to the next machine word. (Note: if it is deemed desirable to enforce bit-capacity exactly, that would probably not be particularly difficult or expensive to do, so it's worth considering as an option. A point for discussion.)

The preliminary, basis-for-discussion patch (https://go-review.googlesource.com/c/45490/) includes a change to the crypto/dsa package illustrating the use of the above API. Note that this is just a temporary "for example" for API discussion purposes. The patch does not even pretend to try to revise all the crypto packages, its example revision of the dsa package itself might still be imperfect/incomplete, and revising each of the crypto packages to support constant-time operations should probably be handled as separate changes once the big.Int API is decided on. The main point is that the code incorporates calls to SetBitCap on values such as 'k' that hold secret keys and sensitive intermediate values, while leaving big.Ints that hold only public values (such as DSA parameters) in the default variable-time configuration. This is intended to give the impression of the extent of changes I expect would be typically required for generic crypto code such as this to support constant-time operation with the proposed API.

Constant-Time Arithmetic Implementation Challenges

Although the API above is hopefully fairly simple and backward-compatible, we of course face several challenges and decisions in delving into the actual implementation details within the math/big package:

Implementation Approach

There are many likely-reasonable approaches to address these challenges, but here are the decisions and major changes reflected in the tentative, for-discussion patch (https://go-review.googlesource.com/c/45490/).

Normalized versus configured-length nat slices

To avoid duplicating much of the current 'nat' code into a separate-but-incompatible internal type for constant-time arithmetic, I chose simply to revise the 'nat' code and the code that uses it to avoid assuming that a 'nat' is always normalized: i.e., to make nat operations be tolerant of non-normalized inputs. In most cases this proves to be not particularly difficult. For example, nat.sub() must be changed no longer to assume that it's always an underflow in the case (m < n) where the first nat is a shorter word-slice than the second, because the second might now be an un-normalized word-slice with a value smaller than the value in the first. The modified 'nat' library still produces normalized values in the case of default, non-constant-time operation, but merely no longer requires or expects this always to be the case.

Since constant-time operations need to produce nat slices of a fixed size potentially larger than their normalized minimum size, we need to parameterize those operations somehow, but as discussed above we cannot easily just add a configuration field to 'nat' because nat isn't a struct and it would be an extremely invasive change to make it one. Thus, the solution I adopted is to add a 'zcap' parameter to 'nat' operations that (now) support constant-time operation. The caller can pass 0 for the zcap parameter to indicate traditional, variable-time operation.

However, since even changing all calls to 'nat' operations (nat.add, nat.mul, etc) throughout the library to add a '0' parameter to all variable-time invocations of these operations would likewise be a fairly invasive change, I (tentatively) chose to minimize this invasiveness, at the expense of a little more code in nat.go, by creating new zcap-parameterized methods named with a 'c' prefix (nat.cadd, nat.csub, etc), and including corresponding backward-compatibility methods with the original unmodified signatures that simply call the new 'c'-prefixed methods with 0 for the zcap parameter. I make no suggestion that this is the ideal solution for the long term; it was intended only to keep the invasiveness of the changes relatively minor for now, and I am completely open in terms of the "right" way to parameterize nat with a configurable bit-capacity.

Constructing and normalizing nat instances

The main, global effect of the new 'zcap' parameter to the constant-time calls is to parameterize nat.cmake and nat.cnorm, the new versions of nat.make and nat.norm to support constant-time operation. In particular, nat.cmake now ensures that the nat byte-slice it allocates is big enough for either the caller's indicated expected maximum result size (n) or for the configured zcap, whichever is larger.

Most importantly, nat.cnorm now normalizes a slice down to the minimum representation size if zcap == 0, but when zcap > 0 it "normalizes" it down to exactly the size corresponding to zcap. Thus, support for constant-time operation effectively just changes the definition of "normalization" slightly: it means "normalize to minimum size" when zcap == 0 and "normalize to exactly zcap size" when zcap > 0. In many parts of the nat code, simply passing along the zcap parameter down to cmake before an operation and cnorm at the end is "mostly" enough to address the buffer-management challenges that constant-time operation presents.

Conditionally preserving variable-time optimizations

Throughout the implementations of various arithmetic operations mostly in nat.go, there are currently optimizations here that are inherently non-constant-time. For example, basicMul avoids calling addMulVVW at all for multiplicand words that happen to be zero. Similarly, mul normalizes intermediate results in several places during its calculations. To avoid hurting the performance of big-integer arithmetic for general non-cryptographic purposes, I didn't want to remove these optimizations entirely, so instead I made those optimizations conditional on a test for 'zcap > 0'. Thus, the modified big.Int code should not perform noticeably worse under default variable-time operation at least due to lost optimization opportunities.

Note that for strict enforcement of constant-time operation, these tests sometimes assume that the Go compiler preserves natural left-to-right evaluation order for the && and || operators, and I'm not familiar enough with the Go compiler to know yet whether that may be safely relied on. In the example of basicMul, we call addMulVVW 'if zcap > 0 || d != 0', which will be properly constant-time if the Go compiler either (a) evaluates this expression to a single 'bool' value and then performs only a single conditional branch on the result of the full expression, or (b) uses two conditional branch, first branching on the result of 'zcap > 0' and then branching on the result of 'd != 0' only if zcap == 0. If the Go compiler were to get overly smart and decide it wants to evaluate and branch on 'd != 0' first, however, then it would technically break constant-time operation (though in this case this would probably leave an extremely small and hard-to-exploit timing leak). However, this general risk of an overly-smart compiler interfering with developers' attempts to express constant-time code is common and not remotely unique to Go.

Conditional calculations in constant time

Besides disabling variable-time optimizations when zcap > 0, a standard challenge in implementing constant-time arithmetic is dealing with conditionals that depend on computed data. For example, in each iteration nat.montgomery must test a data-dependent carry flag and perform a subtraction or not depending on its value. To address these situations, in the case of constant-time operation (zcap > 0) the code adopts the standard solution of computing both possible result values and performing a constant-time conditional copy or "select" of one or the other. The new nat.sel method provides such a constant-time select for internal use. Since computing two results and selecting just one of them can negatively impact performance, the code does this only in the zcap > 0 case, and retains the existing conditional behavior when zcap == 0.

Effects on Architecture-specific Arithmetic Code

In most cases, it appears that supporting constant-time operation in the math/big package should require no modifications to the optimized arithmetic code in architecture-specific assembly, because that code tends to be already designed to use architecture-specific mechanisms for handling carries and such without requiring conditionals.

Some of the generic versions of the arithmetic primitives - e.g., addWW_g - had variable-time implementations due to conditional carry handling, but I fixed those to use the same constant-time carry/overflow handling logic that the corresponding generic vector primitives (e.g., addVV) were already using.

I encountered only one "missing" arithmetic primitive needed for constant-time arithmetic, namely constant-time comparison. For this purpose I added a 'cmpVV_g', which essentially performs the same operation as a 'subVV_g' except (a) it discards the subtraction's normal numeric output, and hence does not need a target vector; and (b) in addition to computing the carry from the subtraction, it also produces an indicator of whether the result is equal to zero or not.

The current patch only uses this generic subVV_g implementation and does not add a corresponding architecture-specific subVV implementation for any architecture. Doing so should be quite easy and should be a relatively straightforward variation to the subVV code for any given architecture, but this did not seem like a high priority at the moment.

Limitations and Caveats

Once again, the current patch is intended to be a starting point for discussion. It's definitely not yet a perfect or complete solution to the problem of supporting constant-time operation for cryptographic use. In particular, the current patch has at least the following known limitations/caveats (and probably others that I've missed or forgotten to mention):

Apologies for the lengthy writeup. At any rate, I hope that this discussion and the corresponding preliminary patch illustrate that although not by any means trivial, it appears to be feasible to make math/big support constant-time operation for cryptographic use with (a) minimal public API change, (b) no significant performance penalty for non-cryptographic variable-time operation, and (c) not too invasive changes even internally within math/big outside of the nat.go and int.go files. Further, even with the patch's current known limitations, I would suggest that it already makes math/big a lot safer for cryptographic uses than it currently is, and hence represents a step forward in security.

Comments/discussion welcome. Thanks.

bradfitz commented 7 years ago

/cc @griesemer

griesemer commented 7 years ago

I'm not a security/crypto expert and and I don't have a good understanding of the viability of timing-based side-channel attacks specifically when using Go's crypto routines based on math/lib. My prior understanding was that there's still bigger fish to fry elsewhere, but I will need to leave that judgement to @agl.

That said, adding an extra mode to math/big for all core operations is going to hugely increase the chances for errors because it's pervasive and affects which optimizations can be done and which can't. Any current and future local optimization may cause timing differences depending on operand values that will have to be considered. The patch mentioned in the proposal is pretty pervasive, supporting my point. There may be stronger effects on time such as memory layout of operands, etc. complicating matters and hiding/obfuscating timing differences. In short, I am leaning against such changes to math/lib.

If the goal is to provide bignum operations whose execution times depend only on the length of the operands, we should write a new/separate library specifically built from the ground up with that goal in mind. It's going to be cleaner, simpler, and easier to maintain. Since it's written from the ground up every local operation can be tested from the start with timing considerations in mind. It will only have to contain the core operations needed for crypto ops. It can have an API optimized for that purpose. If there's large amounts of unconditional duplication of code we can still factor that out. For instance, I suspect that many of the low-level assembly cores can be shared unchanged. Much of the testing can be reused or perhaps even factored and shared (the results must be the same). It's those things where much of the implementation time goes.

Thus, alternatively, I suggest instead that we come up with a crypto-specific library containing a minimal bare-bone set of operations. For anything not timing-critical, we convert operands to math/big values and use those operations instead (e.g. formatting and printing, which is part of the code with a lot of complexity).

bford commented 7 years ago

Putting constant-time big-arithmetic support in a separate package or library designed specifically for crypto is certainly a reasonable alternative to consider, as I mentioned in my writeup. However, to expand a bit on the reasons I find that approach less appealing:

nikitaborisov commented 7 years ago

I'm not sure about the question of separate library or not, but I'd advocate for at least a separate type ConstInt, all operations involving which must be constant time. The type would make future APIs much cleaner since they would be explicit about which numbers require constant time and which do not. It also creates minimal interference for existing code, since it doesn't change.

As far as creating a maintenance headache from a fork, I think this is unavoidable, because any changes to non-constant-time big.Int implementations must be evaluated from a constant-time perspective before being merged in.

nikitaborisov commented 7 years ago

I'm thinking more about the legacy code problem, and I think the 3 problems you list exist in your proposal, too:

The only alternative I see is changing the code implementing existing crypto APIs to return an error if they are called with a secret parameter that does not have a bit capacity set. But at this point you're effectively changing the API anyway (old code breaks), and you might as well enforce the change through the type system rather than with a runtime error.

bford commented 7 years ago

Thanks Nikita for weighing in with your thoughtful comments.

I don't have any strong objections to introducing a new type such as ConstInt, but just want to point out the following downsides:

Regarding the legacy code problem, responding to your points (a)-(c):

(a) No, the implementation of crypto/rsa would not need to (and should not) call SetBitCap() on big.Ints passed in to it (e.g., parameters and public keys), only on the big.Ints it uses internally as destinations for big.Int arithmetic it performs itself. Remember SetBitCap() configures the size of a result and hence is called only on a destination big.Int. A function that receives a big.Int parameter should not call SetBitCap() on any of its big.Int parameters because that would amount to modifying its inputs, which is against API conventions. A function like RSA encryption/decryption is only responsible for (and allowed to) call SetBitCap() to configure the sizes of the result big.Ints it computes, either as intermediate results or to pass back to the caller as the ultimate answer. Most importantly, since the crypto/rsa, crypto/dsa, etc implementations can insert the appropriate SetBitCap calls internally to the big.Ints they generate as part of their operation, none of the public type signatures need to change in order to support constant-time implementations.

(b) The documentation for the calling conventions to APIs like crypto/rsa should of course be updated to recommend (ideally require) that the caller who is providing sensitive inputs such as RSA private keys properly configure the big.Ints representing those secrets via SetBitCap() before loading them with secrets or passing them to the crypto/rsa module. Thus, the constant-time conversion will not be fully complete or water-tight until callers of APIs like crypt/rsa heed this advice and make sure the big.Int inputs they pass to the crypto APIs are properly configured via SetBitCap(). However, (a) many of these big.Ints will in fact have been generated and provided to the caller by some other method(s) from the same crypto API, e.g., the RSA key generator or the standard ASN1 key unmarshaling code, which can be updated to produce constant-time big.Ints silently without the application/client code actually needing to be updated at all, and (b) even in the cases where the calling code is doing its own generation (unmarshaling etc) of those big.Ints used as inputs to the crypto API, quick non-constant-time operations like marshaling and unmarshaling tend to be much harder to exploit than the compute-intensive actual "core arithmetic" operations like exponentiation for RSA or scalar multiplication for ECC algorithms, so we gain a lot of security by constant-time-hardening the implementations of those algorithms even if that doesn't automatically mean complete constant-time-hardening of all legacy applications using those APIs.

(c) In the SetBitCap proposal, no "conversion" is required of big.Int inputs to crypto APIs such as crypto/rsa, even internally, whether or not the calling application has properly configured constant-time input big.Ints by calling SetBitCap(). The only difference this creates is (a) what size Word[]-slice internally represents those input big.Ints, and (b) whether those Word[] slices are normalized (most-significant-word always nonzero) or non-normalized (most-significant-word may be zero). The current big.Int (and internal big.nat) code already, unavoidably, handles the "hard part" of handling all input combinations in the general case when one of the operands is longer than the other (adding a long big.Int to a short one or vice versa, etc); if you delve into the big.nat code you'll see that quite a bit of logic is dedicated to that. That complexity is all required regardless and is already present. The only new complexity is the need for the internal big.nat code to be able to handle not just mixed-length but also non-normalized Word[]-slices as parameters. There were of course some normalization assumptions in the code, but those are not that numerous and are pretty easy to address, as my proposal already discussed. Dive in and look at the code if this is unclear. But at any rate, the point is that no conversion (type conversion or format conversion or anything) is required in general, even internally, to ensure that any big.Int target can accept any combination of "constant-time" or "variable-time" big.Ints as inputs.

bford commented 7 years ago

Apologies if the initial proposal was a bit overly long and impenetrable. Based on the first few reactions I've seen, perhaps it's best if I back up and ask those interested (and still paying attention) to express their sentiments on a few high-level questions of priorities, while trying to pull apart a few related issues. Either simple yes/no responses, or more elaborate responses, would be fine.

  1. Is it a priority to fix the public-key crypto in Go's standard library to support constant-time operation? In other words, is it worthwhile at all to address the potential timing-channel risks in this public-key crypto and the software that depends on it (e.g., Go's built-in TLS and the applications depending on it)?

  2. If the answer to question #1 is yes, then is it a priority to address constant-time operation in a way that is compatible with the currently-frozen API signatures of the relevant Go public-key crypto packages that rely on the 'big.Int' type (e.g., crypto/rsa, crypto/dsa, crypto/ecdsa, crypto/elliptic)?

  3. If the answer to question #2 is no, then should those public-key crypto packages in the Go standard library be deprecated, i.e., marked "do not use", on the basis that their public-key arithmetic cannot and will not be made cryptographically constant-time while dependent on big.Int?

Thanks.

griesemer commented 7 years ago

I've now just reread the entire proposal and all comments. First, let me answer a couple of your earlier questions:

Regarding sharing of the underlying assembly routines: If the decision were to create a separate package for constant time operations, it would be fairly easy to share the core routines w/o exposing more in the existing APIs by creating an internal "arith" (or similar) package under the math directory. Such a package, while having a public interface, would not be accessible outside the std library. We do this already in various places in the std library.

Regarding your concern about the compiler's conditional evaluation logic (you bring the example of the modified basicMul): The compiler must preserve semantics and thus most likely will evaluate in sequence. There may be optimizations but they will have such a small impact on exact runtime that I doubt they can be reliably measured unless they happen in an innermost loop (where we would want to avoid an extra condition in the first place for general performance reasons). Unless you're on a completely and reliably "quiet" system doing nothing else but the crypto computations, runtimes will fluctuate by several percent just due to cache misses, memory use, etc.

I can definitively see the attractiveness of being able to use the existing math/big library largely unchanged and just have time-sensitive clients set the appropriate flags. I am worried that this "dynamic" approach is extremely sensitive to user and implementation error. I would much rather prefer this were somehow statically enforced. As such I do have sentiment for @nikitaborisov suggestion of a constant-time integer type. Adding that to package big might be an easier solution than writing a new library entirely (as I suggested in my first comment). It also may make interoperation between regular and constant-time integers easier. But it still leaves the issue of having to change all the clients in perhaps backwards-incompatible ways.

But going to your big 3 questions at the end:

I cannot answer your first question. As I mentioned before, in the past @agl thought there were bigger fish to fry before tackling this. I think crypto experts will have to weigh in here with a pretty concrete risk assessment: Would it be easily possible to stage an attack in a real-wold scenario with a high chance of success by exploiting the time side-channel using Go's crypto code? (Or are there other timing issues that completely dominate fluctuations in the crypto performance?).

If the answer to that is yes, then your question 1 would probably have to be answered as yes given the increasing popularity of Go.

I think we need to get an answer here first before we dive deeper into the problem.

rsc commented 7 years ago

Hi Bryan! Thanks very much for this proposal. Personally, I think it's important to make sure that Go can support arbitrary crypto code in this way, but I don't know that it's important. If someone were to challenge us on that point, are there examples of interesting or important projects where people are using Go that depend on having constant-time arithmetic? I think that kind of evidence of significance would help tilt the scales a bit here.

I also wanted to write to let you know that due to travel and vacations and the like, essentially all the relevant people on the Go team will be away for all of July and likely the beginning of August. I intend to try to move the process along starting in mid-August or so; if you don't hear from us before then, we're not ignoring you. Thanks again.

bford commented 7 years ago

Thanks Robert and Russ for looking at the code and proposal.

Robert: on the dynamic vs static selection of constant-time operation, I agree that in principle static type-based enforcement would be preferable. But note that the nature of the constant-time arithmetic requirement is not just a boolean on/off decision: in the "on" case, it's necessary to configure the result size, which necessary varies with cryptosystem and often with particular parts of a cryptographic operation. For example, discrete-log and elliptic-curve cryptosystems typically have both field arithmetic and scalar/exponent arithmetic, which often use vastly different-size moduli, so constant-time operations for one need to be configured somehow with rather different-size results buffers than constant-time operations for the other. Sometimes values cross the two domains, e.g., the way the elliptic-curve x coordinate in ECDSA gets injected into the scalar arithmetic, or the way in Paillier cryptosystems some operations are done mod n and others mod n^2. To get this "right" in a fully statically type-checked way, Go would need parameterized types, at least types that can be parameterized by integer constants (representing results sizes). And even that would probably be cumbersome unless the type system had pretty sophisticated facilities to allow statically type-checked conversions when arithmetic needs to cross different sizes and moduli in a statically type-safe way. Certainly Go currently doesn't have anywhere near the type system sophistication for that, and even if generics are in the works, this kind of heavy dependence on sophisticated typing doesn't really seem like Go's style.

The next-best option, which seems like a more reasonable tradeoff and more aligned with Go's style, would be something like the "modular integer" type in our Kyber crypto library:

http://godoc.org/github.com/dedis/kyber/nist

This basically represents a type similar to big.Int in operation (and our implementation uses big.Int underneath) but parameterized with a constant modulus, which automatically gets applied to all operations performed via this type. Given that a lot of (though not all) big-num crypto arithmetic tends to be done modulo something or another, and the modulo parameter can be taken as implicitly defining the result size, that makes it potentially natural simply to specify that this particular "ModInt" type (pick your favorite name) always operates in constant time. Then the difference between "big.Int" and "big.ModInt" can be more-or-less taken as the statically-enforced boundary between variable-time and constant-time arithmetic. I think it would be extremely reasonable and likely desirable to add such a "modular-int" type to the Go library at some point, and I had intended that to be an eventual "followup step" to the above proposal.

But it still wouldn't solve the complete problem, since as I mentioned much but not all bignum crypto code is done modulo something or other, and there would still be the problem of the legacy crypto interfaces that are already hardwired to use big.Int rather than big.ModInt or whatever the new type might be called. So the "complete" solution I would prefer to see eventually would be to have the new big.ModInt type, which most (new) crypto code could use for its arithmetic performed modulo something or another and would be constant-time by default - but also to have a way to configure big.Int for constant-time operation as proposed above, both to support legacy uses of big.Int and to provide a "lower-level interface" to the constant-time arithmetic when needed. The new big.ModInt type could then be built quite simply and easily atop the proposed big.Int with configurable constant-time support.

bford commented 7 years ago

Russ: what exactly would qualify as "examples of interesting or important projects where people are using Go that depend on having constant-time arithmetic"?

For example, all of my group's projects, since I moved to EPFL two years ago, are in Go and depend on crypto that really needs constant-time big-int arithmetic. These projects include:

All of these projects build on, and drive the development of, our Kyber advanced crypto library for Go. Kyber provides a general framework for advanced elliptic-curve crypto including support for general zero-knowledge proofs, verifiable shuffles, [verifiable/joint/publicly-verifiable] Shamir secret sharing in cryptographic contexts, etc. All of this really needs constant-time arithmetic.

We are soon going to release Kyber v1 (currently a branch), which we think is fairly stable, solid, and usable - but it has an important limitation: it will initially support only one elliptic curve, Ed25519, in the default "recommended for real use" configuration. This is because the Ed25519 curve arithmetic implementation has constant-time arithmetic hard-coded and hand-optimized for the Ed25519 curve by DJB and ported to Go by @agl. Our Kyber infrastructure actually supports many other curves already, and we'd like to support more (e.g., the extreme-security Ed448 curve recently standardized in RFC 8032) - but the implementations of the arithmetic for all the other curves depend directly and/or indirectly on Go's big.Int. Therefore, we can't in good conscience recommend that people actually use any of these alternate curves, including the nominally "more secure" ones, while big.Int has no constant-time support, so we decided to leave them only conditionally-compiled and disabled by default in the upcoming Kyber v1 release.

Of course we can - and might - just rewrite Kyber so that it doesn't rely on Go's built-in big.Int at all. But then we'd either have to lose the use of all the nice architecture-specific assembly optimizations that the big.Int and underlying big.nat modules include, or else fork the whole mess and maintain a separate copy of all that, and neither of those choices is particularly appealing.

Backing up, if our own projects don't count as "interesting or important" enough, it shouldn't be hard to find plenty of other examples. For example, Go is increasingly popular in blockchain and distributed ledger projects of many types. For example, there is a Bitcoin implementation in Go - though not sure how functional or popular this in particular is. Besides Bitcoin's standard existing use of the secp256k1 curve (which I think is the one and only curve that Go's standard crypto library has special-cases for constant-time arithmetic), Bitcoin has been considering adopting aggregate Schnorr signatures for transaction signing, which again really needs constant-time bignum arithmetic. Hackers have gotten extremely good at exploiting the smallest side-channel leaks of any kind, especially when the "universal bug bounty" of Bitcoin wallets are the prize. Similarly, one of the standard implementations of Ethereum is in Go, and I understand this implementation is quite popular. I expect there are many other examples; these are just the ones that come to mind immediately.

Note that @nikitaborisov, who commented earlier, is an extremely well-known systems/applied-crypto person (you should know his work if you don't already :) ), and my reading of his comments is that he had no argument with the necessity and importance of making the crypto arithmetic constant-time, but was just (understandably) unsure the best way to go about implementing it. But I'll let him speak for himself if he wants to comment further.

Perhaps @tomrist - "Professor Side-Channel" himself ;) - might also like to weigh in?

bford commented 7 years ago

P.S. Also, I understand the vacation schedule issues - no big hurry on my end either, as we have one solid curve to work with (Ed25519) for now and we're largely busy with other higher-priority things as well for now. But I would really love to see some kind of progress on this problem within the next few months at least. Thanks again for reading the code and proposal.

rsc commented 7 years ago

@bford, thanks for all the examples. That's fantastic. I'll grab @agl and @griesemer about this in August.

agl commented 7 years ago

Constant-time arithmetic has been a long-standing desiderata. At the moment, it's probably the case that an attacker who can position themselves on the same machine as a Go process doing many crypto operations (e.g. a TLS server) can extract secrets.

(I've not done all the work to show that this is possible, but there's a large literature on these sorts of attacks and there's no reason to believe that it wouldn't apply to some part of Go. expNNMontgomery, for example, does not do a constant-time read of the table of powers. Perhaps RSA is saved by blinding, but the exponent is still constant.)

It is a common design in crypto libraries to have bigints that never have the most-significant limb be zero, like math/big. However, that's because they, like Go, were built on a more general-purpose library. The solution in those cases is generally to have a magic modexp operation that is constant-time and operates without the usual library methods. Apart from that, they just hope that the remaining leaks are small enough not to worry about. Maybe that's practically true, but side-channel attacks keep getting better.

Modern crypto primitives have generally taken the approach of abandoning generic bigint libraries for fully-custom code. That's why we have P-256 and X25519 implementations that are constant-time and don't use math/big. The couple of examples of constant-time bigint libraries that I'm aware of are both recent and are BearSSL and a library mentioned in papers by the Everest folks.

If we were to try to extend math/big for constant-time then a few points came to mind when reading the above:

Should this be something that Go takes on, I've no initial strong feelings about extending math/big vs a separate package, but it occurs that converting big.Ints at package interfaces need not be too onerous since Bits would allow the value of an alternative type to be aliased with a big.Int.

bradfitz commented 7 years ago

@aclements, @randall77, @mdempsky, @josharian, how hard would it be to add a flag to the compiler to increment a counter on every memory load, even if it's slower?

The idea is that we'd run a builder in that mode and the math/constantbig (or whatever name) package would conditionally enable constant-time testing if those counters were available. This would remove the need for a patched valgrind.

aclements commented 7 years ago

This would remove the need for a patched valgrind.

If I understand correctly, the patched valgrind is doing much more sophisticated things than counting memory loads. But maybe counting memory loads is an okay baseline (@agl?).

josharian commented 7 years ago

Patching the compiler wouldn't handle assembly. I suppose we could patch the assembler. We could instrument branches (basic blocks) as well as memory loads. Difficulty is...medium-hard? There's also deciding how it interfaces with the test framework, which is non-obvious. It'd be good to know what exactly the patched valgrind is doing, as a baseline.

aclements commented 7 years ago

For counting memory loads and branches, I wonder if we could use performance counters instead? There might be a tiny bit of noise, but assuming the test stresses things in ways that would otherwise have huge disparities, this might be much simpler.

josharian commented 7 years ago

Or we could just extend the coverage tool to instrument assembly as well as Go code. (Note that memory loads/stores happen in basic blocks.) Then write a driver that executes a function with different inputs and reads the coverage results and looks for discrepancies. If we're worried about basic counts missing bugs, we could also sample coverage sequences at fixed intervals and fail if any sequence mismatched.

bradfitz commented 7 years ago

@aclements, perf counters definitely seems easier to start with. Unfortunately GCE (where most our builders live) doesn't support perf counters, last I checked. But we could run a dedicated machine or two just for those tests.

agl commented 7 years ago

The constant-time property is that, for all inputs, the trace of instructions fetched and memory addresses read, is constant. (In an abstract fetch-execute machine.)

Valgrind basically already checks that for uninitialised memory: branching on uninitialised memory or indexing based on it is an error. Thus we can tell Valgrind that our secret input is "uninitialised" and get it to check constant-timeness for us.

(At the end of the computation you have to "bless" the secret memory as ok to read from, otherwise you could never, say, output a signature that you have computed. That is done at a point in time where there is a sufficiently large "cliff" that it's safe to do so. The attacker, in contrast, wants to climb a gradual hill to extract the secret bit-by-bit.)

I don't know whether perf counters include the noise of context switches, prefetching, branch prediction etc. If so, that would be unfortunate. Also, it wouldn't detect secret-based memory indexes unless some of those branches were triggered by random inputs and the different branch count could be observed.

Perhaps the way to go is to hack up Valgrind until it's happy with Go binaries and then use the trick above.

josharian commented 7 years ago

The constant-time property is that, for all inputs, the trace of instructions fetched and memory addresses read, is constant.

Thanks, that's helpful. The need to record memory addresses read makes extending go tool cover less appealing.

Expanding on the idea of teaching obj (the assembler) to instrument binaries to generate exactly this trace, I guess it'd look very roughly like:

package cttrace

const bigEnough = 10000000

var trace [bigEnough]uintptr
var end unsafe.Pointer

func Reset() {
  zero trace up to end
  end = unsafe.Pointer(&trace[0])
}

func Check() {
  if bigEnough wasn't big enough {
    panic("oops")
  }
}

func Trace() []uintptr {
  return slice of trace, to end pointer
}

For specified packages (or functions?), at every basic block, emit (in assembly): MOVQ IP, cttrace.end; ADDQ $64, cttrace.end. At every memory access, emit the same, but using the address being read/written instead of the instruction pointer. (If we want to distinguish IPs, reads, and writes, change Trace to be a struct and record a kind field.) Test by writing a regular Go driver that does: (1) call cttrace.Reset, (2) call function in package of interest, (3) save cttrace.Trace(), or maybe a hash of it. Compare all traces.

Definitely more complicated than performance counters, though. And maybe more complicated than making Valgrind work with Go binaries.

bford commented 7 years ago

Thanks @agl for weighing in; all your points are right-on. A couple followup comments:

Modern crypto primitives have generally taken the approach of abandoning generic bigint libraries for fully-custom code.

Agreed, and for the record I don't have any objections to such fully-custom implementations especially for extremely popular common-case crypto algorithms like DH and EdDSA. We're using those too, as mentioned above. But I think it's pretty sad if fully-custom code is (and remains) the only way to get plausibly-secure implementations of bignum-based crypto algorithms. Getting our crypto-software ecosystem entrenched into pervasive dependencies on fully-custom code for side-channel security creates even higher artificial barriers to use and deployment of new curves and schemes that in principle provide important security or functionality advantages (e.g., Ed448 or other "spinal-tap-grade security" curves; pairing-based curves for all the wonderful gizmos like accumulators and zk-SNARKS that they enable; lattice-based crypto for fully-homomorphic encryption and post-quantum paranoia...). It would be much better all-around if we could use fully-custom code only to optimize the most performance-critical common-case paths, rather than for all crypto that we hope/expect to be reasonably side-channel secure.

math/big has a large API and so there would be a large testing surface to ensure that every function still gave the correct answers when given a "capped" value for each input. I think this is solvable with unittests, but would need some thought.

Agreed that that this is an important and non-trivial challenge. My proposed starting-point patch ventures a bit into this direction by adding a few spot-checks of this kind to the unit tests, just to get a feel for how they might be done, but it'll of course take a lot more work and thought to develop really thorough unit tests for the capped-input cases.

Is a capped value to be taken as a generic "constant-time" flag? There is more value-dependent logic than just normalisation so what would trigger, say, a modexp that's constant-time in the exponent? A capped exponent value, even though that's an input?

My current patch treats the existence of a capped-value configuration in the result object as the "constant-time" flag. And yes, the proposed patch already uses this flag to deal with a bunch of the other value-dependent logic (besides normalization). See for example karatsubaAdd, where I disable the value-dependent optimization of skipping multiplies by zero words when 'zcap > 0' (the flag). And similarly, in several places in karatsuba, where this flag triggers the compute-both-options-and-select approach to value-dependent conditionals. I certainly don't claim that this is necessarily the best way to handle this, but in my limited experience so far it seems to work reasonably smoothly. The constant-time flag could of course be handled with a state element separate from the result size or cap configuration, making them two orthogonal configuration elements, but that would seem to increase complexity (and testing surface) further, and I've tried and failed to think of any good reason one would want the "capped" result size but not constant-time operation or vice versa.

Wherever the code lives, we don't have any way to test whether we have achieved constant-time behaviour, esp over time as code is changed.

Agreed, this is an important problem. Just for the record, as far as I know this "untestability of constant-time behavior" is unfortunately the current state-of-the-art across all languages I'm familiar with, especially in C, where compilers frequently get so overly smart that they have regularly been found to silently "optimize" fully-custom constant-time crypto source code into badly non-constant-time assembly output. Having support in big.Int for what "human eyeball review" says should produce constant-time behavior would be a big step forward for Go, even if that behavior is not yet thoroughly testable all the way to assembly/binary level. In other words, I'd suggest we prioritize fixing the huge, obvious, gaping crevasses between Go's current behavior and nominally source-level-constant-time practices before we worry about (or get blocked on) the further problem of testing, finding, and fixing the likely hairline fractures between what we (humans) think will produce constant-time behavior and what will actually produce constant-time behavior given all allowed compiler optimizations and such.

At the same time, I will be hugely ecstatic and supportive if Go jumps into this seriously enough to incorporate "end-to-end" constant-time behavior testability, using a Valgrind-like or other analysis. That would put Go way ahead of the standard state-of-the-art of support for constant-time crypto in other languages as far as I know (although I'm not an expert on the very latest in compiler/language support for constant-time). Toward this end, several intermediate points might make sense as well. A Valgrind-like analysis of entire traces to ensure constant-time behavior with respect to both instructions and memory-access indexes, as @agl suggests, would certainly be ideal if/when realistically achievable. But even weaker, perhaps easier-to-implement and/or more lightweight sanity-check testing measures might be quite useful as well, such as a simple "number of instructions executed" comparison metric. There are performance counters that can measure exact number of instructions (and/or branches) executed along a code path, for example (I used them in my Determinator/DeterLand work). This wouldn't catch all accidental non-constant time behavior, especially memory-index dependencies, but intuitively I think it would be able to catch a high percentage of the accidental non-constant-time behavior that might slip into math/big in particular. Almost all of the "constant-time hazards" I identified so far are conditional/branch or value-size related and thus would be likely to be reasonably effectively sanity-checkable via simple instruction count (without implying that this would be an exhaustive test).

So in other words there might be a testing complexity/efficiency versus exhaustiveness balance to be considered here, and the two approaches need not be mutually exclusive. If the full-scale Valgrind analysis is implemented but proves to be too costly to run by default every time anyone runs 'go test ./...' on math/big, but a simple instruction-count or other lightweight sanity-check can be used all the time in that way, then both might be useful in their places. But I'm not the expert on these analysis techniques so I don't take any strong position one way or another.

AnomalRoil commented 7 years ago

I recently played around with Dudect (from the paper "Dude is my code constant time") and ended up reimplementing it in Go. While this might be considered as a "lesser", or "easier", way to go around constant time checks, since it relies on statistical tests, it actually allows to easily test timing discrepancies in (Go) code thanks to Welch's t-test and seems effective in practice. My goal when I did it was to see to what extent the DecryptOAEP function from Go's rsa package was leaking the number of leading zeros mentioned in Go's code.
My conclusion was that while the leftPad's function clearly leaks, the whole DecryptOAEP function was suffering too much from the background noise for the leak to be of any use.

So, my point here is that while having a constant time big.Int library in Go would be awesome (and I totally support the idea), it would also probably require a lot of work on the existing Go's crypto code in order to avoid altogether all leaks (ie. it won't suffice per se.)

Regarding the latest topic here, I would also be thrilled to see support of perf counter for constant time code execution at the compiler level, allowing for unit tests to actually catch timing discrepancies, this would be awesome. Note that statistical tests of timing discrepancies should probably not be considered as a viable option, since they usually requires thousands of traces to be accurate and as such would clog the tests.

rsc commented 7 years ago

One option I don't see above (but maybe I missed it) is not splitting big.Int or math/big but splitting the API methods so that there would be ConstantTimeAdd, etc. Would that be clearer than reusing cap? Would it be enough? Would it not look awful?

rsc commented 7 years ago

It's probably getting a bit late for Go 1.10, but it would be nice to make progress on trying to figure out what the plan is. Any replies to my Aug 14 comment?

/cc @bford @nikitaborisov

ianlancetaylor commented 6 years ago

Anybody want to fill out this proposal to see what changes would be required in math/big?

davidben commented 6 years ago

We recently fixed a similar issue in BoringSSL. We went with a slightly different approach, closer to @rsc's suggestion above to have separate functions. I did a write-up for OpenSSL here, but I expect it's applicable to Go as well: https://github.com/openssl/openssl/issues/6640

Using some sort of state inside the big.Int is awkward. First, it is error-prone. One must follow values everywhere to make sure the state got set properly. OpenSSL has a BN_FLG_CONSTTIME that has not worked well: https://github.com/openssl/openssl/issues/6078

Second, this can run afoul of speculative execution issues. See the guidance here: https://github.com/HACS-workshop/spectre-mitigations/blob/master/crypto_guidelines.md#1-do-not-conditionally-choose-between-constant-and-non-constant-time

Instead, we've found that, independent of the widths issue, it's cleaner to consider timing properties part of the function's API contract, and not the value. If two functions both add, but they have different timing contracts, they should be different functions. (It wouldn't have been exclusively the value's API contract anyway, since not all operations support constant-time.)

flowyroll commented 6 years ago

@josharian @bradfitz @aclements Performance counters can only be helpful for remote timing analysis, since they only provide aggregate information such as the number of cache misses, branch operations, instructions executed, etc. They can be quite accurate if they are applied per process and to a specific function, but I would generally be disagree to put the baseline on remote timing analysis.

The microarchitectural attacks such as cache attacks are of course ever growing threats, though a modern cryptography library should at least be protected against known attacks. The known attacks such as attacks on cache, BTB, MOB and speculative execution shows that there are basically two sources of leakage threatening cryptographic implementations: 1) secret-dependent memory accesses 2) secret-dependent control-flows. alas, constant-time testing requires collecting traces for both branches and memory accesses at binary level.

As mentioned by @bford, compilers also add lots of optimizations and also some part of the crypto implementations are developed in assembly, so the instrumentation need to be as close as to what it executes on the processor to simulate the real leakage.

Efficient comparison of huge amount of traces for discrepancy is also another challenge.

In a recent research project, we tackled the same problem based on a binary instrumentation framework (Pin) and Mutual Information (MI) Analysis. We could efficiently test closed-source binaries for microarchitectural leakages and correlate which memory accesses contribute to leakage (if any). Such a tool can also be extended to use traces generated from LLVM, compiler, etc. Further, other statistical methods such as t-test can also be used instead of MI.

robaho commented 5 years ago

Isn’t is easier to put a lower bound on the computation time during the crypto op that is an order of magnitude higher than the expected computation time and spin reading random values. Much easier than writing a completely new library.

davidben commented 5 years ago

That does not work for cache-based attacks. It also requires thinking about accuracy of those measurements and hurting performance by the overestimate. An order of magnitude performance regression would be unfortunate, especially when a proper constant-time implementation is comparably efficient.

There's no need to write a completely new library necessarily. BoringSSL was in a similar position of having to clear though variable-width big integers. Separate API entrypoints are needed in some cases, but their implementation can still share internals if the performance is suitable. (Some early returns in carry loops, branches, etc., need to be excised.)

A question for Go folks: it's presumably undesirable to bifurcate the crypto APIs, since it means leaving some existing code unfixed. Internals can use big.Int or private types as is most convenient, but our constraint is big.Int bleeding out of public APIs in things like crypto/elliptic's point operations and crypto/rsa.PrivateKey.

Would slightly tweaking math/big.Int.Bits's output be okay? Consider an ECDH function:

func ecdhP256(privKey []byte, peerX, peerY *big.Int) ([]byte, error) {
    p256 := elliptic.P256()
    if !p256.IsOnCurve(peerX, peerY) {
        return nil, errors.New("bad peer point")
    }
    x, _ := p256.ScalarMult(peerX, peerY, privKey)
    // One wouldn't do this, but it's possible via public APIs, so backwards
    // compatibility may care:
    //    fmt.Printf("%v\n", x.Bits())
    return bigIntToPaddedBytes(x, 256/8), nil
}

x above is secret, which means computations must not leak if the top word is zero (see also the result having a fixed-width representation; to be constant-time, bigIntToPaddedBytes wants to be a new big.Int method rather than fixing up big.Int.Bytes's output).

The question then is what x.Bits() returns. Today, it does strip the leading[*] zeros, which isn't constant-time. Options I see:

  1. x is internally stored with leading zeros (by way of SetBitCap as in the first comment or my proposal, whichever), so x.Bits() returns the leading zeros too. x.Bits() is documented to be a very low-level function so say that this change is okay/desirable. (pros: simple; cons: behavior change to public API)
  2. x.Bits() should not change behavior. At runtime strip off leading zeros and return a slice of the real thing. x.Bits() is not constant-time, but that's fine. We can add x.PaddedBits() for crypto code that cares. (pros: public behavior unchanged, cons: x.Bits() is potentially O(N) rather than O(1) and unlike, say, x.Bytes(), it's meant to be very cheap alias into the big.Int itself)
  3. Same as above, but big.Int stores a secret minimalWidth field, so x.Bits() can remain O(1). This minimalWidth field must be recomputed in constant-time every time a big.Int is returned out of a public API. (pros: public behavior and asymptotics of x.Bits() unchanged, cons: messy, minimalWidth must be computed in constant-time everywhere, even though most big.Ints don't see a Bits() call)
  4. Give up on crypto/elliptic's current APIs and introduce new ones detached from big.Int. (pros: no compatibility constraints, cons: existing code must migrate to fix timing, double the API surface)

I would advocate (1), but I don't know whether that's considered compatibility-breaking. (2) and especially (3) seem a mess. I think (3) implies fully excising big.Int from all crypto internals, so we only need the minimalWidth computation at crypto API boundaries, rather than everywhere. (We did (1) for BoringSSL, but we don't consider reaching into BIGNUM internals to be valid use of APIs.)

[*] Really trailing zeros, since big.Int stores words in little-endian order.

robaho commented 5 years ago

@davidben can you point me to some papers that describe why it wouldn’t work with cache based attacks? I’m also doubtful that an order of magnitude is too much for key based security checks, and that was just a guess but it’s probably even easier to add a random amount of cycles to the computations.

Constant time math is probably going to be an order of magnitude slower anyway.

josharian commented 5 years ago

I am not a cryptographer, but from reading descriptions of attacks, even the slightest amount of signal appears is enough to break a system. And neither adding noise nor adding large overhead is enough to mask the signal. And if you had significant extra cycles to devote to cryptography you could deploy stronger algorithms; there is always a cost benefit trade-off. The right fix is to make things constant time, and trying to common-sense-engineer your way around that invites the usual outcomes. But again, IANAC.

robaho commented 5 years ago

I don’t think that is the case. If you look at the common cases where Go is deployed it is behind a network with multiple hops, it’s running in a vm, it’s a random machine with random hardware where it runs, and the machine has a highly random load factor. There is so much variability in it that a timing attack would never be successful.

Also the number of login attempts would raise a red flag with any modern competent security setup.

creker commented 5 years ago

timing attack would never be successful.

That's a common misconception. I don't have a link but I remember reading multiple papers that had success with filtering out any noise and still get a successful attack. There's a reason side-channel attacks are so scary and hard to fix. Adding random delays doesn't solve the problem.

robaho commented 5 years ago

@creker that doesn’t mesh with my research. The only successful attack required a local network, no outside activity, and no outside monitoring, and millions of tries over several hours. Especially since these systems don’t even allow code that uses direct memory addressing to run from external sources, which are required for cache based attacks like Spectere and meltdown. The magnitude of the noise and variability in any network deployed system is going to exceed what is required for a cache based attack, or all of these systems would of been broken long ago

Security requires a holistic approach. It is probably far easier to steal the servers certificates and the password to the store using other means than an external cache based attack to try and backdooor decode them. Which is why modern systems use physical key security and devices. Those usually require physical access to be compromised, especially by their design that use randomness in the process with arbitrary seeds to protect against cache attacks.

Also, the OP says the big godoc warns about this security hole - I see no such warning.

robaho commented 5 years ago

This proposal should be dropped until the OP provides a single paper reference that shows these attacks are possible outside of a lab setting in a real world environment with modern security controls. If the site isn’t doing basic rate limiting detection on authentication it probably has tons of other security holes far worse.

In fact I’ll make a challenge. I will deploy a simple service that checks a provided password against a known one. No encryption even. I will deploy it to the Google cloud. The op won’t be able to guess the password in my lifetime using a timing attack, if so I’ll give him$1000.

Oh, and I will not use a constant time string compare, of course.

robaho commented 5 years ago

Rather than just offering anecdotal evidence, I developed timingattack to test the theory.

You can go there for complete details, but in summary, the difference in timing for a string compare is 0.5 nanos (with an impossible 0 variance), and the standard deviation for a remote network call to a deployed service is > 20000 nanos.

There is no statistically possible way a remote security service is susceptible to a timing attack. Even a local attack has a variance of 60+ nanos - far greater than what would be required to execute a timing attack.

Not to mention that the number of required attack requests would trigger a DOS shutdown, and a DDOS does not help as the clock synchronization required would make it impossible.

So, IMO, this proposal needs to be rejected, as there is not legitimate need for it given the Go target environments.

And @tmthrgd , it would be far more helpful if you took some time to state what your objections are, rather than resort to unhelpful emojis.

edit: I also added a commented out option, that adds a degree of randomness to the comparison that is in far excess of the comparison threshold, effectively preventing any attack - without resorting to a constant time string compare.

randall77 commented 5 years ago

This proposal should be dropped until the OP provides a single paper reference that shows these attacks are possible outside of a lab setting in a real world environment with modern security controls. If the site isn’t doing basic rate limiting detection on authentication it probably has tons of other security holes far worse.

From Wikipedia:

In 2003, Boneh and Brumley demonstrated a practical network-based timing attack on SSL-enabled web servers, based on a different vulnerability having to do with the use of RSA with Chinese remainder theorem optimizations. The actual network distance was small in their experiments, but the attack successfully recovered a server private key in a matter of hours. This demonstration led to the widespread deployment and use of blinding techniques in SSL implementations. In this context, blinding is intended to remove correlations between key and encryption time.

(The link to the actual paper is on the Wikipedia page.)

There are some caveats there: the network distance is small, and they used a bunch of samples (2800 per bit, if I'm reading it correctly). But it is feasible. All the random noise you add to the timing is averaged out with enough samples, revealing the underlying signal.

Rate limiting will help, sure, but the math I get is they could break a 1024 bit RSA key in a month with 1 QPS. Not the fastest break, certainly, but well with the realm of possibility. Enough that this proposal should be taken seriously.

Rather than just offering anecdotal evidence, I developed timingattack to test the theory.

Your benchmark code isn't measuring anything real. String comparisons aren't done one character at a time. On 1.11 amd64, at least, we compare the first 8 characters with a single CMPQ. Both of your benchmarks should execute the exact same instructions.

String comparisons are not the non-constant-time operations in question. They are things like the Montgomery multiplication inside math/big. (That's what the Boneh and Brumley attack works on.)

tmthrgd commented 5 years ago

I expressly didn't want to comment because I don't want to subject the people subscribed to this thread to any more unwanted noise.

Adding random noise to the computation time won't achieve anything as it's very easy to eliminate noise with statistics, by taking enough samples (IIRC the number is smaller than you'd think) and "averaging" the result. It would also be magnitudes slower than a proper constant-time implementation, while being as insecure as doing nothing.

"the common cases where Go is deployed it is behind a network[.]"

This isn't a reason to oppose this because Go is used for many other applications that have nothing to do with a network, but even if it were the jitters of a network can be filtered out with statistical analysis.

"Also the number of login attempts[...]"

This issue has nothing whatsoever to do with logins or passwords, it's about the math behind operations like an Elliptic-curve Diffie–Hellman exchange, or an RSA signature or encryption operation.

"The only successful attack required[...]"

You only know about publicly published attacks. The NSA for example has known about many big breakthroughs in cryptanalysis for years, if not decades, before they were publicly discovered. They also have budget and hardware access that far outstrip cryptanalysts employed by say Universities. (Differential cryptanalysis was known about publicly in the late 1980s, by IBM in the mid 1970s and by the NSA some unspecified time before then). Even if we ignore some of the Big Players, cryptographic attacks are advancing at a very fast rate, something that seems secure now (without a formal proof under the correct model), could be found to be insecure in the near future.

"If the site isn’t doing basic rate limiting detection on authentication[...]"

This issue has nothing at all to do with authentication.

"In fact I’ll make a challenge."

Being unable to break a system doesn't remotely make it secure. See this post by Bruce Schneier from 1998: https://www.schneier.com/crypto-gram/archives/1998/1215.html

"Oh, and I will not use a constant time string compare, of course."

This issue has nothing to do with string comparisons.

"the difference in timing for a string compare is 0.5 nanos"

You're benchmarks don't measure a regular string comparison at all, the compiler sees two static strings being compared and optimises it. (Try go tool compile -S timing_test.go and you'll see). Even if it was measuring that, this issue isn't about string comparisons.

"There is no statistically possible way a remote security service is susceptible to a timing attack."

There are ways to do exactly that, and it's also not relevant to this issue. With more cloud providers being used, it's very possible to be on the same network as your target, and possibly even on the same machine.

"Not to mention that the number of required attack requests would trigger a DOS shutdown, and a DDOS does not help as the clock synchronization required would make it impossible."

If you're trying to recover a key that isn't changing frequently, you don't need to make a burst of requests in a short time, you can easily spread them out over weeks or months.

"So, IMO, this proposal needs to be rejected, as there is not legitimate need for it given the Go target environments."

That doesn't make sense to me. Cryptography is a first class citizen in Go (it's in the standard library and powers the TLS implementation) and it's susceptible to timing side channels and should be fixed. This proposal isn't even all that hard to implement. Most of this issue has been about deciding how.

"I also added a commented out option, that adds a degree of randomness to the comparison[.]"

Randomness can be filtered out and is far more costly than a proper constant-time implementation.


Timing attacks are very possible across networks. Here is a paper from 2003, 15 years ago, that demonstrates the recovery of an RSA key from OpenSSL across a local network: https://crypto.stanford.edu/~dabo/abstracts/ssl-timing.html

You're also missing the impact of this, many side channel attacks allow the recovery of the private key which is often valid for years or decades. (Even though TLS certificates are limited to at most two years, the private keys are often reused between certificate reissues). So even for an attack that's tricky or time consuming to pull off, the rewards can be very valuable and provide future attack vectors for years to come.

I was beaten to replying by @randall77 who covered some of what I had already written.

robaho commented 5 years ago

@randall77 @tmthrgd I appreciate the reference to the paper. I am reviewing.

In the meantime, you are incorrectly dismissing the value and correctness of the string compare. In order to perform a timing detect, you need to be able to determine when an operation produces a shortened (and conversely a lengthened) result. The string compare is fine for this, as the tests demonstrate clear differences in timings between the first character being correct and not (which is all that is required). Getting the first character correct, increases the time required by 2x. It is a valid timing attack - especially with a local, direct attack - and it has been used successfully.

Without fully reviewing the paper, I can already find many holes in the argument - given the expected time to crack (days to months), the variance of the attack requests will vary greatly in a network based attack (different equipment, different routing, different loads), making the time duration longer, and essentially invalidating any previous progress since you cannot determine the expected variance easily - as it is continually changing - probably exponentially increases the time required.

The paper doesn't test a remote wan, the "widest" test is a local area network with a single switch. Most importantly, their best success requires 400k attempts - a secure system would never allow this: "We note that without optimizations (-g), separate tests allowed us to recover the factorization with less than 359000 queries." Further it states, "requiring only 5 samples to give a variation of under 20000 cycles (approxi- mately 8 microseconds), well under that needed to per- form a successful attack". These test results show that the std across the wan is > 20 us, so the variances is 400 us - 50x greater than what was tested. And earlier in the paper, they states "resulting in 1433600 total queries".

As I said, a holistic security approach would require a lifetime in a properly secured system to crack the key, since every negative request from an IP should add a random time adjustment, making timing attacks impossible.

But reading the summary of solutions in the paper, none suggest constant time arithmetic, in fact, the suggested solution is:

"Currently, the preferred method for protecting against timing attacks is to use RSA blinding. The immediate drawbacks to this solution is that a good source of randomness is needed to prevent attacks on the blinding factor, as well as the small performance degradation."

And another alternative cited, the time quantization, is far simpler than constant math, as the maximum time is easily bounded, with little performance degradation (since you are making all cases equal to the worst, which is the absolute best case for constant time arithmetic).

edit: they reference using a larger network in experiment 6, but I can find no detailed information on the results

edit: experiment 6 shows similar results for the wan

robaho commented 5 years ago

Also, the paper is using an optimization in OpenSSL to perform the timing attack, "Thus, OpenSSL’s integer multiplication routine leaks important timing information. Since Karatsuba is typ- ically faster, multiplication of two unequal size words takes longer than multiplication of two equal size words. Time measurements will reveal how frequently the operands given to the multiplication routine have the same length. We use this fact in the timing attack on OpenSSL."

Eliminating this optimization would remove the vulnerability, no constant time arithmetic required, although I would assume non-constant time arithmetic would have a similar vulnerability based on the factors, but the timing differences would become so small, that it would exponentially increase the number of request needed.

Still, as I said, and the paper seems to agree, constant time arithmetic is not the solution to the problem.

robaho commented 5 years ago

Also, there tests claim on 5 samples required to get a stable timing under 8 micros. With 100 samples on the wan network it is > 350 micros. With 1000 it is still > 20 micros.

So it would be expected that the number of requests required are off by a factor of 600x in a real-life wan.

robaho commented 5 years ago

To clarify, the only difference in my 'string compare' and other crypto vulnerabilities, is that the the difference in time between the crypto early exit could be much greater, meaning that the number of iterations required to detect the condition would be far less (i.e. the variance can be much higher).

Still, I finished reading the paper, and if there is no other citing, nothing there says the solution is constant math, in fact the preferred method is randomizing, or quantizing the time taken - which can be done outside of the math library in the crypto code.

tmthrgd commented 5 years ago

I really don't want to continue creating noise on an issue that has a very well understand cryptographic background and a well understood means of implementing it. Constant-time arithmetic is very important, and is absolutely something that Go needs.

The reason I linked the paper was not to suggest constant-time arithmetic was a solution to that specific problem, but to refute your argument that timing attacks can never be carried out across networks. It was simply to show that the presence of a timing attack can be exploited across a network, which it absolutely can be. Not only that, it's a paper from 15 years ago, attacks have gotten far better and far more sophisticated since then.

The string comparison isn't a valid point because it's not comparable to the timing leaks from non-constant-time arithmetic code, nor are the routines (emitted instructions) actually similar.

"essentially invalidating any previous progress since you cannot determine the expected variance easily"

This is assuming all the attempts are dependant on each other. Learning whether a bit in a secret is 1 or 0 is not dependant on any other bit of the secret and thus is unaffected by this. Not only that, compensating for a change over time is absolutely possible with the right statistical analysis.

"The paper doesn't test a remote wan[.]"

It doesn't need to. With an increasing number of diverse applications being deployed to the cloud, it's trivial to run an application on the very same network and sometimes on the very same machine. (Also consider something like MUSCULAR which showed the NSA was able to get access to privileged links between Google data-centers. You simply cannot presume your attackers will always be at arms length).

"Most importantly, their best success requires 400k attempts - a secure system would never allow this"

Again you're thinking of login rate limiting, exploiting timing differences, in say TLS, occurs far before the application layer and can be done without it being aware. Making 400k TLS requests (which can be done from multiple machines and locations) to slowly guess a TLS private key, bit-by-bit, is very possible. As the keys are very typically used for years or decades, it really doesn't matter how long it takes to steal.

"since every negative request from an IP should add a random time adjustment, making timing attacks impossible."

This has nothing to do with logins or authentication. Defences against password brute-forcing can't and don't apply here. In fact, timing side channels can conceivably often be exploited even without sending a single failed, broken or invalid request.

"none suggest constant time arithmetic, in fact, the suggested solution is"

The paper wasn't linked to show a particular attack, but that timing differences are exploitable across networks. Blinding solves that problem with RSA, but doesn't and can't address the multitude of timing leaks from other primitives and even other parts of the RSA code.

"And another alternative cited, the time quantization[.]"

I can't find references to that phrase Googling it, but I think I know what you're talking about. It's still not a secure defence against side-channels. Unless you are executing identical CPU instructions, and not falling into the pitfalls of undocumented architecture "features", with identical memory access, you are vulnerable to timing side-channels. Nothing in the world can stop that and nothing can change that, even if it makes it arbitrarily harder.

"Still, as I said, and the paper seems to agree, constant time arithmetic is not the solution to the problem."

No it doesn't at all. It address one specific place a timing leak occurs, it absolutely does not generalise to every other cryptographic algorithm.

"a real-life wan"

Attackers will often have far more advantageous positions than that. Often on the same local network and sometimes even on the same physical machine.

"To clarify, the only difference in my 'string compare' and other crypto vulnerabilities, is that the the difference in time between the crypto early exit could be much greater"

You're benchmark actually didn't measure anything because of compiler optimisations. Also you repeatedly stated the narrowness of the timing differences was proof it couldn't be exploited over a network.

"nothing there says the solution is constant math"

That's not what others think: https://github.com/golang/go/issues/20654#issuecomment-402300802. The BoringSSL code base--I mention it specifically because I've watched it's development--used by Google has specifically taken steps to ensure their math is constant-time where it depends on secret values. I'm sure cryptographers would agree, if I knew any of hand or was well versed in published research.


I'm going to repeat something I wrote earlier that highlights why this can be so devastating. This is no a hypothetical, it's real and understood, and has happened in practice:

You're also missing the impact of this, many side channel attacks allow the recovery of the private key which is often valid for years or decades. (Even though TLS certificates are limited to at most two years, the private keys are often reused between certificate reissues). So even for an attack that's tricky or time consuming to pull off, the rewards can be very valuable and provide future attack vectors for years to come.


Again, I really don't want to keep subjecting people in this issue to noise of this discussion. The benefits and need for constant-time code is very well understand, and it's the only way to eliminate timing side-channels in all cases. Execution patterns and timing must never depend on secret data. Attacks are getting better and more sophisticated, and more and more timing side-channels will be exploited.

robaho commented 5 years ago

@tmthrgd I understand your point, then please direct me to the research that shows the best solution is to use fixed time arithmetic, because I read https://eprint.iacr.org/2009/089.pdf and https://software.imdea.org/~bkoepf/papers/csf15.pdf and http://ijcsit.com/docs/Volume%206/vol6issue04/ijcsit20150604175.pdf - none involve constant time arithmetic.

Even with the math protected, the crypto layer could be broken and still expose it to a timing attack, that why you need to protect against them at a higher layer. As the papers point out, there are many solutions to protecting against timing attacks.

I'll concede that it is possible to get your attacker closer to the attackee, but not easily. In a modern cloud you have no control over the physical location, but some control over the network hops.

Some of your other claims are not valid:

"Most importantly, their best success requires 400k attempts - a secure system would never allow this" Again you're thinking of login rate limiting, exploiting timing differences, in say TLS, occurs far before the application layer and can be done without it being aware. Making 400k TLS requests (which can be done from multiple machines and locations) to slowly guess a TLS private key, bit-by-bit, is very possible. As the keys are very typically used for years or decades, it really doesn't matter how long it takes to steal.

The SSL end-point (router) would not allow it - it is part of the DOS and DDOS attack prevention - it has no need to go the application login layer. DOS attacks are not prevented at the application layer. Password hack attempts are detected at the application layer, and usually kick in after 3 attempts.

From the paper, "Another alternative is to require all RSA computations to be quantized, i.e. always take a multiple of some pre- defined time quantum", and the papers cited above is what I referred to as time quantization.

Your statement about needing to protect the primitives to protect the upper layers is not substantiated IMO. No general purpose OS system is designed that way, it is impossible except given very specialized hardware. The solution is to protect against timing attacks at the crypto layer, not the math layer, and as these paper point out, this is readily doable.

robaho commented 5 years ago

One other note, 'constant time arithmetic' would be nearly impossible for a general purpose system like Go. The code would need to be CPU model and generation specific, because who's to say that a newer processor might not implement math operations with given operands differently (ie. optimize for 0, or 1, or 2, etc.) which would make drastic difference in the times based on the values (which is prohibited by this proposal).

So the only way to get "constant time" would be to externally time the operation, and then pad to a known max value - which can be done in the crypto layer (time quantization).

conradoplg commented 5 years ago

This issue is about how to support constant-time arithmetic in Go without breaking compatibility, and not about whether this should be done or not.

It is a established consensus that constant-time is the way to go. This consensus wasn't created out of thin air in a single paper; you can simply search Google Scholar for "constant-time cryptography" for a multitude of examples. All modern crypto libraries are written entirely constant-time (libsodium, BearSSL). Handpicking a few papers that present other approaches doesn't mean constant-time crypto isn't the best approach in this case. There are zero well-known libraries that try to employ the "time quantization" approach.

Time quantization is not even easier - how to measure time? Which upper bound to use, in which platform? Meanwhile constant-time arithmetic has been done in many places, in many instances. Yes, it isn't perfect, but time quantization isn't either.

Every time people jump through hoops to justify why a particular attack can't be done, it ends up being carried out some time after. Constant-time arithmetic simply kills the whole attack class, without needing to consider if any attacker is close enough, if the network is noisy or not, etc.

Also, cache attacks - they work by measuring the effect of crypto code in caches. Time quantization doesn't change that effect.

robaho commented 5 years ago

I’m sorry but your final statement is not fully correct. A cache attack only means that an operation is faster or slower based on whether repeated operations are faster or slower based on what’s in the cache. So time quantization does address a cache attack.

The addressing cache attack is a form of this but is relies on a bug in the processor due to speculative execution and it also requires knowledge of particular addresses externally which has been removed and is being removed in most OSes.

You also state this is proven and accepted. That’s not what those papers show, nor my searches. First of all this issue is not about how to implement. If you read the OP it is asking for constant time. The fact that this proposal has the length that it does and not a single reference paper has been cited for its validity, or alternatives discussed, is pretty scary.

robaho commented 5 years ago

By reading this https://haslab.uminho.pt/jba/files/16usenix.pdf shows this will be exceedingly difficult, especially given different processor models perform different branch predictions etc. and the high level nature of Go. You might change number of operations to be constant, and possibly memory accesses, but you can in no way equate that to constant time, even if you wrote Go assembly for every possible architecture.

Please just post a single paper that discusses why time quantization (or even blinding) is not sufficient for preventing timing attacks.

Both seem far easier to implement across the board than true constant time math.