secp256k1_scalar_cmov not constant-time on ICC

real-or-random commented 4 years ago

AMD64 ICC 19.1.2.254

==407113== Conditional jump or move depends on uninitialised value(s)
==407113==    at 0x485FB93: secp256k1_ec_pubkey_create (secp256k1.c:568)
==407113==    by 0x401490: main (valgrind_ctime_test.c:56)
==407113== 
==407113== Conditional jump or move depends on uninitialised value(s)
==407113==    at 0x48674AA: secp256k1_scalar_is_zero (secp256k1.c:521)
==407113==    by 0x48674AA: secp256k1_scalar_set_b32_seckey (scalar_impl.h:61)
==407113==    by 0x48674AA: secp256k1_scalar_cmov (secp256k1.c:496)
==407113==    by 0x48674AA: secp256k1_ecdsa_sign_inner (secp256k1.c:488)
==407113==    by 0x48670BE: secp256k1_ecdsa_sign_recoverable (main_impl.h:132)
==407113==    by 0x401821: main (valgrind_ctime_test.c:81)

Your volatile tricks do not fool ICC.

Originally posted by @gmaxwell in https://github.com/bitcoin-core/secp256k1/pull/772#issuecomment-664812966

real-or-random commented 4 years ago

As discussed in #772, we could (probably) use the same volatile trick as in secp256k1_int_cmov or in memczero but I'm not sure if this hack is nice.
These tree functions are the ones we use only to overwrite outputs in case of failure with zeros, so we could also just use memczero for all of theses uses, which simplifies our code but probably makes it somewhat slower.
We can also say that we don't care about ICC but then I believe clang will figure this out in the future too, as it has done with the other uses.
This is a good chance to look into https://github.com/bitcoin-core/secp256k1/pull/457#issuecomment-604446374. However, I don't know if this is a good idea. This PR was about secp256k1_fe_cmov_limbs, which is one of the performance-critical cmovs, and inserting volatile there may hurt performance. I guess we should test this.
We could look into other patterns that don't use the masks but I don't think this will better, and it conflicts with the previous bullet.
Independently of this, we could also try to write ASM cmov functions at least for x86 (which use CMOV instructions). This should be faster anyway.

The first two options are ok of this is ok as long as compilers don't start to play around with the cmov functions that we really use in the arithmetic routines.

At the moment, I tend to believe that we should keep the three functions, i.e., add the volatile to secp256k1_scalar_cmov. As a second step, we could try to benchmark https://github.com/bitcoin-core/secp256k1/pull/457#issuecomment-604446374 for the other functions. If it's ok to change this in the performance-critical functions (I doubt it), then we should also do it here. Otherwise, ASM cmovs are interesting.

gmaxwell commented 4 years ago

The only reservation I have about the volatile trick is that there is a long history of compilers emitting broken code in the presence of volatile because almost nothing uses it. This has improved in recent times (like post GCC 7-ish) Otherwise it seems pretty reasonable and clean too. I didn't raise a concern about this because the particular usage where the volatile variable is assigned once then read from with nothing too interesting going on (no complex control flow, no scopes, etc.) is probably a usage that is least likely to expose bugs. It's my unverified belief that the volatile should have ~no performance impact, or at least we should be able to make a version that does.

MISRA 2012 Rule 13.2 "The value of an expression and its persistent side effects shall be the same under all permitted evaluation orders". It demands that you make at most one read and at most one write. The existing workaround doesn't violate the spirit of the rule, but the issue could be avoided by making a volatile sandwich-- copy the input into a volatile and back out again-- if there is a performance loss from doing the work around (e.g. causing a memory read for each moved word in the cmov), I bet the volatile sandwich would fix it.

I dubious of other approaches. If a compiler can't be trusted to leave simple bit operations alone all bets are off. E.g. cmovs using multiplies are at greater risk from compilers aggressively trying to eliminate multiplies (which I think was a factor in the ecdh u_last issue).

x86/x86_64 assembly is pretty maintainable, but as you've probably noticed w/ arm ... it's not always so easy. I think we ought to try for something in C if we're at all able. ASM would let us be more sure and maybe faster-- where its supported.

real-or-random commented 4 years ago

MISRA 2012 Rule 13.2 "The value of an expression and its persistent side effects shall be the same under all permitted evaluation orders". It demands that you make at most one read and at most one write. The existing workaround doesn't violate the spirit of the rule, but the issue could be avoided by making a volatile sandwich-- copy the input into a volatile and back out again-- if there is a performance loss from doing the work around (e.g. causing a memory read for each moved word in the cmov), I bet the volatile sandwich would fix it.

Sorry I can't follow. How is the existing workaround related to this rule?

How would a volatile sandwich avoid memory accesses? Wouldn't it even introduce a second memory access?

I agree with all you say otherwise.

gmaxwell commented 4 years ago

fun(int x) volatile int tmp1 = x; int tmp2 = tmp1;

then use tmp2. So there is only one read from tmp1 and it can make a register copy of it.

real-or-random commented 4 years ago

fun(int x) volatile int tmp1 = x; int tmp2 = tmp1;

then use tmp2. So there is only one read from tmp1 and it can make a register copy of it.

We talked somewhere else and figured out that @gmaxwell was under the wrong impression that the current volatile code is not of this form. But it is of this form. So this is not an issue.

bitcoin-core / secp256k1

secp256k1_scalar_cmov not constant-time on ICC #777