side-channel: this implementation is vulnerable to timing analysis

Dustin-Ray commented 1 year ago

@otsmr Im going to work on a solution for this issue in this repo for now until we arrive at the appropriate solution. For context, here is your original post on the matter:

This implementation is referenced in this Wikipedia article. The Code is after the sentence “The Montgomery ladder approach computes the point multiplication in a fixed amount of time”. But this implementation has in my understanding a timing issue, leading to a possibly timing attack.

The vulnerable line is here.

for i in (0..=s.bits()).rev()

According to the docs of BigInt, the function bits will return the fewest bits necessary to express the BigInt, but without any leading zeros. Because of the missing leading zeros, there are fewer iterations in the loop when the number is smaller. If you are using for the secret scalar in for example ECDSA random numbers between 1 and 2^256 it is likely that this random numbers can have multiple leading zeros. When you now generate many signatures and measures the time you can detect, when there are multiple leading zeros are present. With this knowledge, you can then perform a Lattice ECDSA attack.

Here is a simple POC to prof the timing issue:

use e521::curve::e521::e521::sec_mul;
use num::BigInt;
use e521::curve::e521::e521::get_e521_gen_point;
use std::time::Instant;

fn main() {
    for i in 0..200  {
        let point = get_e521_gen_point(false);
        let s = BigInt::from(1) << i;
        let now = Instant::now();
        let _result = sec_mul(s, point);
        println!("{} needed {} micro seconds", i, now.elapsed().as_micros());
    }
}

And here are the first lines of the output:

0 needed 1338 micro seconds
1 needed 2582 micro seconds
2 needed 5478 micro seconds
3 needed 9848 micro seconds
4 needed 12429 micro seconds
5 needed 16816 micro seconds
6 needed 21067 micro seconds
7 needed 24940 micro seconds
8 needed 30308 micro seconds
9 needed 34518 micro seconds
10 needed 38180 micro seconds

With proposed solutions:

For a proper solution, you can look in 2 in section „4.6.2 Constant-time ladders“: „One fix is to always arrange for n to have a fixed top bit, for example by adding an appropriate multiple of the order of P.“ You can also find this solution in the fix for libgcrypt 3 where they had the same timing issue as in your implementation.

Dustin-Ray commented 1 year ago

[3]: https://github.com/gpg/libgcrypt/commit/b9577f7c89b4327edc09f2231bc8b31521102c79 performs the following update to the ECDSA algorithm:

/* Originally, ECDSA computation requires k where 0 < k < n.
 * Here, we add n (the order of curve), to keep k in a
 * range: n < k < 2*n, or, addming more n, keep k in a range:
 * 2*n < k < 3*n, so that timing difference of the EC
 * multiply operation can be small.  The result is same.
 */
mpi_add (k, k, skey->E.n);
if (!mpi_test_bit (k, qbits))
  mpi_add (k, k, skey->E.n);

An equivalent update to the library as I understand it would involve a change not to the multiplication algorithm itself but to the ECDSA operation. The following change to keypair generation in src/ops.rs would be somewhat analagous:

pub fn new(pw: &Vec<u8>, owner: String, curve: EdCurves, d: u64) -> KeyPair {

    let s: Integer = (bytes_to_big(kmac_xof(&mut pw.to_owned(), &vec![], 512, "K", d)) * 4)
        % order(SELECTED_CURVE)
        + curve_n(SELECTED_CURVE);     // add curve order here to ensure top bit is set
    ...
    ...
}

Another change I would like to propose is to the wikipedia article itself by removing any reference of the term "constant-time" and replacing it with "fixed number of operations", as the second term is less misleading and more accurately reflects the purpose of the ladder.

The last major note on this particular attack comes from that of key sizes. This attack is most effective when variable key sizes are allowed in the scheme. Adding the curve order to the key seems to resolve the timing issue outright:

#[test]
fn test_sig_timing_side_channel() {

    for i in 0..10 {
        let mut msg = Message::new(&mut get_random_bytes(16));
        let pw = get_random_bytes(1 << i);
        let mut key_pair = KeyPair::new(&pw, "test key".to_string(), E448, 512);

        let now = Instant::now();
        msg.sign(&mut key_pair, 512);
        println!("{} needed {} microseconds", i, now.elapsed().as_micros());
        msg.verify(&key_pair.pub_key, 512);
        assert!(msg.op_result.unwrap());
    }
}

the timing might become much harder to measure with this change made:

0 needed 77745 microseconds
1 needed 78015 microseconds
2 needed 78592 microseconds
3 needed 77694 microseconds
4 needed 77426 microseconds
5 needed 77748 microseconds
6 needed 76639 microseconds
7 needed 77729 microseconds
8 needed 77696 microseconds
9 needed 78618 microseconds

Besides the update to this library, we should also update the language on the wikipedia page to be clear about key sizes and timing vulnerabilities.

Ill emphasize again though that this affects the ECDSA algorithm itself and not necessarily the ladder itself.

Dustin-Ray commented 1 year ago

Come to think of it, the proposed update above might not even be needed. Multiplying by 4 (curves in Edwards form have a number of points that is always a multiple of 4) ensures that the top bit is set., Without adding the curve order, the Schnorr signature algorithm still proceeds without an immediately noticeable effect on runtime:

let s: Integer =
    (bytes_to_big(kmac_xof(pw, &vec![], 512, "K", d)) * 4) % order(SELECTED_CURVE);

results with:

running 1 test
0 needed 78662 microseconds
1 needed 79512 microseconds
2 needed 79987 microseconds
3 needed 78335 microseconds
4 needed 77569 microseconds
5 needed 76647 microseconds
6 needed 77613 microseconds
7 needed 78308 microseconds
8 needed 77308 microseconds
9 needed 77590 microseconds

I think our implementation of ECDSA/Schnorr is resistant to this attack. Please correct me if I am mistaken. The ladder remains unmodified, but I anticipate that removing the rug library in favor of crypto_bigint will confer side channel resistance to large parts of the library.

Dustin-Ray commented 11 months ago

Hey @otsmr I think we've got a much better solution for this in place. The upgraded curve can be found on this branch, I believe the test is now showing fixed-time execution but I would appreciate your input. We followed curve-dalek mostly and chose to implement variable-base multiplication with a fixed-time lookup table in extended coordinate representation, vs the montgomery ladder with affine coordinates. This results in some substantial improvements in performance as well:

The improvement in signing time for 16 bytes with varying small key sizes went from 77,745 microseconds to 403 microseconds, this means the new approach is nearly 193 times faster than the previous one.
Several new fixed-time operations were added to ensure selection of points from the table is also fixed-time.
The scalar backend is replaced with crypto-bigint, so all scalar operations are also fixed time.
Same is true for field element representation of coordinates, now using fiat-crypto backend for prime field operations.

If you feel good about how this looks now, Id like to propose we close this issue and update the elliptic curve point multiplication wikipedia page. Let me know what you think.

otsmr commented 11 months ago

Sounds good to me 👍

Dustin-Ray commented 11 months ago

resolved through overhaul of curve in #29

Dustin-Ray / capyCRYPT

side-channel: this implementation is vulnerable to timing analysis #30