Closed tarcieri closed 1 month ago
Aside from a similar change to the `Boxed*` implementation, I think this is the last bit of timing variability needed to address #627.
Note: there's an alternative strategy for computing the upper bounds here which we should investigate: https://github.com/sipa/safegcd-bounds
I wonder if an implementation of `gcd_vartime` that just uses `rem_vartime` would be faster or not.
I should probably add benchmarks for `gcd_vartime` as well.
Added some benchmarks in f2a5aed.
For reference:
greatest common divisor/gcd, U256
time: [40.901 µs 41.114 µs 41.391 µs]
greatest common divisor/gcd_vartime, U256
time: [1.5651 µs 1.5714 µs 1.5796 µs]
wrapping ops/div/rem_vartime, U256/U128, full size
time: [87.533 ns 87.881 ns 88.262 ns]
wrapping ops/rem_vartime, U256/U128, full size
time: [87.768 ns 88.654 ns 90.059 ns]
@fjarri I guess the idea would be to use `rem_vartime` to implement Euclid's method? But what's interesting about a `gcd_vartime` or `inv_mod_vartime` using Bernstein-Yang is that it's constant-time with respect to `f` (but not to `g`). I'm not sure you can implement Euclid's algorithm in such a manner.
That's true, but an implementation that is variable-time in both arguments is useful too, so I wonder if removing the restriction on the first argument leads to a noticeable performance gain.
@fjarri I guess we could potentially have all three options, but I'm not sure about naming.
One question I'd have is what is the use case for a fully variable time GCD in cryptographic algorithms.
I tried out a Euclidean algorithm implementation with `rem_vartime()`, and it seems to be significantly slower than the current `gcd_vartime()` (about 10x).
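For readers following along, the Euclidean approach discussed above can be sketched as below. This is only an illustration on `u64` with the built-in `%` operator, not the crate's `rem_vartime` and not the code that was benchmarked; the name `gcd_euclid` and the returned step counter are hypothetical and exist only to show that the loop length depends on both inputs.

```rust
/// Textbook Euclidean GCD using the remainder operation.
/// Returns the GCD together with the number of loop iterations taken,
/// to make the data-dependent running time visible.
/// (Illustrative only: plain `u64`, not a big-integer type.)
fn gcd_euclid(mut a: u64, mut b: u64) -> (u64, u32) {
    let mut steps = 0;
    // The loop exits as soon as the remainder hits zero, so the
    // iteration count leaks information about both `a` and `b`.
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
        steps += 1;
    }
    (a, steps)
}
```

Consecutive Fibonacci numbers such as 89 and 55 take many more iterations than, say, 89 and 1, which is why this style of loop is variable-time in both arguments.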
The previous implementation runs in variable time with respect to `g`. However, in the event both inputs are secret, a fully constant-time implementation is required.
This implements the method described in section 11 of https://eprint.iacr.org/2019/266.pdf and more specifically this Python code from Figure 11.1:
Instead of bounding the loop on `g` reaching zero, this computes a fixed number of iterations, relative to the highest bit of either `f` or `g`, after which the algorithm is guaranteed to have converged, then runs for exactly that number of iterations.
This results in about a 22x slowdown:
The previous implementation, which is variable-time with respect to `g`, is preserved as well, for now as `gcd_vartime`, but it would also be nice to add an `inv_mod_vartime`.
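As a rough illustration of the fixed-iteration idea described above, here is a minimal sketch on `u32` inputs. It is not the PR's code: the names `iterations`, `divstep`, and `gcd_fixed` are hypothetical, the iteration bound is the one I recall from the `gcd2` routine in the Bernstein-Yang paper, and the sketch uses ordinary branches for readability, so it is not itself constant-time (a real implementation would replace the branches and the bit-length computation with masked, constant-time operations).

```rust
/// Iteration bound for inputs of at most `d` bits
/// (assumed here to match the bound used by `gcd2` in the paper).
fn iterations(d: u32) -> u32 {
    if d < 46 { (49 * d + 80) / 17 } else { (49 * d + 57) / 17 }
}

/// One division step on (delta, f, g). Requires `f` to be odd.
fn divstep(delta: i64, f: i64, g: i64) -> (i64, i64, i64) {
    if delta > 0 && g & 1 == 1 {
        // Swap roles: g becomes the new f; (g - f) is even, so /2 is exact.
        (1 - delta, g, (g - f) / 2)
    } else {
        // Add f only when g is odd, making the numerator even.
        (1 + delta, f, (g + (g & 1) * f) / 2)
    }
}

/// GCD of an odd `f` and arbitrary `g`, running for a number of steps
/// that depends only on the bit length, not on when `g` reaches zero.
fn gcd_fixed(f: u32, g: u32) -> u32 {
    assert!(f & 1 == 1, "f must be odd");
    // d = bit length of the larger input.
    let d = 32 - f.leading_zeros().min(g.leading_zeros());
    let (mut delta, mut f, mut g) = (1i64, f as i64, g as i64);
    for _ in 0..iterations(d) {
        (delta, f, g) = divstep(delta, f, g);
    }
    // After the bound is reached g is zero and f is +/- gcd.
    f.unsigned_abs() as u32
}
```

Once `g` reaches zero, further steps leave `f` unchanged, so running past the point of convergence is harmless; that is what lets the loop count be fixed in advance. Under this bound, 32-bit inputs need 96 steps and 256-bit inputs need 741.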