Closed oconnor663 closed 9 months ago
curve25519-dalek implements fixed-base Montgomery multiplication in terms of Edwards multiplication:
Notably this allows it to leverage the precomputed basepoint tables for the Edwards form, so we don't need to maintain two different copies of the basepoint tables for the two different curve forms.
You're free to use either form for ECDH, though for interop purposes you may find it easier to use X25519.
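To illustrate why precomputed tables only help in the fixed-base case, here is a toy sketch in a much simpler group: exponentiation modulo a small prime instead of point multiplication on a curve. Everything in it (`TOY_MODULUS`, `ToyFixedBaseTable`, `toy_pow_variable_base`) is hypothetical and made up for this example. It is not the table layout curve25519-dalek uses, and it is not constant-time.

```rust
/// Toy prime modulus, chosen only so that products fit in a u64.
const TOY_MODULUS: u64 = 1_000_000_007;

/// Hypothetical fixed-base table: entry i holds g^(2^i) mod TOY_MODULUS.
struct ToyFixedBaseTable {
    powers: [u64; 64],
}

impl ToyFixedBaseTable {
    /// All the repeated squaring is paid once, when the base is known.
    fn new(g: u64) -> Self {
        let mut powers = [0u64; 64];
        let mut cur = g % TOY_MODULUS;
        for slot in powers.iter_mut() {
            *slot = cur;
            cur = cur * cur % TOY_MODULUS;
        }
        Self { powers }
    }

    /// Fixed-base "scalar multiplication": only table lookups and multiplies.
    fn pow(&self, k: u64) -> u64 {
        let mut acc: u64 = 1;
        for (i, &p) in self.powers.iter().enumerate() {
            if (k >> i) & 1 == 1 {
                acc = acc * p % TOY_MODULUS;
            }
        }
        acc
    }
}

/// Variable-base version: the base is new on every call, so the squarings
/// cannot be precomputed and are redone each time.
fn toy_pow_variable_base(g: u64, k: u64) -> u64 {
    let g = g % TOY_MODULUS;
    let mut acc: u64 = 1;
    for i in (0..64).rev() {
        acc = acc * acc % TOY_MODULUS;
        if (k >> i) & 1 == 1 {
            acc = acc * g % TOY_MODULUS;
        }
    }
    acc
}
```

Both functions return the same value for the same inputs. The difference is only where the squaring work happens, which is the reason a single shared set of basepoint tables is worth maintaining.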
Fascinating, thanks for the lightning fast reply.
Actually, I'm not sure we're talking about the same thing. What I'm trying to benchmark above is arbitrary/variable base scalarmult, via mul_clamped, which I think bottoms out at mul_bits_be (which by complete coincidence was introduced in the commit you linked to?). The calls to mul_base_clamped aren't part of the benchmarking loop. Am I right to think that that does not go through the Edwards form?
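For anyone reproducing this kind of measurement, here is a minimal standard-library sketch of keeping setup out of the timed loop. `time_op` is a hypothetical helper written for this example, not part of curve25519-dalek or of any benchmarking crate. In a real run, the setup closure is where something like mul_base_clamped would go, and the timed closure would call mul_clamped.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `setup` once, outside the timed region, then
/// times `iters` calls of `op` and returns the average time per call.
fn time_op<S, T>(
    iters: u32,
    setup: impl FnOnce() -> S,
    mut op: impl FnMut(&S) -> T,
) -> Duration {
    assert!(iters > 0, "need at least one iteration");

    // Not timed: e.g. deriving the point that will be multiplied.
    let state = setup();

    let start = Instant::now();
    for _ in 0..iters {
        // black_box discourages the optimizer from hoisting or discarding
        // the operation under test.
        black_box(op(black_box(&state)));
    }
    start.elapsed() / iters
}
```

A dedicated harness with warm-up and statistics will give more trustworthy numbers than a bare loop like this, but the split between setup and timed work is the same.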
Aah yes, sorry I misread your benchmark. mul_base_clamped goes through mul_base, whereas mul_clamped does indeed go through mul_bits_be. So, in case that's unclear: fixed-base multiplication converts to Edwards form, whereas variable-base multiplication uses the Montgomery ladder.
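For readers who haven't seen one, here is a rough sketch of what an x-only Montgomery ladder looks like, following the step structure given in RFC 7748 but over a tiny toy field. The prime 1013, the curve y^2 = x^3 + 6x^2 + x, and all names (`toy_ladder`, `TOY_P`, `TOY_A24`) are assumptions made up for this example. This is not curve25519-dalek's code, it is not constant-time (a real implementation uses a constant-time conditional swap), and it offers no security at this size.

```rust
/// Toy field prime (NOT 2^255 - 19).
const TOY_P: u64 = 1013;
/// (A - 2) / 4 for the toy curve coefficient A = 6.
const TOY_A24: u64 = 1;

fn add(a: u64, b: u64) -> u64 { (a + b) % TOY_P }
fn sub(a: u64, b: u64) -> u64 { (a + TOY_P - b) % TOY_P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % TOY_P }

/// Inverse via Fermat's little theorem, a^(p-2); maps 0 to 0.
fn inv(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a % TOY_P, TOY_P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// x-coordinate of [k]P, given only the x-coordinate `u` of P.
fn toy_ladder(k: u64, u: u64) -> u64 {
    let x1 = u % TOY_P;
    // (x2 : z2) starts at the point at infinity, (x3 : z3) at P.
    let (mut x2, mut z2, mut x3, mut z3) = (1u64, 0u64, x1, 1u64);
    let mut pending_swap = 0u64;

    for t in (0..64).rev() {
        let bit = (k >> t) & 1;
        pending_swap ^= bit;
        if pending_swap == 1 {
            // Real code would do this swap in constant time.
            std::mem::swap(&mut x2, &mut x3);
            std::mem::swap(&mut z2, &mut z3);
        }
        pending_swap = bit;

        // One combined doubling and differential addition step.
        let a = add(x2, z2);
        let aa = mul(a, a);
        let b = sub(x2, z2);
        let bb = mul(b, b);
        let e = sub(aa, bb);
        let c = add(x3, z3);
        let d = sub(x3, z3);
        let da = mul(d, a);
        let cb = mul(c, b);
        let sum = add(da, cb);
        let diff = sub(da, cb);
        x3 = mul(sum, sum);
        z3 = mul(x1, mul(diff, diff));
        x2 = mul(aa, bb);
        z2 = mul(e, add(aa, mul(TOY_A24, e)));
    }
    if pending_swap == 1 {
        std::mem::swap(&mut x2, &mut x3);
        std::mem::swap(&mut z2, &mut z3);
    }
    // Convert from projective (x2 : z2) back to an affine x-coordinate.
    mul(x2, inv(z2))
}
```

The ladder does the same fixed amount of work per scalar bit and never needs the y-coordinate, which is a large part of why it is the usual choice for X25519-style Diffie-Hellman. In this toy version, `toy_ladder(7, toy_ladder(11, 5))` and `toy_ladder(11, toy_ladder(7, 5))` agree, which is the property the key exchange relies on.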
Yes, that's what it looks like to me, which brings me back to the performance question: Isn't the Montgomery ladder supposed to be faster than scalar multiplication on the Edwards curve? Is it weird that Edwards seems faster in this benchmark?
Edwards arithmetic in curve25519-dalek has multiple backends, including highly optimized architecture-specific SIMD backends for AVX2 and AVX-512, whereas the Montgomery backend is relatively simplistic.
Got it, that makes sense. And yeah, I do notice a decent speedup (~17µs instead of ~23) when I benchmark nightly and pick up the AVX-512 optimizations. Thanks again!
Apologies for filing an issue that's more of a question :) My very basic understanding of the performance relationship between Curve25519 and Edwards25519 is that the former is faster when you're scalarmult'ing an arbitrary point, like you do in the second half of Diffie-Hellman. But when I benchmark scalarmult in this library (i5-1145G7 Linux laptop), it looks like the Edwards curve is faster:
There's a big chance that my assumption going in was wrong, or that I'm not benchmarking what I think I'm benchmarking. Am I right that this is surprising? Does this suggest that doing Diffie-Hellman on Edwards25519 directly (not converting to the Montgomery form first) would actually be faster than regular X25519?