Open jedisct1 opened 1 year ago
On M2 Pro:
| Word type | Parameter set | encaps/s | decaps/s | keygen/s |
|-----------|---------------|----------|----------|----------|
| u32       | kyber512      | 49431    | 93939    | 76156    |
| u32       | kyber768      | 46900    | 66015    | 43677    |
| u32       | kyber1024     | 40496    | 48002    | 26951    |
| u64       | kyber512      | 49892    | 91408    | 72777    |
| u64       | kyber768      | 47831    | 63460    | 43475    |
| u64       | kyber1024     | 41748    | 48021    | 26621    |
| u128      | kyber512      | 54565    | 92113    | 74247    |
| u128      | kyber768      | 47450    | 63905    | 43298    |
| u128      | kyber1024     | 41330    | 46823    | 26958    |
Cool that it's so easy to check this. It'd be quite a bit of work with the C reference implementation.
On an AMD Zen2 CPU:
https://github.com/bwesterb/kyber-zig/blob/da1ab9f5a9c5fce24dba5e7e165c86f51727252a/src/main.zig#L1139
u64
:u128
:u32
u32 is actually faster, probably due to better opportunities for auto-vectorization.