OpenMined / sycret

Function Secret Sharing library for Python and Rust with hardware acceleration
https://openmined.github.io/sycret/
Apache License 2.0
54 stars 9 forks source link

Replace `aesni` crate with `aes` #26

Closed nph4rd closed 3 years ago

nph4rd commented 3 years ago

Description

Since the aesni crate restricts the supported backends, this was causing some problems related to the usage of SyMPC on aarch64 architectures (See https://github.com/OpenMined/SyMPC/issues/286). Replacing this crate with aes will help because, by default, "it performs runtime detection of CPU intrinsics and uses them if they are available".

Also, I removed the flags in the Cargo config:

Closes #24

Affected Dependencies

aesni -> aes

How has this been tested?

Checklist

tholop commented 3 years ago

Nice work Arturo! And great if we can simplify the crate and the compilation options at the same time.

Please, do you have a benchmark that shows that the aes crate is indeed as fast as the aesni crate when the system supports AES-NI? Especially after preparing the package with Maturin.

(I think that we have a pending branch/issue for some automated benchmarks with Criterion but I never finished it)

nph4rd commented 3 years ago

Currently we only have what's in sycret/benches/. The output is as follows:

With the changes:

Eq keygen               time:   [8.8554 us 8.8628 us 8.8739 us]                       
                        change: [-1.5175% -0.5167% +0.3938%] (p = 0.31 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

Without the changes:

Eq keygen               time:   [6.1032 us 6.1108 us 6.1208 us]                       
                        change: [-31.143% -30.960% -30.808%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

So if I understand correctly, there is a slight decrease in performance? I do think it would be nice to have a more robust benchmark for comparison though.

Btw... Ran these with:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       39 bits physical, 48 bits virtual
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               142
Model name:          Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Stepping:            12
CPU MHz:             800.007
CPU max MHz:         4600.0000
CPU min MHz:         400.0000
BogoMIPS:            3984.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
tholop commented 3 years ago

Indeed, thanks for the results. I'll try to add more benches to see if the performance drop is acceptable (it's a good time to do that, since we need benchmarks anyway). Let's keep the PR open for now!

nph4rd commented 3 years ago

Ok! Agreed :+1:

nph4rd commented 3 years ago

Force-pushed to rebase with master and trigger benchmarks.

nph4rd commented 3 years ago

@tholop what do you think?

I've rebased this branch so that it runs the benchmarks. It seems to me that performance is only very slightly (insignificantly?) affected by the change. You can compare the results yourself:

master vs PR

tholop commented 3 years ago

Thank you, Arturo! I double-checked the benchmarks on my laptop too, and indeed there is no significant performance hit. I'm merging this, thanks for the PR and for the issues discovered and solved on the way :)