etcimon / botan

Block & stream ciphers, public key crypto, hashing, KDF, MAC, PKCS, TLS, ASN.1, BER/DER, etc.

Performance vs openssl #7

Closed tchaloupka closed 8 years ago

tchaloupka commented 8 years ago

Out of curiosity I tried a simple benchmark of the JWTD library (https://github.com/chalucha/jwtd/tree/benchmark/benchmark)

With dmd-2.068 it resulted in:

dub -c openssl -b release

JWT None:       64 ms, 465 μs, and 3 hnsecs
JWT HS256:      95 ms and 851 μs
JWT RS256:      16 secs, 60 ms, 674 μs, and 6 hnsecs
JWT ES256:      4 secs, 461 ms, 714 μs, and 7 hnsecs

dub -c botan -b release

JWT None:       63 ms, 50 μs, and 8 hnsecs
JWT HS256:      255 ms and 901 μs
JWT RS256:      12 minutes, 46 secs, 316 ms, 390 μs, and 6 hnsecs
JWT ES256:      47 secs, 519 ms, 877 μs, and 3 hnsecs

JWT None uses neither OpenSSL nor Botan, so its timing is the same in both runs. There is a huge difference (about 48x) with RS256.
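For reference, the slowdown factors can be recomputed from the two listings above. This is only a minimal C++ sketch of the arithmetic; the helper names (to_seconds, Row, slowdown, print_slowdowns) are made up for illustration and are not part of JWTD or botan.

```cpp
#include <cstdio>

// Convert a "minutes, secs, ms, μs" reading to seconds. The trailing hnsecs
// (100 ns units) are dropped; they do not change the ratios at this precision.
constexpr double to_seconds(int minutes, int secs, int ms, int us) {
    return minutes * 60.0 + secs + ms / 1e3 + us / 1e6;
}

struct Row {
    const char* name;
    double openssl_s;  // time with `dub -c openssl -b release`
    double botan_s;    // time with `dub -c botan -b release`
};

// Timings copied from the two result listings above.
constexpr Row rows[] = {
    {"HS256", to_seconds(0, 0, 95, 851),  to_seconds(0, 0, 255, 901)},
    {"RS256", to_seconds(0, 16, 60, 674), to_seconds(12, 46, 316, 390)},
    {"ES256", to_seconds(0, 4, 461, 714), to_seconds(0, 47, 519, 877)},
};

// How many times slower the botan configuration was for one row.
constexpr double slowdown(const Row& r) {
    return r.botan_s / r.openssl_s;
}

// Print one line per algorithm with its slowdown factor.
void print_slowdowns() {
    for (const Row& r : rows)
        std::printf("%s: botan is %.1fx slower\n", r.name, slowdown(r));
}
```

With these inputs the factors come out at roughly 2.7x for HS256, 47.7x for RS256 and 10.7x for ES256, so RS256 is the outlier but ES256 is also an order of magnitude apart.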

I know that DMD is bad for any benchmarks, but unfortunately it does not build for me with either of:

GDC (Gentoo 4.8.4 p1.6, pie-0.6.1) 4.8.4
LDC - the LLVM D compiler (0.15.1) based on DMD v2.066.1 and LLVM 3.6.0

etcimon commented 8 years ago

I've been putting this off for a while now, but it would require some ASM optimizations: https://github.com/randombit/botan/blob/master/src/lib/math/mp/mp_x86_64/mp_asmi.h

Perf gives me the following hot spots:

23.11% benchmark benchmark [.] _D5botan4math2mp7mp_core8word_addFNammPmZm
16.26% benchmark benchmark [.] _D5botan4math2mp7mp_core10word_madd3FNammmPmZm
9.91% benchmark benchmark [.] _D5botan4math2mp7mp_core10word_madd2FNammPmZm
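The three hot symbols demangle to word_add, word_madd3 and word_madd2 in botan.math.mp.mp_core, each taking ulong operands plus a ulong* carry and returning a ulong. Below is a rough C++ sketch of what primitives with these names and signatures usually compute, assuming the conventional add-with-carry and multiply-add semantics. It is not the library's actual code: it uses 32-bit words with a 64-bit intermediate for portability (the D code works on 64-bit words), and add_words is a hypothetical helper added only to show the calling pattern.

```cpp
#include <cstddef>
#include <cstdint>

using word  = std::uint32_t;  // assumption: 32-bit limb, for portability only
using dword = std::uint64_t;  // wide enough to hold word * word + 2 * word

constexpr int WORD_BITS = 32;

// Returns the low word of x + y + *carry and stores the carry-out in *carry.
// Assumes *carry is 0 or 1 on entry, so the carry-out is also 0 or 1.
inline word word_add(word x, word y, word* carry) {
    dword s = static_cast<dword>(x) + y + *carry;
    *carry = static_cast<word>(s >> WORD_BITS);
    return static_cast<word>(s);
}

// Returns the low word of a * b + *c and stores the high word in *c.
inline word word_madd2(word a, word b, word* c) {
    dword s = static_cast<dword>(a) * b + *c;
    *c = static_cast<word>(s >> WORD_BITS);
    return static_cast<word>(s);
}

// Returns the low word of a * b + c + *d and stores the high word in *d.
// The sum cannot overflow dword: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1.
inline word word_madd3(word a, word b, word c, word* d) {
    dword s = static_cast<dword>(a) * b + c + *d;
    *d = static_cast<word>(s >> WORD_BITS);
    return static_cast<word>(s);
}

// Hypothetical helper: adds two n-word little-endian numbers into z and
// returns the final carry. Each limb costs one word_add call.
inline word add_words(word* z, const word* x, const word* y, std::size_t n) {
    word carry = 0;
    for (std::size_t i = 0; i != n; ++i)
        z[i] = word_add(x[i], y[i], &carry);
    return carry;
}
```

Big-number multiplication and modular exponentiation call primitives like these once per limb in their inner loops, which is why they dominate the profile and why hand-written ASM or better inlining pays off there.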

I'll try to merge a fix for this soon.

etcimon commented 8 years ago

I think LDC is going to be needed here. I'll invest some time on compiling with that instead.

tchaloupka commented 8 years ago

Yep, I would also not bother much with dmd on this and it would be nice to have some numbers from LDC or GDC.

etcimon commented 8 years ago

The plan was to add LDC support as soon as LDC supports 2.067. I can't use asm pure nothrow with the current version.

I'm guessing the optimizations from LLVM will close the gap on this benchmark. dmd has a lot of known missing codegen features, and 46x is plausible given the complexity of these algorithms and the optimization opportunities that other compilers can exploit.

etcimon commented 8 years ago

I added an openssl engine that pipes all big-number operations through OpenSSL, and also added LDC support. RS256 is still 5-6x slower, down from 46x. I'm going to improve it to the point where it pipes public key operations directly through the high-level OpenSSL functions like RSA_sign. I think it's going to be hard to beat OpenSSL by manually tweaking the performance of these, because LDC doesn't have manual inlining yet, so there's a lot of overhead that can't be eliminated.

etcimon commented 8 years ago

I decided to put this through perf and apparently the problem was with loadKey doing a lot of checks. Putting the private key in a static variable reduces the difference with openssl to about 3.5x on x86_64, which I deem more acceptable. However, I will add encryption/decryption/signing/verification engines for high-level crypto objects in the openssl engine to make up for the performance gap when it is needed.
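The static-variable change can be illustrated with a small C++ sketch. PrivateKey, load_key, cached_key and load_count are hypothetical stand-ins, not JWTD or botan APIs; the only point is that the expensive load (with its checks) runs once instead of on every operation.

```cpp
#include <string>

// Hypothetical stand-in for a parsed private key.
struct PrivateKey {
    std::string material;
};

// Counts how many times the expensive load path runs (demo only).
int load_count = 0;

// Hypothetical loader. A real one would parse the key and run its
// consistency checks here, which is the cost the profile pointed at.
PrivateKey load_key(const std::string& pem) {
    ++load_count;
    return PrivateKey{pem};
}

// Function-local static: initialized on the first call only (thread-safe
// since C++11). Later calls ignore `pem` and reuse the first key, which is
// fine for a benchmark with one fixed key but not for general use.
const PrivateKey& cached_key(const std::string& pem) {
    static const PrivateKey key = load_key(pem);
    return key;
}
```

Calling load_key in a loop pays the checks on every iteration, while calling cached_key in the same loop pays them once. A real application with more than one key would need a cache keyed on the key material rather than a single static.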