ctz / fastpbkdf2

Fast PBKDF2 implementation in C
Creative Commons Zero v1.0 Universal
158 stars 46 forks source link

fastpbkdf2

This is a fast PBKDF2-HMAC-{SHA1,SHA256,SHA512} implementation in C.

It uses OpenSSL's hash functions, but out-performs OpenSSL's own PBKDF2 thanks to various optimisations in the inner loop.

Build Status

Interface

void fastpbkdf2_hmac_sha1(const uint8_t *pw, size_t npw,
                          const uint8_t *salt, size_t nsalt,
                          uint32_t iterations,
                          uint8_t *out, size_t nout);

void fastpbkdf2_hmac_sha256(const uint8_t *pw, size_t npw,
                            const uint8_t *salt, size_t nsalt,
                            uint32_t iterations,
                            uint8_t *out, size_t nout);

void fastpbkdf2_hmac_sha512(const uint8_t *pw, size_t npw,
                            const uint8_t *salt, size_t nsalt,
                            uint32_t iterations,
                            uint8_t *out, size_t nout);

Please see the header file for details and constraints.

Performance

These values are wall time, output from the bench tool.

AMD64

Hash OpenSSL fastpbkdf2 (comparison)
SHA1 11.84s 3.07s x3.86
SHA256 16.54s 7.45s x2.22
SHA512 21.90s 9.33s x2.34

222 iterations, 1.86GHz Intel Atom N2800, amd64.

ARM

Hash OpenSSL fastpbkdf2 (comparison)
SHA1 30.4s 4.43s x6.86
SHA256 36.52s 7.04s x5.19
SHA512 77.44s 28.1s x2.76

220 iterations, Raspberry Pi - 700MHz ARM11.

Requirements

Building and testing

Run 'make test' to build and run tests.

The program bench provides a very basic performance comparison between OpenSSL and fastpbkdf2.

The implementation has one header and one translation unit. This is intended for easy integration into your project.

Optional parallelisation of outer loop

PBKDF2 is misdesigned and you should avoid asking for more than your hash function's output length. In other words, nout should be <= 20 for fastpbkdf2_hmac_sha1, <= 32 for fastpbkdf2_hmac_sha256 and <= 64 for fastpbkdf2_hmac_sha512.

If you can't avoid this (for compatibility reasons, say) compile everything with -fopenmp and -DWITH_OPENMP to have this computation done in parallel. Note that this has non-zero overhead.

The program multibench provides a basic performance comparison for using this option.

Windows

Details on building for Windows.

License

CC0.

Language bindings

Author

Joseph Birr-Pixton jpixton@gmail.com