besser82 / libxcrypt

Extended crypt library for descrypt, md5crypt, bcrypt, and others
GNU Lesser General Public License v2.1
178 stars 49 forks source link

Update SHA-256 implementation from libcperciva #168

Closed dpward closed 1 year ago

dpward commented 1 year ago

The SHA-256 implementation is from Colin Percival. There seem to be a couple of different upstream sources for this:

It appears that libxcrypt derives SHA-256 from libcperciva. In particular it is missing this upstream bug fix: https://github.com/Tarsnap/libcperciva/commit/f3eceb448509b7baa1e4422cb84b7a753097a477

Should that be applied individually, or should it be synced with the current upstream version? (I'm not sure if the hardware acceleration support is necessary.)

solardiz commented 1 year ago

libxcrypt derives SHA-256 from libcperciva. In particular it is missing this upstream bug fix: Tarsnap/libcperciva@f3eceb4

Thank you for spotting this! I think we should apply that fix on its own. I think this wrapper function is unused in libxcrypt itself (and not exported, or is it?), but since we have it of course we should make it zeroize the state like other Final functions do.

The same issue exists in my upstream yescrypt codebase (where that function is also unused) - I'll do similar there.

HMAC and PBKDF2 on SHA-256 are used by (ye)scrypt - just not the HMAC_SHA256_Final wrapper function.

As to sync'ing with libcperciva, I think we should contribute to there the "Implemented a little-known SHA-2 Maj() optimization proposed by Wei Dai" commit. I'm not sure about us importing the hardware-specific optimizations back from there - this would mostly speed up sha256crypt, which might not be popular enough to justify the substantial added complexity and risk of bugs. (sha512crypt is far more popular.)

As to usage by (ye)scrypt, SHA-256 is something under 1% of its total running time and, if we cared, the best hardware-specific optimization would be different from what libcperciva has - namely, it is possible to compute multiple PBKDF2 output blocks concurrently, like e.g. Litecoin miners did (for them this was more relevant due to low scrypt settings and thus higher portion of time spent on SHA-256, and also the different performance vs. complexity trade-off when it's directly about money).