RustCrypto / hashes

Collection of cryptographic hash functions written in pure Rust

soft-sha512 code size seems unreasonably high on thumbv7em #561

Open TomCrypto opened 5 months ago

TomCrypto commented 5 months ago

I'm working on an embedded project that needs ed25519 signing, which pulls in sha2 for the SHA-512 step of the signing procedure. On a thumbv7em target, the SHA-512 implementation appears to consume a very significant chunk of code space:

$ cargo bloat --release --filter sha2
File .text    Size Crate Name
0.9% 17.0% 26.8KiB  sha2 sha2::sha512::compress512
0.2%  4.4%  6.9KiB  sha2 sha2::sha256::compress256
0.0%  0.0%      0B       And 0 smaller methods. Use -n N to show more.
1.1% 21.4% 33.7KiB       filtered data size, the file size is 3.0MiB

If I use opt-level = "z", it is better, but still quite large:

$ cargo bloat --release --filter sha2
File .text    Size Crate Name
0.5% 10.1% 10.8KiB  sha2 sha2::sha512::compress512
0.2%  3.0%  3.2KiB  sha2 sha2::sha256::compress256
0.0%  0.2%    250B  sha2 sha2::sha512::soft::sha512_schedule_x2
0.0%  0.2%    210B  sha2 sha2::sha512::soft::sha512_digest_round
0.0%  0.1%    162B  sha2 sha2::sha256::soft::schedule
0.0%  0.1%    162B  sha2 sha2::sha256::soft::sha256_digest_round_x2
0.0%  0.0%     32B  sha2 core::iter::adapters::zip::TrustedRandomAccessNoCoer...
0.0%  0.0%      0B       And 0 smaller methods. Use -n N to show more.
0.7% 13.8% 14.8KiB       filtered data size, the file size is 2.0MiB

The amounts above seem quite onerous for my target, which only has 256kB of flash, and would potentially be a non-starter for targets with even less code storage available.

I suspect it's a combination of inlined soft 64-bit integer arithmetic (thumbv7em only has 32-bit registers) and extreme levels of code generation due to macro expansion.
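For context on the "soft 64-bit arithmetic" point: thumbv7em is a 32-bit architecture, so each u64 lives in a pair of 32-bit registers. The sketch below spells out, in plain Rust, roughly what one 64-bit rotate costs when only 32-bit operations are available. The helper name `rotr64_via_u32` is hypothetical, and this illustrates the idea rather than how LLVM literally lowers `u64::rotate_right`.

```rust
/// Hypothetical helper: rotate a 64-bit word right by `n` using only 32-bit
/// halves, roughly the work a 32-bit core has to do for every
/// `u64::rotate_right` in SHA-512's Σ and σ functions.
pub fn rotr64_via_u32(x: u64, n: u32) -> u64 {
    let n = n % 64;
    // A 64-bit value occupies two 32-bit registers on a 32-bit core.
    let (hi, lo) = ((x >> 32) as u32, x as u32);
    // Rotating by 32 or more first swaps the two halves.
    let (hi, lo, n) = if n >= 32 { (lo, hi, n - 32) } else { (hi, lo, n) };
    if n == 0 {
        return ((hi as u64) << 32) | lo as u64;
    }
    // Each half keeps its own bits shifted right, plus the low `n` bits of
    // the other half shifted into its top.
    let new_hi = (hi >> n) | (lo << (32 - n));
    let new_lo = (lo >> n) | (hi << (32 - n));
    ((new_hi as u64) << 32) | new_lo as u64
}
```

SHA-512's Σ0 and Σ1 each need three such rotations per round, and every round also performs several 64-bit additions (each an add/add-with-carry pair on ARM), so with all 80 rounds fully unrolled this per-operation overhead plausibly adds up quickly.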

If improving the code size would cause a performance regression on some platforms, perhaps an implementation favoring code size, gated behind a crate feature flag, could be of interest? Something like a "32-bit-friendly" version, I guess.

newpavlov commented 5 months ago

#547 may reduce the size a bit, but the main reason for the large code size is likely the aggressive inlining of round-processing code that we use. Our block compression function compiles down to completely branchless code, which is usually what we want, but it's not desirable for constrained targets.
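To make the size/speed trade-off concrete, here is a minimal sketch of SHA-512 block compression written with plain loops (the round body appears once and runs 80 times, and the message schedule is expanded in a loop) instead of unrolled rounds. It follows the FIPS 180-4 definitions directly; the name `compress512_rolled` is hypothetical and this is not the sha2 crate's code. Note that LLVM may still choose to unroll these loops at higher opt-levels, so the effect should be checked with cargo bloat.

```rust
/// SHA-512 round constants (FIPS 180-4, section 4.2.3).
const K: [u64; 80] = [
    0x428a2f98d728ae22, 0x7137449123ef65cd, 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc,
    0x3956c25bf348b538, 0x59f111f1b605d019, 0x923f82a4af194f9b, 0xab1c5ed5da6d8118,
    0xd807aa98a3030242, 0x12835b0145706fbe, 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2,
    0x72be5d74f27b896f, 0x80deb1fe3b1696b1, 0x9bdc06a725c71235, 0xc19bf174cf692694,
    0xe49b69c19ef14ad2, 0xefbe4786384f25e3, 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65,
    0x2de92c6f592b0275, 0x4a7484aa6ea6e483, 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5,
    0x983e5152ee66dfab, 0xa831c66d2db43210, 0xb00327c898fb213f, 0xbf597fc7beef0ee4,
    0xc6e00bf33da88fc2, 0xd5a79147930aa725, 0x06ca6351e003826f, 0x142929670a0e6e70,
    0x27b70a8546d22ffc, 0x2e1b21385c26c926, 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df,
    0x650a73548baf63de, 0x766a0abb3c77b2a8, 0x81c2c92e47edaee6, 0x92722c851482353b,
    0xa2bfe8a14cf10364, 0xa81a664bbc423001, 0xc24b8b70d0f89791, 0xc76c51a30654be30,
    0xd192e819d6ef5218, 0xd69906245565a910, 0xf40e35855771202a, 0x106aa07032bbd1b8,
    0x19a4c116b8d2d0c8, 0x1e376c085141ab53, 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8,
    0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb, 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3,
    0x748f82ee5defb2fc, 0x78a5636f43172f60, 0x84c87814a1f0ab72, 0x8cc702081a6439ec,
    0x90befffa23631e28, 0xa4506cebde82bde9, 0xbef9a3f7b2c67915, 0xc67178f2e372532b,
    0xca273eceea26619c, 0xd186b8c721c0c207, 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178,
    0x06f067aa72176fba, 0x0a637dc5a2c898a6, 0x113f9804bef90dae, 0x1b710b35131c471b,
    0x28db77f523047d84, 0x32caab7b40c72493, 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c,
    0x4cc5d4becb3e42b6, 0x597f299cfc657e2a, 0x5fcb6fab3ad6faec, 0x6c44198c4a475817,
];

/// SHA-512 initial hash value (FIPS 180-4, section 5.3.5).
pub const H0: [u64; 8] = [
    0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1,
    0x510e527fade682d1, 0x9b05688c2b3e6c1f, 0x1f83d9abfb41bd6b, 0x5be0cd19137e2179,
];

/// Hypothetical size-oriented compression of one 128-byte block: each round
/// body appears once in the source and is executed in a loop.
pub fn compress512_rolled(state: &mut [u64; 8], block: &[u8; 128]) {
    // Message schedule: load 16 big-endian words, then expand to 80.
    let mut w = [0u64; 80];
    for (i, chunk) in block.chunks_exact(8).enumerate() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        w[i] = u64::from_be_bytes(buf);
    }
    for i in 16..80 {
        let s0 = w[i - 15].rotate_right(1) ^ w[i - 15].rotate_right(8) ^ (w[i - 15] >> 7);
        let s1 = w[i - 2].rotate_right(19) ^ w[i - 2].rotate_right(61) ^ (w[i - 2] >> 6);
        w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
    }

    // Working variables a..h start from the current chaining state.
    let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut h] = *state;
    for i in 0..80 {
        let big_s1 = e.rotate_right(14) ^ e.rotate_right(18) ^ e.rotate_right(41);
        let ch = (e & f) ^ (!e & g);
        let t1 = h.wrapping_add(big_s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
        let big_s0 = a.rotate_right(28) ^ a.rotate_right(34) ^ a.rotate_right(39);
        let maj = (a & b) ^ (a & c) ^ (b & c);
        let t2 = big_s0.wrapping_add(maj);
        h = g;
        g = f;
        f = e;
        e = d.wrapping_add(t1);
        d = c;
        c = b;
        b = a;
        a = t1.wrapping_add(t2);
    }

    // Feed-forward: add the working variables back into the state.
    for (s, v) in state.iter_mut().zip([a, b, c, d, e, f, g, h]) {
        *s = s.wrapping_add(v);
    }
}
```

This version keeps a full 80-word schedule (640 bytes of stack); a 16-word circular buffer would shrink that further at the cost of some index arithmetic. Whether the rolled form is actually smaller on thumbv7em depends on what the optimizer does with the loops.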

It may be worth introducing a no_unroll flag similar to the one in the keccak crate. We would gladly accept such a PR.
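A no_unroll-style switch essentially boils down to a macro whose expansion is chosen by a Cargo feature. A minimal sketch follows; the macro name `repeat4`, the toy `add_lanes` function and the `no_unroll` feature wiring are assumptions for illustration and are not copied from the keccak or sha2 crates.

```rust
// Hypothetical macro: by default the body is repeated once per index,
// giving straight-line (larger, branch-free) code; with the assumed
// `no_unroll` feature enabled it expands to a plain loop instead.
#[cfg(not(feature = "no_unroll"))]
macro_rules! repeat4 {
    ($i:ident, $body:block) => {{
        { let $i: usize = 0; $body }
        { let $i: usize = 1; $body }
        { let $i: usize = 2; $body }
        { let $i: usize = 3; $body }
    }};
}

#[cfg(feature = "no_unroll")]
macro_rules! repeat4 {
    ($i:ident, $body:block) => {
        for $i in 0..4usize $body
    };
}

/// Toy use: add `src` into `dst` lane by lane, the same shape as the
/// feed-forward step at the end of a compression function.
pub fn add_lanes(dst: &mut [u64; 4], src: &[u64; 4]) {
    repeat4!(i, {
        dst[i] = dst[i].wrapping_add(src[i]);
    });
}
```

The function body is written once and behaves identically either way; only the emitted code changes. A crate would declare such a feature as `no_unroll = []` under `[features]` in its Cargo.toml, letting size-constrained users opt in while everyone else keeps the unrolled default.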