RustCrypto / hashes

Collection of cryptographic hash functions written in pure Rust

Support `no_unroll` for Ascon #567

Closed korken89 closed 3 months ago

korken89 commented 4 months ago

This helps keep the binary size small on resource-constrained systems, e.g. MCUs.

newpavlov commented 4 months ago

Personally, I do not like such "transparent" features. You can easily enable it in your project by adding this line to your project's Cargo.toml:

ascon = { version = "*", features = ["no_unroll"] }

Note that we have a similar feature in keccak, but not in the sha3 crate. In the future we may also migrate from crate features to configuration flags for this kind of functionality.
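For readers unfamiliar with the distinction between crate features and configuration flags, here is a minimal sketch of how the two mechanisms look from inside a crate. The `no_unroll` feature name comes from this thread; the `ascon_no_unroll` flag name and the function names are hypothetical and are not taken from the ascon crate.

```rust
/// Crate feature: enabled through Cargo, e.g. `features = ["no_unroll"]`
/// in Cargo.toml. Any crate in the dependency graph can turn it on.
pub fn disabled_by_feature() -> bool {
    cfg!(feature = "no_unroll")
}

/// Configuration flag: enabled through the compiler, e.g.
/// `RUSTFLAGS="--cfg ascon_no_unroll"`.
/// `ascon_no_unroll` is a hypothetical flag name used for illustration.
pub fn disabled_by_cfg_flag() -> bool {
    cfg!(ascon_no_unroll)
}

/// True when either mechanism asks for the rolled (compact) loop.
pub fn use_rolled_loop() -> bool {
    disabled_by_feature() || disabled_by_cfg_flag()
}
```

In a default build neither condition is set, so `use_rolled_loop()` returns `false`. Recent toolchains may warn about the undeclared feature and flag names.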

To summarize: I am inclined to close this PR and its AEAD counterpart.

tarcieri commented 4 months ago

FWIW I'm okay with it. It seems like something we should make easy given Ascon primarily targets embedded devices

newpavlov commented 4 months ago

Maybe we then should make manual unrolling gated by a feature or configuration flag and keep the unrolled version as the default?
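To make the trade-off concrete, here is a rough sketch of a rolled and a manually unrolled loop selected at compile time. `toy_round` is a stand-in and is not the real Ascon round function; the four constants, the `unroll` feature name, and all function names are assumptions made for illustration only.

```rust
/// Toy stand-in for a permutation round. NOT the real Ascon round function.
#[inline(always)]
fn toy_round(state: u64, rc: u64) -> u64 {
    (state ^ rc).rotate_left(7).wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Illustrative round constants (only four rounds to keep the sketch short).
const ROUND_CONSTANTS: [u64; 4] = [0xf0, 0xe1, 0xd2, 0xc3];

/// Rolled variant: the round call appears once, inside a loop.
pub fn permute_rolled(mut state: u64) -> u64 {
    for &rc in ROUND_CONSTANTS.iter() {
        state = toy_round(state, rc);
    }
    state
}

/// Manually unrolled variant: one explicit call per round.
pub fn permute_unrolled(state: u64) -> u64 {
    let s = toy_round(state, ROUND_CONSTANTS[0]);
    let s = toy_round(s, ROUND_CONSTANTS[1]);
    let s = toy_round(s, ROUND_CONSTANTS[2]);
    toy_round(s, ROUND_CONSTANTS[3])
}

/// Picks a variant at compile time. `unroll` is a hypothetical feature name;
/// without it, the rolled loop is used.
pub fn permute(state: u64) -> u64 {
    if cfg!(feature = "unroll") {
        permute_unrolled(state)
    } else {
        permute_rolled(state)
    }
}
```

Both variants compute the same result. Which one ends up smaller depends on the target and optimization level, since the compiler may unroll or re-roll loops on its own.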

tarcieri commented 4 months ago

Sounds good to me (i.e. switching to an unroll feature).

@sebastinas WDYT?

sebastinas commented 4 months ago

I'd rather not. I have no data to check whether the no_unroll version produces compact code or not. I should add a big note that the feature is currently provided on a best-effort basis, pending feedback from people who actually use the code on embedded platforms.

korken89 commented 4 months ago

@sebastinas I'm testing ascon right now for some embedded (ARM thumbv7m) use cases, so I can report back on size differences.

korken89 commented 4 months ago

Hi, I've tested the differences again and I'm quite sure we can close this. Results:

Without ascon = { version = "*", features = ["no_unroll"] }

0.0%   0.7%    440B         ascon ascon::round
0.0%   0.6%    332B   rpc_testing rpc_testing::bsp::ascon_mac <-- (callsite)
0.0%   0.3%    186B         ascon ascon::State::permute_12

Total use: 626 bytes for ascon, 332 bytes for callsite, 958 bytes total.

With ascon = { version = "*", features = ["no_unroll"] }

0.0%   0.9%    554B         ascon ascon::State::permute_12
0.0%   0.6%    366B   rpc_testing rpc_testing::bsp::ascon_mac <-- (callsite)

Total use: 554 bytes for ascon, 366 bytes for callsite, 920 bytes total.
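The totals can be cross-checked with a few lines of Rust. The byte counts are copied from the cargo bloat output above; the constant and function names are illustrative only.

```rust
/// Sizes in bytes without `no_unroll`:
/// ascon::round, ascon_mac callsite, ascon::State::permute_12.
pub const UNROLLED: [u32; 3] = [440, 332, 186];

/// Sizes in bytes with `no_unroll`:
/// ascon::State::permute_12, ascon_mac callsite.
pub const NO_UNROLL: [u32; 2] = [554, 366];

/// Sums the listed symbol sizes.
pub fn total(sizes: &[u32]) -> u32 {
    sizes.iter().sum()
}

/// Bytes saved going from `before` to `after` (negative means growth).
pub fn savings(before: u32, after: u32) -> i64 {
    i64::from(before) - i64::from(after)
}
```

Here `total(&UNROLLED)` is 958 and `total(&NO_UNROLL)` is 920, so `no_unroll` saves 38 bytes in this measurement, roughly 4% of the combined size.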

This is the callsite for reference:

#[inline(never)]
pub fn ascon_mac(id: &[u8; 12]) -> [u8; 6] {
    use ascon_hash::{AsconXof, ExtendableOutput, Update, XofReader};

    let mut xof = AsconXof::default();
    xof.update(id);
    let mut reader = xof.finalize_xof();
    let mut dst = [0u8; 6];
    reader.read(&mut dst);
    dst
}

newpavlov commented 4 months ago

Have you compiled it with the s or z optimization level? The difference looks relatively small.

korken89 commented 4 months ago

@newpavlov This is built with:

As is very common in the embedded space, we do indeed use the s optimization level.

I think I found what got me hunting this to start with. I was benchmarking ChaCha8Poly1305 vs ascon_aead::Ascon, and for some reason the firmware grew by ~2kB when switching between using no_unroll or not. This led me to believe ascon was the culprit, but checking the cargo bloat output I'm seeing the same size for the permutation as we see here. I need to investigate more, but I think we can safely say that ascon is not the issue. More likely I'm hitting something that causes the optimizer to do something unexpected.