Closed korken89 closed 8 months ago
Personally, I do not like such "transparent" features. You can easily enable it in your project by adding this line to your project's Cargo.toml:
ascon = { version = "*", features = ["no_unroll"] }
Note that we have a similar feature in keccak, but not in the sha3 crate. In the future we may also migrate from crate features to configuration flags for this kind of functionality.
To summarize: I am inclined to close this PR and its AEAD counterpart.
FWIW I'm okay with it. It seems like something we should make easy given Ascon primarily targets embedded devices
Maybe we then should make manual unrolling gated by a feature or configuration flag and keep the unrolled version as the default?
Sounds good to me (i.e. switching to an unroll feature).
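To make the proposal concrete, here is a minimal sketch of what gating manual unrolling behind a crate feature could look like. It is not the ascon crate's code: the round function, constants, and round count are made up for illustration, and the `unroll` feature name is just the one floated in this thread. Both variants are always compiled here so they can be compared; a real crate would more likely use `#[cfg(...)]` so that only one body ends up in the binary.

```rust
/// Number of rounds in this toy permutation (hypothetical, not Ascon's schedule).
const ROUNDS: usize = 4;

/// Per-round constants (made up for illustration).
const RC: [u64; ROUNDS] = [0xf0, 0xe1, 0xd2, 0xc3];

/// One toy round: add a constant, then mix with rotations. Not the Ascon round.
#[inline(always)]
fn round(x: u64, c: u64) -> u64 {
    let x = x ^ c;
    x ^ x.rotate_right(19) ^ x.rotate_right(28)
}

/// Compact variant: a single copy of the round body inside a loop.
pub fn permute_looped(mut x: u64) -> u64 {
    for c in RC {
        x = round(x, c);
    }
    x
}

/// Manually unrolled variant: the round body is repeated once per round.
pub fn permute_unrolled(x: u64) -> u64 {
    let x = round(x, RC[0]);
    let x = round(x, RC[1]);
    let x = round(x, RC[2]);
    round(x, RC[3])
}

/// Picks a variant at compile time based on the hypothetical `unroll` feature.
pub fn permute(x: u64) -> u64 {
    if cfg!(feature = "unroll") {
        permute_unrolled(x)
    } else {
        permute_looped(x)
    }
}
```

Both variants compute the same result, so the feature only changes the code-size and speed trade-off, never the output. Whether the looped body is actually smaller depends on the optimizer, which is why measurements on a real target matter.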
@sebastinas WDYT?
I'd rather not. I have no data to check whether the no_unroll version produces compact code or not. I should add a big note that the feature is currently on a best-effort basis, pending feedback from people who actually use the code on embedded platforms.
@sebastinas I'm testing ascon right now for some embedded (ARM thumbv7m) use cases; I can report back on size differences.
Hi, I've tested the differences again and I'm quite sure we can close this. Results:
Without ascon = { version = "*", features = ["no_unroll"] }
0.0% 0.7% 440B ascon ascon::round
0.0% 0.6% 332B rpc_testing rpc_testing::bsp::ascon_mac <-- (callsite)
0.0% 0.3% 186B ascon ascon::State::permute_12
Total use: 626 bytes for ascon, 332 bytes for callsite, 958 bytes total.
With ascon = { version = "*", features = ["no_unroll"] }
0.0% 0.9% 554B ascon ascon::State::permute_12
0.0% 0.6% 366B rpc_testing rpc_testing::bsp::ascon_mac <-- (callsite)
Total use: 554 bytes for ascon, 366 bytes for callsite, 920 bytes total.
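For scale, a throwaway sketch that sums the cargo bloat rows above and computes the difference between the two builds. The struct and constant names are invented for this example; the numbers are copied from the rows as reported.

```rust
/// Sizes in bytes, copied from the cargo bloat rows above.
pub struct Build {
    /// Everything attributed to the `ascon` crate.
    pub crate_bytes: u32,
    /// The `rpc_testing::bsp::ascon_mac` callsite.
    pub callsite_bytes: u32,
}

impl Build {
    /// Crate code plus callsite.
    pub fn total(&self) -> u32 {
        self.crate_bytes + self.callsite_bytes
    }
}

/// Build without the `no_unroll` feature: `ascon::round` + `permute_12`.
pub const WITHOUT_NO_UNROLL: Build = Build { crate_bytes: 440 + 186, callsite_bytes: 332 };

/// Build with the `no_unroll` feature: `permute_12` only.
pub const WITH_NO_UNROLL: Build = Build { crate_bytes: 554, callsite_bytes: 366 };

/// Bytes saved by enabling `no_unroll` in this particular measurement.
pub fn saved_bytes() -> u32 {
    WITHOUT_NO_UNROLL.total() - WITH_NO_UNROLL.total()
}
```

The difference comes out to 38 bytes, roughly 4% of the 958-byte total, and part of the saving inside the crate is offset by a larger callsite.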
This is the callsite for reference:
#[inline(never)]
pub fn ascon_mac(id: &[u8; 12]) -> [u8; 6] {
    use ascon_hash::{AsconXof, ExtendableOutput, Update, XofReader};

    // Absorb the 12-byte id into the XOF state.
    let mut xof = AsconXof::default();
    xof.update(id);

    // Squeeze a 6-byte tag out of the XOF.
    let mut reader = xof.finalize_xof();
    let mut dst = [0u8; 6];
    reader.read(&mut dst);
    dst
}
Have you compiled it with the s or z optimization level? The difference looks relatively small.
@newpavlov This is built with:
s optimization
thumbv7em-none-eabi target
rustc 1.76.0
As is very common in the embedded space, we do indeed use s optimization.
I think I found what got me hunting this to start with. I was benchmarking ChaCha8Poly1305 vs ascon_aead::Ascon, and for some reason, when switching between using no_unroll or not, the firmware grew by ~2kB.
This led me to believe ascon was the culprit, but checking the cargo bloat output, I'm seeing the same size for the permutation as we see here.
I need to investigate more, but I think we can safely say that ascon is not the issue. More likely I'm hitting something that causes the optimizer to do something unexpected.
This helps keep the binary size small when used on resource-constrained systems, e.g. MCUs.