esp-rs / esp-mbedtls

mbedtls for ESP32 bare-metal
Apache License 2.0
17 stars 7 forks source link

feat(bignum): Add initial hardware acceleration for modular exponentiation #24

Closed AnthonyGrondin closed 6 months ago

AnthonyGrondin commented 7 months ago

This is based on #20, where I isolated the modular exponentiation part to break down the PR into smaller chunks.

This uses the hardware RSA accelerator to use hardware acceleration for bignum operations.

Support: example esp32 esp32c3 esp32s2 esp32s3
sync_client
sync_client_mTLS HO HO HO
async_client DNC
async_client_mTLS HO HO HO
sync_server
sync_server_mTLS HO
async_server DNC
async_server_mTLS DNC

DNC: Did not compile. region dram_seg overflowed by X bytes HO: Heap Overflow. The request ran out of heap space. — : No support. Currently, I did not add support for the esp32 as it acts differently.

@bjoernQ I'm pinging you since you're most likely going to be the one doing a review on this. Feel free to tear down this code to make it safer / faster. The match num_words {} section could be done using a macro since it's repetition. I still don't know what's the optimal way to implement this. We cannot use generics since the implementation in the HAL uses structs to define the size of the operands.

bjoernQ commented 7 months ago

I think this already looks quite good

It's quite annoying, that we cannot reduce that match num_words {} part because of the structs. Probably the part before let mut mod_exp could be generalized but the code would get harder to read most likely. Macros are probably the better approach then - or live with the repetition, it's 5 or 6 occurrences so probably fine.

For safety Session::new should take the RSA peripheral to make it clear it's used and cannot be used otherwise (in a safe way)

Probably we can get rid of a few copy_bytes by transmuting the raw-pointers - something like this

U256::LIMBS => {
                let mut mod_exp = RsaModularExponentiation::<operand_sizes::Op256>::new(
                    &mut rsa,
                    core::mem::transmute(Y.private_p),         // exponent (Y) Y_MEM
                    core::mem::transmute(M.private_p),          // modulus (M)  M_MEM
                    compute_mprime(M), // mprime
                );
                mod_exp.start_exponentiation(
                    core::mem::transmute(X.private_p), // X_MEM
                    core::mem::transmute(rinv.private_p),    // Z_MEM
                );

                mod_exp.read_results(core::mem::transmute((*Z).private_p));
}

It's scary and unsafe but not really more unsafe than copying the bytes I guess

AnthonyGrondin commented 6 months ago

I rebased this off master. I added an argument for Session::new() to optionally take in the RSA peripheral. This can be chosen at runtime, which is neat.

The match num_words {} part isn't so bad here because we only have a few branches, but it quickly becomes a mess when doing the multiplication part, since every size has to be implemented.

I tried switching over to memory transmuting, and it doesn't seem to work. I couldn't get the self-tests to pass.

AnthonyGrondin commented 6 months ago

I changed the API to mutably borrow an instance of the RSA driver instead of the peripheral. This allows to fix a borrow issue when running in loop. I also added some documentation for Session::new() to clarify the usage.

EDIT: closes #12 #23