Code Size: baremetal libmbedcrypto: 2835: PK dispatch

tom-cosgrove-arm commented 1 year ago

Bring in the changes from #2835

Expected saving: Text: 4282, Data: -28

tom-daubney-arm commented 1 year ago

Will be holding off from working on this issue for a few weeks since this area of the code is currently undergoing extensive modifications in another work stream.

mpg commented 1 year ago

Note: we've been making quite a lot of changes to PK recently as part of driver-only work. In order to avoid conflicts, I recommend waiting until https://github.com/Mbed-TLS/mbedtls/pull/7814 has been merged. After that, things should be quiet on our side.

mpg commented 1 year ago

On another front, I've been thinking for some time (sorry for not posting earlier, need time to organize my thoughts) that we need some investigation and design to happen before we tackle this.

I don't think referring to the baremetal PR is a good enough definition of done. For example, the baremetal version had the new option MBEDTLS_PK_SINGLE_TYPE depend on MBEDTLS_USE_TINYCRYPT for a number of reasons - but we don't have such an option upstream. Instead, we have other options like USE_PSA (absent from baremetal) and recently the ability to have support for ECC keys in PK without ECP_C. What use case(s) are we trying to support now? What's the minimal deliverable that would support that?
The original baremetal PR is quite large by our current standards (18 commits, +1600 - 1100 lines). Is there any sensible way to break this up into small tasks? (The original PR's description says it's subdivided into 3 series of commits, so that's an obvious division if we go this way, but see next point.)
Are we sure the architecture used in baremetal, which relies quite heavily on the pre-processor, is the right one for something we're going to have to maintain in the long run? More on that below.

Currently, PK relies on function pointers held in a pk_info structure that contain wrappers for the various operations. Historically, all three abstraction layers (MD and Cipher too) were made this way. However, in the meantime, we moved MD to use plain old switch/case statements instead, and recently @gilles-peskine-arm has suggested we do the same for PK. This looks like a sensible suggestion to me, and it looks like it could bring significant code size improvements with much fewer complexity than the approach taken in the baremetal branch. (If each case is properly guarded, when only one PK type is supported, the compiler can know which case will be taken and eliminate the switch and possibly some function calls as well if we make enough things static in the same file, as we had to do in baremetal as well.)

We could take this one step further than we did in MD, and in PK eliminate dynamic allocation of the underlying context as well (like we also did in the baremetal branch). This could actually be done in all cases (not just when a single PK type is hardcoded) by using a union, again with enough guards that the compiler can see when only one case in the union is possible. (Note that as part of driver-only ECC work, in some cases we already avoid dynamic allocation of key storage.)

I think this all deserves some thinking before the present task is ready to be executed.

gilles-peskine-arm commented 1 year ago

we moved MD to use plain old switch/case statements instead, and recently @gilles-peskine-arm has suggested we do the same for PK. This looks like a sensible suggestion to me, and it looks like it could bring significant code size improvements with much fewer complexity than the approach taken in the baremetal branch.

@daverodgman has tried this for cipher and it was worse than using structures. But maybe it's only worse if you do it for a single function and it's different if you do it throughout so you can almost get rid of the structures.

daverodgman commented 1 year ago

That comment was based on a config that included a lot of stuff. It might not hold if you were building a minimal config, so I wouldn't completely rule this out.

gilles-peskine-arm commented 1 year ago

https://github.com/ARMmbed/mbed-crypto/pull/185 did save some code size in a build with all algorithms: md (+ md_wrap before) went from 2348 bytes down to 1971.

mpg commented 1 year ago

I think the primary purpose of this task can be stated as follows: in a configuration where a single PK type is used (namely MBEDTLS_PK_ECKEY I guess), how can we make the overhead of PK as low as possible?

I think the baremetal PR goes pretty low in this case:

memory overhead is exactly zero, as mbedtls_pk_context has the exact same memory representation as the underlying context (a TinyCrypt/µECC context there);
code size overhead is vastly reduced as the numbers show - pk_sign() for example just calls pk_sign_restartable() which has no further dispatch and directly compiles down to the appropriate wrapper (in this case uecc_eckey_sign_wrap).

However there as IMO a few downsides to the design used there:

it's all or nothing: either you hardcode a single key type and get a log of gains, or you need at least two key types and you get nothing;
I find it quite pre-processor heavy which might be a concern for long-term maintainability.

If we can get similar gains with a different design, that might be worth considering. It looks to me like using switch statements instead of pk_info structures with function pointers while re-using some of the tricks used in the baremetal branch (no dynamic allocation of the underlying context, pk_wrap.c moved to pk.c so that the compiler can inline at will, etc.) could be a way to do so.

However, the trick with code size is it's very hard to predict gains before we try.

gilles-peskine-arm commented 1 year ago

in a configuration where a single PK type is used (namely MBEDTLS_PK_ECKEY I guess)

Is this even plausible, given the subtilities around MBEDTLS_PK_ECKEY vs MBEDTLS_PK_ECDSA?

mpg commented 1 year ago

The baremetal branch went with a new config option MBEDTLS_PK_SINGLE_TYPE (that you have to define to MBEDTLS_PK_INFO_ECKEY to indicate that MBEDTLS_PK_ECKEY is the only type you want).

PK_ECKEY is the "generic" PK ECC key type that can do all the operations that PK_ECDSA can do (and that PK_ECKEY_DH can do as well), so it's the only one you need. (In retrospect, the other ones were mistakes, the right way is what PSA did of course: single key type, then attributes - but here we are.)

But I don't think there's a way to get only that one without introducing some new config option to say "I only want that one" somehow.

mpg commented 1 year ago

Note: I would encourage people interested in this discussion to have a quick look at the baremetal PR https://github.com/Mbed-TLS/mbedtls/pull/2835: reading only the description and commit messages should give you a good idea what was done.

tom-cosgrove-arm commented 1 year ago

Hmm, how dependent is that on MBEDTLS_USE_TINYCRYPT?

mpg commented 1 year ago

AFAICS it's only dependent on MBEDTLS_USE_TINYCRYPT for practical details, but the overall strategy should not actually depend on it.

Mostly we were doing the minimum to support a very specific use case, and since that use case had MBEDTLS_USE_TINYCRYPT we explicitly didn't make any effort to support anything else.

Obviously the situation is different now, in that even if we have a particular user in mind to guide priorities, it's a different use case, and also I think we're trying to improve the situation not just for that use case.

Hence questions like (1) does this need to work only with USE_PSA, with !USE_PSA, both? (2) how important is it to support restartable (was completely omitted from baremetal), (3) what about driver-only builds?

Some on these questions may turn out not to be important - perhaps the same design will work regardless of the answer - but I feel like they illustrate that just pointing to the baremetal PR is IMO not a precise enough definition of the task at hand. (It may be for other cases, but here the situation is different enough that I don't think it is.)

mpg commented 1 year ago

After discussion with @tom-cosgrove-arm I'm assigning this to myself as I'd like to prototype a few things - however this probably won't happen before Q4.

Mbed-TLS / mbedtls

Code Size: baremetal libmbedcrypto: 2835: PK dispatch #7375