Mbed-TLS / mbedtls

An open source, portable, easy to use, readable and flexible TLS library, and reference implementation of the PSA Cryptography API. Releases are on a varying cadence, typically around 3 - 6 months between releases.
https://www.trustedfirmware.org/projects/mbed-tls/
Other
5.07k stars 2.52k forks source link

Identify improvements in baremetal branch that should be upstreamed #3535

Open mpg opened 3 years ago

mpg commented 3 years ago

The baremetal branch of Mbed TLS includes changes, features and options that allow to reduce the size of the code and/or the RAM usage of the library. Most of those would be of general interest to people who use Mbed TLS in highly constrained environments and would be suitable for upstreaming; while some of them are either too specific, or would conflict with or be made redundant by future development (such as the move the PSA Crypto).

This task is to:

This task does not include doing any of the actual side-porting/upstreaming - this work will need to be planned later from the above epics.

mpg commented 3 years ago

I used this to list all PRs that have been merged in the baremetal branch:

git log --reverse --merges --format='%s' mbedtls-2.16..baremetal |
    sed '/mbedtls-2.16. into/d
        /baremetal. into/d
        s/.* pull request #\([0-9]*\) from .*/\1/g
        s/.* remote-tracking branch [^ ]*\/\([0-9]*\). into .*/\1/
        s#^[0-9][0-9][0-9]$#https://github.com/ARMmbed/mbedtls-restricted/pull/&#
        s#^[0-9]*$#https://github.com/ARMmbed/mbedtls/pull/&#
        '

The output consists of 151 PRs. I did a first pass over that to annotate them with:

I'm attaching the resulting prs-baremetal.txt file for reference, but I don't suggest anyone reviews it, as this would be tedious and provide low value.

I'll report the results by category in follow-up comments, making use of the following script:

for want in NO DONE MAYBE YES; do
    printf "\n%s\n" $want
    sed -n "s/.*$want - //p" prs-baremetal.txt |
        sort | uniq -c | sort -rn |
        awk '{t += $1; print} END { print "total:", t }'
done
mpg commented 3 years ago

Rejected PRs

There are 67 PRs in the baremetal branch I'm suggesting to reject as out-of-scope or inapplicable, which fall in the following categories:

mpg commented 3 years ago

PRs that were already upstreamed

I've identified 28 PRs that don't need upstreaming, either because it was already done, or because the PR is actually a side-port from something that was first done upstream. For information, they fall in the following categories:

mpg commented 3 years ago

PRs that we may or may not want to upstream

There are 20 PRs here, that I'll group in 3 series.

1. Measurements: There are 16 PRs that relate directly or indirectly to measuring the size of Mbed TLS in the configuration of interest to the baremetal branch.

Reason(s) against upstreaming this: the config was designed with a specific use case in mind.

Reasons for upsteaming this include:

Foreseeable difficulties in upstreaming (or indeed not upstreaming) this: it's probably going to be hard to upsteam all PRs in this series at once, as later PRs in this series will depend on other code size PRs (introducing new config.h) options. Conversely, not upstreaming this series is likely to create issue when upstreaming some code size PRs that update configs/baremetal.h or rely on improvements in this series for testing.

Recommendation: I'm quite in favour of upstreaming this, but I'd like to know what others think.

List of PRs in this series: https://github.com/ARMmbed/mbedtls/pull/2524 https://github.com/ARMmbed/mbedtls-restricted/pull/587 https://github.com/ARMmbed/mbedtls-restricted/pull/588 https://github.com/ARMmbed/mbedtls-restricted/pull/595 https://github.com/ARMmbed/mbedtls-restricted/pull/607 https://github.com/ARMmbed/mbedtls-restricted/pull/626 https://github.com/ARMmbed/mbedtls-restricted/pull/632 https://github.com/ARMmbed/mbedtls-restricted/pull/643 https://github.com/ARMmbed/mbedtls-restricted/pull/646 https://github.com/ARMmbed/mbedtls-restricted/pull/655 https://github.com/ARMmbed/mbedtls/pull/2849 https://github.com/ARMmbed/mbedtls/pull/2867 https://github.com/ARMmbed/mbedtls/pull/2850 https://github.com/ARMmbed/mbedtls/pull/2892 https://github.com/ARMmbed/mbedtls/pull/2910 https://github.com/ARMmbed/mbedtls/pull/2978

2. Code size: reduce cost of the MD and PK layers when a single alg is available

Overview: make the MD layer zero-cost (in terms of code-size and runtime indirection) when a config.h option signals that a single hash is used. Concretely, if only SHA-256 is used, a call to mbedtls_md_xxx() from say X509 or TLS will be replaced by the equivalent call to mbedtls_sha256_xxx() at compile-time. Start in the same direction in the PK layer, but not going as far (call to mbedtls_pk_xxx() replaced by the xxx_wrap() function). See PR descriptions for more details.

Reasons for upstreaming: who doesn't want to save hundreds of bytes? (probably more than 1K if we complete the work in the PK layer (see PR description) and do something similar in the Cipher layer as well)

Reasons against upstreaming:

Recommendation: I'm inclined not to upstream this as-is, but perhaps study it to see if some of the ideas could be applied in the implementation of the PSA Crypto API without compromising other goals. (Note: the PK PR is probably easier to review, but the MD PR goes further, as explained in the PK PR's description.)

3. Code size: Make all of X.509 as single compilation unit

Overview: turn all of the X509 library into a single translation unit, by #includeing other .c files from it; make a lot more functions static.

Reasons for upstreaming: who doesn't want to save hundreds or bytes, especially by just letting the compiler to more work for us?

Reasons against upstreaming: the way it's done is not very clean and make cause issues for people importing Mbed TLS in another project (as demonstrated by the Mbed OS issue).

Recommendation: I don't think we should upstream as-is, but we should definitely looks for ways to achieve the same result in a cleaner way. The savings are quite significant (and making more functions static reduce our ABI surface as a bonus).

mpg commented 3 years ago

PRs suitable for upstreaming - outliers

Most remaining PRs that are suitable for upstreaming are in the "code size" category (some of them also save a bit of RAM), except two that I'll mention now to get them out of the way. I'll complete my review of code size PRs on Monday.

RAM: X.509 parsing on-demand: This was started by Hanno as a PR in the development branch, then side-ported to baremetal where it was finalized and merged, and the original development PR was never merged. It now needs rebasing for conflict resolution (which Hanno is prepared to do) and review (which needs resourcing as the PR is large en complex, but Manuel already reviewed it on the baremetal side, so could act as a first reviewer).

Misc: remove an unnecessary cast in X.509

3509 - one-line change, why not clean it up in development as well?

yanesca commented 3 years ago

For categories 2 and 3 I feel the same way as you do: I don't think it is worth upstreaming them, but we can learn from them, when we are implementing similar improvements.

For category 1 PRs I was against upstreaming at first, because the sheer number of PRs was intimidating. Taking a closer look showed that most of them are really tiny or consist mainly test data (certificates). There are maybe 2 or 3 medium sized PRs that would take longer to review.

I very much would like to see some code size measurements in our tests: from time to time we do code size optimisations, but without measuring and tracking it all that effort just falls in the category of premature optimisation.

I have some concern about taking that somewhat arbitrary configuration as the standard for our measurements, but as you say it is fairly reasonable configuration and I think it is a good starting point.

In summary, I am for upstreaming the category 1 PRs as well.

mpg commented 3 years ago

PRs suitable for upstreaming: code size

This is the bulk of upstreamable PRs. The numbers at the end are the impact on code size with ARM-GCC (as measured with scripts/baremetal.h, see the "measurements" series above.

Note: I've tried to group PRs than are simple fixes or follow-up of another PR with the original. I've also tried to document dependencies but I might have missed a few. Same for possible API/ABI break (I might also have flagged some that are actually not problematic, so the flag just means "needs attention/discussion").

New config.h options to make more code optional

We already have ways to configure many things out of the build, but it turned out that we could use more of those.

Hardcoding of SSL options

The second PR in this series introduces a new concept, used by all subsequent PRs: whenever we have something in SSL that can be configured at run-time (eg: list of allowed algorithms, use of extensions, protocol versions, etc.), if we know at compile-time what that option will be, we can say so in config.h and then remove the code that allowed to set this option and negotiate it with the peer. This is done via a set of new MBEDTLS_SSL_CONF_xxx options introduced by PRs in this series.

Note: all PRs in this series depend the first two: more precisely, they all depend on r592 which depends on r590.

Misc (SSL)

Misc (X.509)

Note: the two PRs in this series both depend on r584, which is X.509 on-demand parsing (part of RAM optimisation work) (upstream: #2478) - mentioned in the previous comment

Note: since those two PRs are about removing unused fields of technically-public structures in some configurations, API/ABI compatibility questions need to be checked before upstreaming, or we might choose to include this in 3.0 work.

Misc (other)

mpg commented 3 years ago

Code-size saving PRs sorted by bytes saved

The table below lists PRs that have code-size savings mentioned in their description, sorted from larger to smallest savings. For each change, only the "base" PR is listed (not follow-ups for bugfixes and tests, nor preparatory work). Some PRs making multiple changes, the total code size savings of all changes in that PR is used.

Note: excluded from this table is the change with the largest impact (use TinyCrypt to replace our ECC implementation, including bignum), because I don't think it's suitable for upstreaming. Its exact impact is harder to measure (a large part of bignum would be removed by the linker anyway) but was likely of the order of 4 to 8 KiB.

Bytes saved main PR group short description
2384 #2891 config.h crypto: aes: option for encrypt only
1438 r601 config.h ssl: session resumption
1246 r606 ssl-conf ciphersuite
1192 r585 config.h ssl: DTLS-only builds
712 r593 ssl-conf version range
696 r652 MAYBE-x509 single translation unit
652 r642 config.h x509/ssl: hostname verification
528 r594 ssl-conf various options
485 r654 MAYBE-layers MD layer
476 #2835 MAYBE-layers PK layer
440 r605 x509 misc rm struct fields unused depending on config.h
344 r649 misc mixed optimize structures
308 r586 ssl misc remember peer CRT only if renego enabled
294 r657 misc mixed use shared functions for char/uint conversions
268 r602 ssl-conf elliptic curve
268 #2890 config.h crypto: aes: option for 128-bit keys only
220 r609 config.h x509/ssl: verification callbacks
188 r621 config.h ~crypto: SHA-256 without SHA-224~ #6784
184 r596 ssl-conf callbacks (RNG, IO, timer)
178 r604 ssl-conf signature hash
148 r610 ssl misc delay sending alerts
126 r592 ssl-conf groundwork: hardcode use of EMS extension
108 r641 ssl misc rm ssl_calc_xxx function pointers
88 r589 x509 misc rm time code if !TIME_DATE
76 r599 ssl-conf prep: save dynamic alloc of supported curves
68 r635 ssl misc adapt version encoding in DTLS-only builds
24 r612 ssl misc rm compression field if disabled
20 r633 ssl misc rm minlen from transform structure

Total savings from items in the "YES" category: 11502 bytes. Additional savings from items in the "MAYBE" category: 1657 bytes.

gilles-peskine-arm commented 3 years ago

I learned about the impact of structure field ordering on code size from the work on the baremetal branch., yet I couldn't find something corresponding to https://github.com/ARMmbed/mbedtls/issues/4747 in this list. Did I miss it?

hanno-becker commented 3 years ago

@gilles-peskine-arm IIRC it was r649. The logic applied there, however, was not to move commonly used fields to the top, but rather sort them in size from smallest to biggest. This helps because the maximum immediate offset is 127 * the size of the access, so 32-bit accesses have a large max. offset than byte accesses. E.g. if you have a struct with 128 byte fields, 64 halfword fields and 32 word fields, you can access all of them from the base of the structure through an immediate offset if and only if you order them in precisely this way.

It might be worth revisiting whether the logic you apply might be more suitable in places.

mpg commented 1 year ago

Note: since this study was done, one of the changes listed above has been made independently in development:

188 r621 config.h crypto: SHA-256 without SHA-224

This was done in https://github.com/Mbed-TLS/mbedtls/pull/6784 as part of cleaning up before driver-only hashes work.

(I didn't check if other changes have happened, I just noticed this one.)