mpg commented 3 years ago

The baremetal branch of Mbed TLS includes changes, features and options that allow to reduce the size of the code and/or the RAM usage of the library. Most of those would be of general interest to people who use Mbed TLS in highly constrained environments and would be suitable for upstreaming; while some of them are either too specific, or would conflict with or be made redundant by future development (such as the move the PSA Crypto).

This task is to:

go over the history of the baremetal branch to identify all changes that are good candidates for upstreaming
create two epics: "RAM improvements" and "Code size improvements" and start populating them with tasks based on the identified changes (each task should include a reference to the original PR(s) and any other relevant information)

This task does not include doing any of the actual side-porting/upstreaming - this work will need to be planned later from the above epics.

mpg commented 3 years ago

I used this to list all PRs that have been merged in the baremetal branch:

git log --reverse --merges --format='%s' mbedtls-2.16..baremetal |
    sed '/mbedtls-2.16. into/d
        /baremetal. into/d
        s/.* pull request #\([0-9]*\) from .*/\1/g
        s/.* remote-tracking branch [^ ]*\/\([0-9]*\). into .*/\1/
        s#^[0-9][0-9][0-9]$#https://github.com/ARMmbed/mbedtls-restricted/pull/&#
        s#^[0-9]*$#https://github.com/ARMmbed/mbedtls/pull/&#
        '

The output consists of 151 PRs. I did a first pass over that to annotate them with:

desirability for upstream: yes / no / maybe / done
type of change: RAM / code size / tests / measurements / bugfix / other
general area: crypto / x509 / ssl - to be refined later for PRs of interest

I'm attaching the resulting prs-baremetal.txt file for reference, but I don't suggest anyone reviews it, as this would be tedious and provide low value.

I'll report the results by category in follow-up comments, making use of the following script:

for want in NO DONE MAYBE YES; do
    printf "\n%s\n" $want
    sed -n "s/.*$want - //p" prs-baremetal.txt |
        sort | uniq -c | sort -rn |
        awk '{t += $1; print} END { print "total:", t }'
done

mpg commented 3 years ago

Rejected PRs

There are 67 PRs in the baremetal branch I'm suggesting to reject as out-of-scope or inapplicable, which fall in the following categories:

Fault Injection (25 PRs): out of scope, currently this is well outside our threat model. (Also, the protection in the baremetal branch is very partial, as it only covers the small subset of the code base that's of interest in that branch.)
Use tinycrypt (13 PRs): not applicable, if we did something similar upstream we'd probably do it very differently (based on PSA crypto API for example), and it's not clear we'd want to use tinycrypt anyway.
Side-channel attacks (13 PRs): out of scope: currently this is outside our threat model, especially for symmetric crypto. We might revisit this later, but even some counter-measures were IMO a bit hackish (dummy rounds in AES) so might not be suitable for the main branch.
Regular merges from 2.16 (12 PRs): not applicable
Follow-ups (3 PRs): not applicable, fixes for issues introduced by other rejected PRs
project-specific (1 PR): not applicable, adapting to project-specific infrastructure

mpg commented 3 years ago

PRs that were already upstreamed

I've identified 28 PRs that don't need upstreaming, either because it was already done, or because the PR is actually a side-port from something that was first done upstream. For information, they fall in the following categories:

SSL context serialisation (9 PRs), (D)TLS Connection ID extension (5 PRs) and the (D)TLS record checking (3PRs) features were upstreamed immediately as they're required server-side, and the the server uses the main branch, no the baremetal one.
Various bug fixes (5 PRs) were side-ported from development to the baremetal branch.
X.509 RAM improvements started before the baremetal branch was created, so the first 4 PRs (including 1 test PR) here are already upstream (only the last one isn't, see below)
2 PRs with code size improvements first came to the development branch as community contributions and were then side-ported to the baremetal branch.

mpg commented 3 years ago

PRs that we may or may not want to upstream

There are 20 PRs here, that I'll group in 3 series.

1. Measurements: There are 16 PRs that relate directly or indirectly to measuring the size of Mbed TLS in the configuration of interest to the baremetal branch.

A new reference configuration configs/baremetal.h is introduced (and progressively augmented to include all the config options that reduce code size as they are introduced), with a variant that adds convenience things for runtime tests (configs/baremetal_test.h).
A new script scripts/baremetal.sh gives figures for code size (of the library, ie before linker garbage collection) with the based config and RAM usage (of ssl_client2, based on valgrind's massif tool) with the variant for runtime tests.
A new component is added to all.sh to run tests with this config (actually the slightly modified one), and changes are made to ssl-opt.sh so that it can be run entirely with that config and have decent coverage (which includes a new familiy of test certificates using only the P-256 curve).

Reason(s) against upstreaming this: the config was designed with a specific use case in mind.

Reasons for upsteaming this include:

while written with a specific use case in mind, the config is quite representative of most minimal modern configs
if you want to improve something, you need to start by measuring it - this script is a significant improvement over our existing measurements
the test improvements (especially in ssl-opt.sh) are good to have, and strongly interesct with things we've been meaning to do for a long time

Foreseeable difficulties in upstreaming (or indeed not upstreaming) this: it's probably going to be hard to upsteam all PRs in this series at once, as later PRs in this series will depend on other code size PRs (introducing new config.h) options. Conversely, not upstreaming this series is likely to create issue when upstreaming some code size PRs that update configs/baremetal.h or rely on improvements in this series for testing.

Recommendation: I'm quite in favour of upstreaming this, but I'd like to know what others think.

2. Code size: reduce cost of the MD and PK layers when a single alg is available

MD layer: https://github.com/ARMmbed/mbedtls-restricted/pull/654 - 485 bytes saved, 42 files changed, 1701 insertions(+), 899 deletions(-) (the github numbers are wrong)
PK layer: https://github.com/ARMmbed/mbedtls/pull/2835 - 476 bytes saved, 24 files changed, 1628 insertions(+), 1129 deletions(-)

Overview: make the MD layer zero-cost (in terms of code-size and runtime indirection) when a config.h option signals that a single hash is used. Concretely, if only SHA-256 is used, a call to mbedtls_md_xxx() from say X509 or TLS will be replaced by the equivalent call to mbedtls_sha256_xxx() at compile-time. Start in the same direction in the PK layer, but not going as far (call to mbedtls_pk_xxx() replaced by the xxx_wrap() function). See PR descriptions for more details.

Reasons for upstreaming: who doesn't want to save hundreds of bytes? (probably more than 1K if we complete the work in the PK layer (see PR description) and do something similar in the Cipher layer as well)

Reasons against upstreaming:

The MD, PK, and Cipher layers are on there way out, with PSA Crypto API being the future.
Those are large changes, not easy to review
Indirection is increased, and pre-processor cleverness applied, which might reduce readability of the resulting code
Lots of things end up living in public header files (how else could the compiler resolve and inline everything?) which is kind of contrary to our general goal of reducing our API surface.

Recommendation: I'm inclined not to upstream this as-is, but perhaps study it to see if some of the ideas could be applied in the implementation of the PSA Crypto API without compromising other goals. (Note: the PK PR is probably easier to review, but the MD PR goes further, as explained in the PK PR's description.)

3. Code size: Make all of X.509 as single compilation unit

main PR: https://github.com/ARMmbed/mbedtls-restricted/pull/652 - 696 bytes saved, 14 files changed, 292 insertions(+), 236 deletions(-)
follow-up: https://github.com/ARMmbed/mbedtls/pull/2904/ - just fixes tests on Mbed OS

Overview: turn all of the X509 library into a single translation unit, by #includeing other .c files from it; make a lot more functions static.

Reasons for upstreaming: who doesn't want to save hundreds or bytes, especially by just letting the compiler to more work for us?

Reasons against upstreaming: the way it's done is not very clean and make cause issues for people importing Mbed TLS in another project (as demonstrated by the Mbed OS issue).

Recommendation: I don't think we should upstream as-is, but we should definitely looks for ways to achieve the same result in a cleaner way. The savings are quite significant (and making more functions static reduce our ABI surface as a bonus).

mpg commented 3 years ago

PRs suitable for upstreaming - outliers

Most remaining PRs that are suitable for upstreaming are in the "code size" category (some of them also save a bit of RAM), except two that I'll mention now to get them out of the way. I'll complete my review of code size PRs on Monday.

RAM: X.509 parsing on-demand: This was started by Hanno as a PR in the development branch, then side-ported to baremetal where it was finalized and merged, and the original development PR was never merged. It now needs rebasing for conflict resolution (which Hanno is prepared to do) and review (which needs resourcing as the PR is large en complex, but Manuel already reviewed it on the baremetal side, so could act as a first reviewer).

baremetal PR: https://github.com/ARMmbed/mbedtls-restricted/pull/584
development PR: https://github.com/ARMmbed/mbedtls/pull/2478

Misc: remove an unnecessary cast in X.509

3509 - one-line change, why not clean it up in development as well?

yanesca commented 3 years ago

For categories 2 and 3 I feel the same way as you do: I don't think it is worth upstreaming them, but we can learn from them, when we are implementing similar improvements.

For category 1 PRs I was against upstreaming at first, because the sheer number of PRs was intimidating. Taking a closer look showed that most of them are really tiny or consist mainly test data (certificates). There are maybe 2 or 3 medium sized PRs that would take longer to review.

I very much would like to see some code size measurements in our tests: from time to time we do code size optimisations, but without measuring and tracking it all that effort just falls in the category of premature optimisation.

I have some concern about taking that somewhat arbitrary configuration as the standard for our measurements, but as you say it is fairly reasonable configuration and I think it is a good starting point.

In summary, I am for upstreaming the category 1 PRs as well.

mpg commented 3 years ago

PRs suitable for upstreaming: code size

This is the bulk of upstreamable PRs. The numbers at the end are the impact on code size with ARM-GCC (as measured with scripts/baremetal.h, see the "measurements" series above.

Note: I've tried to group PRs than are simple fixes or follow-up of another PR with the original. I've also tried to document dependencies but I might have missed a few. Same for possible API/ABI break (I might also have flagged some that are actually not problematic, so the flag just means "needs attention/discussion").

New config.h options to make more code optional

We already have ways to configure many things out of the build, but it turned out that we could use more of those.

r585 + r598 (API fix) + r656 (test) + #2865 (bugfix) ssl: DTLS-only builds -1192
r601 ssl: session resumption -1292 -156
r642 x509/ssl: hostname verification -504 -148
r621 crypto: SHA-256 without SHA-224 -188
r609 x509/ssl: verification callbacks -172 -48
2890 + #3467 (follow-up) crypto: aes (+cipher): option for 128-bit keys only -268
2891 crypto: aes: option for encrypt only -2384

Hardcoding of SSL options

The second PR in this series introduces a new concept, used by all subsequent PRs: whenever we have something in SSL that can be configured at run-time (eg: list of allowed algorithms, use of extensions, protocol versions, etc.), if we know at compile-time what that option will be, we can say so in config.h and then remove the code that allowed to set this option and negotiate it with the peer. This is done via a set of new MBEDTLS_SSL_CONF_xxx options introduced by PRs in this series.

Note: all PRs in this series depend the first two: more precisely, they all depend on r592 which depends on r590.

r590 [preparatory] option to enforce EMS (session hash) extension +136
r592 groundwork: hardcode use of EMS (session hash) extension -126
r594 various options -108 -72 -76 -16 -136/120 -32 -104
r596 callbacks (RNG, IO, timer): -64 -92 -28
r599 [preparatory] save dynamic alloc of supported curves -76 (prep for: r606 r602)
r606 ciphersuite (dep: r599) -1246
r593 + r627 (fixup): version range -712 (fixed by r627)
r602 + r634 (fixup): elliptic curve (dep: r599) (fixed by: r602) -268
r604 signature hash -178
r628 add tests and fixes (dep: r627 r626) (note: r626 is part of measurements)

Misc (SSL)

r586 remember peer CRT only if renego enabled -308
r610 delay sending alerts -148
r612 rm compression field if disabled (API break? 3.0?) -24
r603 rm debug pointers from structure if debug compiled out (API break? 3.0?)
r633 rm minlen from transform structure (internal) -20
r641 rm ssl_calc_xxx function pointers (internal) -108
r635 adapt version encoding in DTLS-only builds (dep: r656) -68

Misc (X.509)

Note: the two PRs in this series both depend on r584, which is X.509 on-demand parsing (part of RAM optimisation work) (upstream: #2478) - mentioned in the previous comment

Note: since those two PRs are about removing unused fields of technically-public structures in some configurations, API/ABI compatibility questions need to be checked before upstreaming, or we might choose to include this in 3.0 work.

r589 x509 misc: rm time code if !TIME_DATE -88
r605 x509 misc: rm struct fields unused depending on config.h (dep: r589) -440

Misc (other)

r649 misc crypto/x509/ssl: optimize structures !ABI break! -344
r657 misc crypto/ssl: use shared functions for char/uint conversions -176 -118

mpg commented 3 years ago

Code-size saving PRs sorted by bytes saved

The table below lists PRs that have code-size savings mentioned in their description, sorted from larger to smallest savings. For each change, only the "base" PR is listed (not follow-ups for bugfixes and tests, nor preparatory work). Some PRs making multiple changes, the total code size savings of all changes in that PR is used.

Note: excluded from this table is the change with the largest impact (use TinyCrypt to replace our ECC implementation, including bignum), because I don't think it's suitable for upstreaming. Its exact impact is harder to measure (a large part of bignum would be removed by the linker anyway) but was likely of the order of 4 to 8 KiB.

Bytes saved	main PR	group	short description
2384	#2891	config.h	crypto: aes: option for encrypt only
1438	r601	config.h	ssl: session resumption
1246	r606	ssl-conf	ciphersuite
1192	r585	config.h	ssl: DTLS-only builds
712	r593	ssl-conf	version range
696	r652	MAYBE-x509	single translation unit
652	r642	config.h	x509/ssl: hostname verification
528	r594	ssl-conf	various options
485	r654	MAYBE-layers	MD layer
476	#2835	MAYBE-layers	PK layer
440	r605	x509 misc	rm struct fields unused depending on config.h
344	r649	misc mixed	optimize structures
308	r586	ssl misc	remember peer CRT only if renego enabled
294	r657	misc mixed	use shared functions for char/uint conversions
268	r602	ssl-conf	elliptic curve
268	#2890	config.h	crypto: aes: option for 128-bit keys only
220	r609	config.h	x509/ssl: verification callbacks
188	r621	config.h	~crypto: SHA-256 without SHA-224~ #6784
184	r596	ssl-conf	callbacks (RNG, IO, timer)
178	r604	ssl-conf	signature hash
148	r610	ssl misc	delay sending alerts
126	r592	ssl-conf	groundwork: hardcode use of EMS extension
108	r641	ssl misc	rm `ssl_calc_xxx` function pointers
88	r589	x509 misc	rm time code if !TIME_DATE
76	r599	ssl-conf	prep: save dynamic alloc of supported curves
68	r635	ssl misc	adapt version encoding in DTLS-only builds
24	r612	ssl misc	rm compression field if disabled
20	r633	ssl misc	rm minlen from transform structure

Total savings from items in the "YES" category: 11502 bytes. Additional savings from items in the "MAYBE" category: 1657 bytes.

gilles-peskine-arm commented 3 years ago

I learned about the impact of structure field ordering on code size from the work on the baremetal branch., yet I couldn't find something corresponding to https://github.com/ARMmbed/mbedtls/issues/4747 in this list. Did I miss it?

hanno-becker commented 3 years ago

@gilles-peskine-arm IIRC it was r649. The logic applied there, however, was not to move commonly used fields to the top, but rather sort them in size from smallest to biggest. This helps because the maximum immediate offset is 127 * the size of the access, so 32-bit accesses have a large max. offset than byte accesses. E.g. if you have a struct with 128 byte fields, 64 halfword fields and 32 word fields, you can access all of them from the base of the structure through an immediate offset if and only if you order them in precisely this way.

It might be worth revisiting whether the logic you apply might be more suitable in places.

mpg commented 1 year ago

Note: since this study was done, one of the changes listed above has been made independently in development:

188 r621 config.h crypto: SHA-256 without SHA-224

This was done in https://github.com/Mbed-TLS/mbedtls/pull/6784 as part of cleaning up before driver-only hashes work.

(I didn't check if other changes have happened, I just noticed this one.)

Mbed-TLS / mbedtls

Identify improvements in baremetal branch that should be upstreamed #3535

PRs that we may or may not want to upstream

PRs suitable for upstreaming - outliers

3509 - one-line change, why not clean it up in development as well?

PRs suitable for upstreaming: code size

2890 + #3467 (follow-up) crypto: aes (+cipher): option for 128-bit keys only -268

2891 crypto: aes: option for encrypt only -2384