Closed gilles-peskine-arm closed 1 year ago
I have a problem reproducing this bug. I'm not familiar with OSS-Fuzz. The `fuzz_x509crt` app provides only a fuzz target.
I started with the `development` branch and the default config:
przemek@przemek-VirtualBox:~/mbedtls$ ASAN_CFLAGS='-O2 -Werror -fsanitize=address,undefined -fno-sanitize-recover=all'
przemek@przemek-VirtualBox:~/mbedtls$ echo $ASAN_CFLAGS
-O2 -Werror -fsanitize=address,undefined -fno-sanitize-recover=all
przemek@przemek-VirtualBox:~/mbedtls$ make CFLAGS="$ASAN_CFLAGS" LDFLAGS="$ASAN_CFLAGS"
przemek@przemek-VirtualBox:~/mbedtls$ programs/fuzz/fuzz_x509crt clusterfuzz-testcase-minimized-fuzz_x509crt-6666050834661376
przemek@przemek-VirtualBox:~/mbedtls$
I read the README and was able to run the oss-fuzz tests for mbedtls, but to reproduce this issue I suspect that this part is relevant:
To run the fuzz targets without oss-fuzz, you first need to install one libFuzzingEngine (libFuzzer for instance). Then you need to compile the code with the compiler flags of the wished sanitizer.
So I installed libFuzzer:
przemek@przemek-VirtualBox:~/oss-fuzz$ pip install libFuzzer
Collecting libFuzzer
Downloading libfuzzer-0.0.2-py3-none-manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (229 kB)
|████████████████████████████████| 229 kB 2.4 MB/s
Installing collected packages: libFuzzer
Successfully installed libFuzzer-0.0.2
And tried to rerun:
przemek@przemek-VirtualBox:~/mbedtls$ programs/fuzz/fuzz_x509crt clusterfuzz-testcase-minimized-fuzz_x509crt-666605083466137
but still nothing happened.
BTW, it seems that there is a wrong location in the README:
Finally, you can run the targets like ./test/fuzz/fuzz_client
It seems this should refer to `programs`, not `test`?
You don't need oss-fuzz to use the fuzz targets that are in the mbedtls source tree. Just build the library. Since the bug is a memory leak, you need to build with Asan enabled: otherwise nothing tries to detect the memory leak. The options you passed to `make` should be enough to reproduce the error. Did you try running `make clean` and then `make CFLAGS=... LDFLAGS=...` again?
Yes, I rebuilt several times, running `make clean` before each build (with the default and full configs), on both `development` and https://github.com/Mbed-TLS/mbedtls/commit/97edeb4fb8a9890a8511363f55d5f27f29c4577a, and checked that the flags passed in `CFLAGS` are set by echoing `$ASAN_CFLAGS`.
`programs/fuzz/fuzz_x509crt` does not provide a `main` function (there is only a fuzz target), so I'm not sure how this can work.
Edit: `main` is defined in `onefile.c`.
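For anyone else puzzled by the missing `main`: a libFuzzer-style target only exports `LLVMFuzzerTestOneInput`, and a small driver can feed it the contents of one input file. The sketch below shows the general idea only. It is not copied from `onefile.c`; `demo_run_stream`, `demo_run_file` and `demo_last_size` are hypothetical names, and the fuzz target here is a stand-in that just records the input size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Size of the last input seen by the stand-in target (hypothetical helper). */
static size_t demo_last_size = 0;

/* Stand-in fuzz target. A real target would parse `data` as a certificate. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    (void) data;
    demo_last_size = size;
    return 0;
}

/* Read an open stream to the end and pass its bytes to the target once.
 * Returns the target's return value, or -1 on I/O or allocation failure. */
static int demo_run_stream(FILE *f)
{
    assert(f != NULL);
    if (fseek(f, 0, SEEK_END) != 0) {
        return -1;
    }
    long len = ftell(f);
    if (len < 0) {
        return -1;
    }
    rewind(f);

    /* Allocate at least one byte so an empty file still yields a valid pointer. */
    uint8_t *buf = malloc(len > 0 ? (size_t) len : 1);
    if (buf == NULL) {
        return -1;
    }

    int ret = -1;
    size_t got = fread(buf, 1, (size_t) len, f);
    if (got == (size_t) len) {
        ret = LLVMFuzzerTestOneInput(buf, got);
    }
    free(buf);
    return ret;
}

/* Open `path` and run the target on it; -1 if the file cannot be opened. */
static int demo_run_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    int ret = demo_run_stream(f);
    fclose(f);
    return ret;
}
```

A driver's `main` would call something like `demo_run_file(argv[i])` for each argument. With such a driver, a missing or misnamed input file means the target never sees the crashing input.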
This is unrelated to reproducing the bug, but I think I have an idea of what could be causing this.
Looking at the stack trace, it looks like:
1. `x509_get_authority_key_id()` gets an allocated sequence for the subject alt name, which it puts into `crt.authority_key_id.authorityCertIssuer`.
2. The parsing error reaches `x509_crt_der_parse_core()`, which sees it and calls `mbedtls_x509_crt_free()`.
3. `mbedtls_x509_crt_free()` does not free the `crt.authority_key_id.authorityCertIssuer` member.

It is caused in part because `mbedtls_x509_get_subject_alt_name_ext()` doesn't free the allocated sequence on error. Designing the function this way has led to at least one previous memory leak, and I have coincidentally put up a fix as #7581.
However, this case is really supposed to be dealt with by the call to `mbedtls_x509_crt_free()` whenever an error happens, so freeing the `authority_key_id` member properly there would be the true fix.
If the above is correct, it should leak memory when parsing any `authorityCertIssuer` with more than one `GeneralName`, unless there's something I'm missing.
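To make the suspected leak concrete, here is a minimal self-contained sketch of the pattern. It does not use the Mbed TLS API: `demo_sequence`, `demo_crt` and the `demo_*` functions are hypothetical stand-ins for `mbedtls_asn1_sequence`, `mbedtls_x509_crt` and the parse/free routines, and the `with_fix` flag models whether the crt free routine releases the chain hanging off the embedded head.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sequence of names: the head is embedded in
 * the owner, and every further element is heap-allocated via `next`. */
typedef struct demo_sequence {
    int value;
    struct demo_sequence *next;
} demo_sequence;

typedef struct {
    demo_sequence authority_cert_issuer; /* embedded head, never freed itself */
} demo_crt;

static int demo_live_nodes = 0; /* heap nodes currently allocated */

/* Free every heap node in a chain. */
static void demo_sequence_free(demo_sequence *seq)
{
    while (seq != NULL) {
        demo_sequence *next = seq->next;
        free(seq);
        demo_live_nodes--;
        seq = next;
    }
}

/* Parse `count` names, failing with -1 at index `bad_index` (-1: never).
 * Nodes allocated before the failure stay attached to `head`. */
static int demo_parse_names(demo_sequence *head, int count, int bad_index)
{
    assert(head != NULL);
    demo_sequence *cur = head;
    for (int i = 0; i < count; i++) {
        if (i > 0) {
            cur->next = calloc(1, sizeof(*cur->next));
            if (cur->next == NULL) {
                return -2;
            }
            demo_live_nodes++;
            cur = cur->next;
        }
        if (i == bad_index) {
            return -1;
        }
        cur->value = i;
    }
    return 0;
}

/* Owner cleanup; only releases the chain when `with_fix` is nonzero. */
static void demo_crt_free(demo_crt *crt, int with_fix)
{
    if (with_fix) {
        demo_sequence_free(crt->authority_cert_issuer.next);
    }
    memset(crt, 0, sizeof(*crt));
}

/* Parse, then free the owner, and report how many nodes were left behind. */
static int demo_leaked_nodes(int count, int bad_index, int with_fix)
{
    demo_crt crt;
    memset(&crt, 0, sizeof(crt));
    demo_live_nodes = 0;
    (void) demo_parse_names(&crt.authority_cert_issuer, count, bad_index);
    demo_sequence *chain = crt.authority_cert_issuer.next;
    demo_crt_free(&crt, with_fix);
    int leaked = demo_live_nodes;
    if (!with_fix) {
        demo_sequence_free(chain); /* tidy up so the demo itself does not leak */
    }
    return leaked;
}
```

In this model a single name never touches the heap, which would explain why test data with only one name cannot expose the problem, while any input with several names leaves nodes behind unless the owner's free routine walks the chain.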
Thanks @davidhorstmann-arm. Can you paste the commands that you used to build the library and reproduce the memory leak failure?
Actually I haven't reproduced it locally, I was just looking at the code and the stacktrace in the issue description. I can have a go at reproducing it this afternoon if that would help.
> It is caused in part because `mbedtls_x509_get_subject_alt_name_ext()` doesn't free the allocated sequence on error. Designing the function this way has led to at least one previous memory leak, and I have coincidentally put up a fix as #7581.
> However, this case is really supposed to be dealt with by the call to `mbedtls_x509_crt_free()` whenever an error happens, so freeing the `authority_key_id` member properly there would be the true fix.

#7581 is incomplete, because the sequence has to be freed as part of `mbedtls_x509_crt_free()`, even on success.
Depending on how `mbedtls_x509_get_subject_alt_name_ext()` is called, cleaning the sequence up in that function on error may be strictly unnecessary, if all paths involve a crt object and a call to `mbedtls_x509_crt_free()` on error paths.
[I identified the issue just from code review and the asan report - as per @davidhorstmann-arm]
Yes, I finally reproduced it locally. I was missing the input file.
@davidhorstmann-arm is right: `mbedtls_asn1_sequence_free(authority_key_id->authorityCertIssuer.next);` is missing on `mbedtls_x509_get_subject_alt_name_ext()` failure.
> #7581 is incomplete, because the sequence has to be freed as part of `mbedtls_x509_crt_free()`, even on success.
Ah, you're right, thanks!
> Depending on how `mbedtls_x509_get_subject_alt_name_ext()` is called, cleaning the sequence up in that function on error may be strictly unnecessary, if all paths involve a crt object and a call to `mbedtls_x509_crt_free()` on error paths.
That's correct, and in the past we did not have these functions clean up their objects upon error until we changed one in #6391 after it was misused.
@mprse I wasn't sure if you were working on a fix already, but I've drafted one (untested) in #7593. It seems to be a one-line fix. Feel free to supersede it if you have one already, but I can continue with it if that would help.
I started working on it. I reproduced the memory leak in a test and confirmed that adding `mbedtls_asn1_sequence_free(authority_key_id->authorityCertIssuer.next);` on `mbedtls_x509_get_subject_alt_name_ext()` failure fixes the problem. I still need to analyse the code, as this might not be a complete fix as @athoelke suggested, but I wonder why we don't have a memory leak on successful parsing if `mbedtls_x509_crt_free` does not free the `authorityCertIssuer` sequence.
I also need to generate a certificate with more than one `authorityCertIssuer` (not sure if this is even possible) and corrupt the second entry to cover this case.
Looking at RFC 5280, `authorityCertIssuer` is of type `GeneralNames`, which is a sequence of `GeneralName`. In that case I agree that we need https://github.com/Mbed-TLS/mbedtls/pull/7581 plus freeing the `authorityCertIssuer` sequence in `mbedtls_x509_crt_free()`.
We currently don't have a memory leak in positive tests, because we only have test cases with a single `authorityCertIssuer` name.
I'm not sure if there can be more than one name, or how to generate such a certificate using OpenSSL.
Now the following extension rule is used:
authorityKeyId_subjectKeyId.crt.der:
$(OPENSSL) req -x509 -nodes -days 7300 -key server2.key -outform DER -out $@ -config authorityKeyId_subjectKeyId.conf -extensions 'v3_req'
[v3_req]
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer:always
Can someone more familiar with certificates give a little help here?
I analysed the certificate used in the fuzz test that caused the memory leak. Its `AuthorityKeyIdentifier` part is built as follows:
Bytes | Tags | Len | Comment |
---|---|---|---|
304A | MBEDTLS_ASN1_CONSTRUCTED \| MBEDTLS_ASN1_SEQUENCE | 4A(74) | AuthorityKeyIdentifier sequence |
8014 | MBEDTLS_ASN1_CONTEXT_SPECIFIC | 14(20) | keyIdentifier |
2020202020202020202020202020202020202020 | - | - | keyIdentifier data |
A12A | MBEDTLS_ASN1_CONTEXT_SPECIFIC \| MBEDTLS_ASN1_CONSTRUCTED \| 1 | 2A(42) | authorityCertIssuer (GeneralNames) |
A425 | MBEDTLS_ASN1_CONSTRUCTED \| MBEDTLS_ASN1_CONTEXT_SPECIFIC \| MBEDTLS_X509_SAN_DIRECTORY_NAME | 25(37) | First name, of directory name type |
30233110300E0603202020130720202020202020310F300D06032020201306202020202020 | - | - | First name data |
8200 | MBEDTLS_ASN1_CONTEXT_SPECIFIC \| MBEDTLS_X509_SAN_DNS_NAME | 0 | Second name, of DNS name type (memory is allocated for the structure for the new name) |
2020 | MBEDTLS_ASN1_CONSTRUCTED | 20(32) | Third name, of unknown type, with a length exceeding the space for authorityCertIssuer |
20202020202020... | - | - | Remaining data |
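As a cross-check of the Tags column, the tag bytes can be rebuilt from the flag constants. This is a small sketch: the `#define` values mirror the Mbed TLS headers as far as I know them (worth double-checking against `asn1.h` and `x509.h`), and the helper functions `tag_is_context_specific`, `tag_is_constructed`, `tag_number` and `check_table_tags` are hypothetical names added for illustration.

```c
#include <assert.h>

/* Values mirrored from the Mbed TLS headers (assumed, please verify). */
#define MBEDTLS_ASN1_SEQUENCE            0x10
#define MBEDTLS_ASN1_CONSTRUCTED         0x20
#define MBEDTLS_ASN1_CONTEXT_SPECIFIC    0x80
#define MBEDTLS_X509_SAN_DNS_NAME        2
#define MBEDTLS_X509_SAN_DIRECTORY_NAME  4

/* The top two bits of a tag byte hold the class. */
static int tag_is_context_specific(unsigned char tag)
{
    return (tag & 0xC0) == MBEDTLS_ASN1_CONTEXT_SPECIFIC;
}

/* Bit 0x20 marks a constructed encoding. */
static int tag_is_constructed(unsigned char tag)
{
    return (tag & MBEDTLS_ASN1_CONSTRUCTED) != 0;
}

/* The low five bits hold the tag number (for tag numbers below 31). */
static unsigned tag_number(unsigned char tag)
{
    return tag & 0x1Fu;
}

/* Rebuild each tag byte listed in the table from its flags. */
static void check_table_tags(void)
{
    assert((MBEDTLS_ASN1_CONSTRUCTED | MBEDTLS_ASN1_SEQUENCE) == 0x30);
    assert((MBEDTLS_ASN1_CONTEXT_SPECIFIC | MBEDTLS_ASN1_CONSTRUCTED | 1) == 0xA1);
    assert((MBEDTLS_ASN1_CONSTRUCTED | MBEDTLS_ASN1_CONTEXT_SPECIFIC |
            MBEDTLS_X509_SAN_DIRECTORY_NAME) == 0xA4);
    assert((MBEDTLS_ASN1_CONTEXT_SPECIFIC | MBEDTLS_X509_SAN_DNS_NAME) == 0x82);
}
```

Read the other way round, the byte `0x20` that starts the third name is constructed but not context-specific, so it does not match any `GeneralName` choice.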
Can I use this corrupted certificate used in fuzz tests as our test input?
We can use the data from the fuzzer as a non-regression test. But it's better to also have a test case that is specifically constructed to trigger the specific error path. Data from a fuzzer often matches several error conditions, so it can be fragile: maybe later it won't trigger that particular error path anymore. Also data from a fuzzer can be harder to debug if the test case fails after some refactoring attempt.
The `authorityCertIssuer` field in the certificate generated using OpenSSL holds a copy of the certificate `Issuer` (of `Name` type), represented as a sequence of `directoryName`s (of `Name` type).
In the malformed certificate from the fuzz test, `authorityCertIssuer` consists of 3 names:
- `directoryName`
- `dNSName`
- a third name of unknown type (see the table above)
Such a malformation is very hard to produce. It is not just a single-byte corruption: the certificate has to be rewritten with additional names in `authorityCertIssuer` while keeping the sizes of the individual elements in the tree consistent.
I wonder whether this patch is even needed for the `authorityCertIssuer` case if only a single `directoryName` entry is possible, but since `mbedtls_x509_get_subject_alt_name_ext` is a general-purpose function for parsing `GeneralNames`, it is needed. I will use the fuzz cert for a non-regression test.
It looks like freeing the `authorityCertIssuer` sequence in `mbedtls_x509_crt_free` is sufficient. It is always called on failure while parsing `AuthorityKeyIdentifier`. `mbedtls_x509_get_subject_alt_name_ext` is also used to parse subject alt names for both crt and csr, and in both cases the free routine is called on failure and releases the sequence of subject alt names.
The mbedtls fuzzer running on OSS-Fuzz has found a memory leak in X.509 code (private link). It was introduced between aaa26f25be091f482eff8698c9619293acfc27e3 and 97edeb4fb8a9890a8511363f55d5f27f29c4577a, and the only pull request that touches X.509 in that range is #6866.
To reproduce: build with Asan. The default configuration works. Run:
clusterfuzz-testcase-minimized-fuzz_x509crt-6666050834661376.gz
I can reproduce the failure on my machine. I haven't tried to analyze it.
Goals of this issue:
`return` should have been `goto cleanup` in one function, check other `return` statements introduced around the same time) and fix and test them.