EC J-PAKE error-injection tests fail randomly

Mbed-TLS / mbedtls

An open source, portable, easy to use, readable and flexible TLS library, and reference implementation of the PSA Cryptography API. Releases are on a varying cadence, typically around 3 - 6 months between releases.

https://www.trustedfirmware.org/projects/mbed-tls/

Other

5.56k stars 2.61k forks source link

EC J-PAKE error-injection tests fail randomly #6503

Closed mpg closed 2 years ago

mpg commented 2 years ago

The tests that inject errors fail (error not detected) with non-zero probability: after running them 10000 times on my machine I observed 2 such failures - this experiment was done after we observed such a test failing in a nightly CI run for no apparent reason.

Steps to reproduce:

comment out everything in test_suite_psa_crypto.data except the ecjpake_rounds_inject test.
build
run the test suite 10k times in a row or more

My working hypothesis is that error injection is not robust enough and ends up generating valid data with a probably that's not low enough. (Something like 2^(-128) would be perfectly acceptable and should be easy to reach in a crypto setting, but apparently we're not nearly there.)

Note: analyzing this may or may not require knowledge of the EC J-PAKE protocol, including checking a few equations, to understand what's need to make the proofs fail with high enough probability.

mpg commented 2 years ago

Note: if I understand the code correctly:

            if( inject_error == 1 )
            {
                buffer0[s_x1_pk_off + 8] >>= 4;
                buffer0[s_x2_pk_off + 7] <<= 4;
                expected_status = PSA_ERROR_DATA_INVALID;
            }

we corrupt the "public key" part of the message. I don't think there's a particular reason for doing that. I suspect it would be more effective to corrupt the "proof" part of the message (that is, s/pk_off/pr_off/ above and in other places where we inject errors).

Intuitively, since a proof is conceptually similar to a signature, changing anything in it should make it invalid with very high probability. So I suggest trying that first, and if that doesn't work, then we might need to dig deeper.

AndrzejKurek commented 2 years ago

Note: if I understand the code correctly:
            if( inject_error == 1 )
            {
                buffer0[s_x1_pk_off + 8] >>= 4;
                buffer0[s_x2_pk_off + 7] <<= 4;
                expected_status = PSA_ERROR_DATA_INVALID;
            }
we corrupt the "public key" part of the message. I don't think there's a particular reason for doing that. I suspect it would be more effective to corrupt the "proof" part of the message (that is, s/pk_off/pr_off/ above and in other places where we inject errors).

Intuitively, since a proof is conceptually similar to a signature, changing anything in it should make it invalid with very high probability. So I suggest trying that first, and if that doesn't work, then we might need to dig deeper.

I just wanted to confirm what's happening with the data and print post/pre corruption buffers in case of failing test cases (as well as make sure that I can reproduce it locally with enough runs / estimate how often the error happens), but after that I'll test your suggestion.

gilles-peskine-arm commented 2 years ago

buffer0[s_x1_pk_off + 8] >>= 4;
buffer0[s_x2_pk_off + 7] <<= 4;

Assuming the two bytes are uniformly distributed, this has a 1/65536 chance of not causing any corruption. To be sure to corrupt something, use an operation that guarantees that it changes the value, for example x ^= 1.

AndrzejKurek commented 2 years ago

Tested locally with printing buffers, and the failures happened in cases described before - when two chars of zero value were being shifted. Solution proposed here: https://github.com/Mbed-TLS/mbedtls/pull/6561

mpg commented 2 years ago

Ah, thanks to you two for analysing this, it's quite obvious in retrospect and in most other places we do already use ^= 1 to corrupt data. It's quite satisfying to understand what was going wrong - much more than my suggestion of using pr_off instead an hoping it worked, which in retrospect was misguided (doesn't hurt, but quite orthogonal to the actual point).