Incorrect result of the mbedtls_mpi_mul_mod()

chemandante commented 1 year ago

Summary

mbedtls_mpi_mul_mod() gives incorrect result for given values.

Using curve MBEDTLS_ECP_DP_SECP256R1, I need to calculate Y = X * X mod P using mbedtls_mpi_mul_mod() (to be honest, this function is called from mbedtls_ecp_mul() during call to mbedtls_ecp_check_pubkey())

My values are: X = 0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5 P = 0xFFFFFFFF00000001000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFF

Y = 0x55DF5D5850F47BAC8214913A979369FE498A9023A412B5E0BEDD2CFC21C3ED91 (calculated using mbedTLS 3.4.0)
Y = 0x55DF5D5850F47BAD82149139979369FE498A9022A412B5E0BEDD2CFC21C3ED91 (correct result using mbedTLS 3.3.0)
                     ^       ^               ^

System information

Mbed TLS version (number or commit id): 3.4.0 (tagged release commit) Operating system and version: Compiler for DSP TI C6000 (TMS320C66x series processors) (https://www.ti.com/tool/C6000-CGT) Configuration (if not default, please attach mbedtls_config.h): mbedtls_config.zip Compiler and options (if you used a pre-built binary, please indicate how you obtained it): TI C6000 compiler Additional environment information:

Expected behavior

Y = 0x55DF5D5850F47BAD82149139979369FE498A9022A412B5E0BEDD2CFC21C3ED91

Actual behavior

Y = 0x55DF5D5850F47BAC8214913A979369FE498A9023A412B5E0BEDD2CFC21C3ED91
                     ^       ^               ^

differs only in three digits

Steps to reproduce

Just call mbedtls_mpi_mul_mod with these values

Additional information

This problem disappears when I comment MBEDTLS_ECP_NIST_OPTIM macro in mbedTLS 3.4.0, but in mbedTLS 3.3.0 it works fine even when MBEDTLS_ECP_NIST_OPTIM remains uncommented.

I think, the root of the problem is in mbedtls_ecp_mod_p256_raw(), but I can't find exact solution.

P.S. X * X calculates correctly. The problem appears after reduction modulo P.

minosgalanakis commented 1 year ago

Hi, and thank you for your contribution.

We are currently into the proccess of migrating the bignum interface. While I am looking into this issue could you please try to test it on the development branch, which has PR7230 merged?

chemandante commented 1 year ago

Hi! Exactly the same (i.e. incorrect) result after using commit https://github.com/Mbed-TLS/mbedtls/commit/fffd6d9ded2f2a57cedd1b97268af706c4a5f730

minosgalanakis commented 1 year ago

So following up on this.

I have pushed a simple test program which is meant to test the output using this pair and I have so far failed to reproduce it.

The idea is the following:

As you have confirmed and as I also noticed the multiplication of X * X is correct. So when the issue occurs that happens in the modural reduction stage.

I am assuming that in your configuration the grp->modp == NULL so the mbedtls_mpi_mod_mpi is called. But for sanity reasons I have tested the reduction method, when grp->modp != NULL

Could you please confirm the following:

Is your test code following the same path as described above? Is your grp->modp pointing at null?
Is the issue happening at your system with the branch I just pushed? That can be seen if Test mbedtls_mpi_mul_mpi #99 Issue7658 Data fails
What is the precice call tree that connects mbedtls_mpi_mul_mod with mbedtls_ecp_mod_p256_raw ?

Tip. You can only run this test instead of the whole suite using:

mkdir test_dir && cd test_dir
cmake ../
make -j8 && tests/test_suite_bignum.misc "--verbose" tests/test_suite_bignum.misc.datax

chemandante commented 1 year ago

First of all, Everything I wrote below I tried on your commit (https://github.com/Mbed-TLS/mbedtls/commit/9cb43045de8ae4f3239491548db0998cab1c0565)

When I comment MBEDTLS_ECP_NIST_OPTIM there is no any optimization, grp->modp == NULL, so the mbedtls_mpi_mod_mpi is called and the result is correct.
When I uncomment MBEDTLS_ECP_NIST_OPTIM grp->modp points to ecp_mod_p256 and result is wrong.
Exact call tree with optimization is: mbedtls_mpi_mul_mod -> ecp_modp (through define MOD_MUL) -> ecp_mod_p256 (through grp->modp) -> mbedtls_ecp_mod_p256_raw.
Before call to ecp_mod_p256 X * X is correct, but after this call X * X mod P became wrong. I check this using own mbedtls_printf to write values of mpi before and after call to ecp_mod_p256.
Unfortunately, I can't run your test directly om my embedded system.

Perhaps the different behavior is due to the fact that I use a different processor (but this is a 32-bit DSP and I use verified native compiler)

gilles-peskine-arm commented 1 year ago

@chemandante Would you mind sharing why you can't run the test suite? Is it because you can't pass the .datax file, or are there other problems? We'd like to make it possible to run the test suites on all platforms, but we don't know exactly what the problems are on various platforms.

chemandante commented 1 year ago

@gilles-peskine-arm I use embedded system on own-developed mainboard with TI C6000-series processor on it with TI's own compiler and realtime OS (SYS/BIOS, https://www.ti.com/tool/SYSBIOS). There is no way to run make/cmake, even more Python or Perl script. Maybe I'm certainly wrong.

But I can run mbedtls_ecp_self_test

chemandante commented 1 year ago

So, I run mbedtls_ecp_self_test with 'first available curve' that is given by mbedtls_ecp_curve_list (which in my case is SECP521R1) the test runs smoothly. Argument verbose == 1 during call to test. The result in console is:

 ECP SW test #1 (constant op_count, base point G): 
passed
 ECP SW test #2 (constant op_count, other point): 
passed

Then I force the curve in test to be SECP256R1. Default behavior of test is

MBEDTLS_MPI_CHK(mbedtls_ecp_group_load(&grp, mbedtls_ecp_curve_list()->grp_id));

I change it to:

MBEDTLS_MPI_CHK(mbedtls_ecp_group_load(&grp, MBEDTLS_ECP_DP_SECP256R1));

and I get:

ECP SW test #1 (constant op_count, base point G): 
Unexpected error, return code = FFFFB380

This error is "invalid public or private key", which is actually caused by the subject of this issue. Error given by call tree mbedtls_ecp_mul -> mbedtls_ecp_mul_restartable -> ecp_mul_restartable_internal -> mbedtls_ecp_check_pubkey -> ecp_check_pubkey_sw. In ecp_check_pubkey_sw variables YY and RHS have different values and that gives MBEDTLS_ERR_ECP_INVALID_KEY error. I found exact location of the problem using __LINE__ macro.

minosgalanakis commented 1 year ago

Thanks for coming back. It helps to try things on your setup since we cannot always reproduce every enviroment.

I have tested the mbedtls_ecp_mod_p256_raw using our python framework and the suspect N, P pair and it did not seem to be an issue. I will however push a new commit on the branch that shows how to do that just in case.

While I am looking at it more closely, could you please try the following on your hardware:

Comment out the following section after the mbedtls_ecp_mod_p256_raw reduction.
Report if the result is the same erroneous one, the correct or something completely different.

With the new interface we are moving to fixed width, and performing extra rounds of subtractions in each curves function if needed ( depending on carrys ). I just want to rule out that the highlighted code does not mess with a correct result.

chemandante commented 1 year ago

I've made this test in another way - I've put some printfs in different places of ecp_modp here and got the console output. There are four N's: N1 - before call to mbedtls_ecp_mod_p256_raw, N2 - right after the call, N3 after first while reduction and N4 at final. (grp->modp points to right function in memory)

grp->modp = c228b40
N1: 18EE0D07C8DCD20B9CE0AED20040731334D2A0659B3CDAC48B9F77ED85221580F323BDCF3862CA176DBAFDA0BB50A6DA6C3354D552AEC89C16ECDB42ABD2F479 (len = 129)
N2: 55DF5D5850F47BAC8214913A979369FE498A9023A412B5E0BEDD2CFC21C3ED91
N3: 55DF5D5850F47BAC8214913A979369FE498A9023A412B5E0BEDD2CFC21C3ED91
N4: 55DF5D5850F47BAC8214913A979369FE498A9023A412B5E0BEDD2CFC21C3ED91

As it can be seen, wrong result appears right after call to mbedtls_ecp_mod_p256_raw

gilles-peskine-arm commented 1 year ago

Can you please share your exact C code? We don't have your processor, but when we try to reproduce the problem, having your code would be less difficulty.

chemandante commented 1 year ago

@gilles-peskine-arm, off course I can. I made a commit with exact version I use to get last tests. Just run mbedtls_ecp_self_test - it gives an error after the call to mbedtls_ecp_mul at line 3586.

Meanwhile I will try to do step by step analysis of mbedtls_ecp_mod_p256_raw because it's obvious that this function behaves differently in our systems.

gilles-peskine-arm commented 1 year ago

Your configuration keeps MBEDTLS_HAVE_ASM enabled, but our code base doesn't have any assembly for your platform, right?

chemandante commented 1 year ago

Yes, I have completely different ASM language. It's strange, but there are no any errors. May be because MBEDTLS_HAVE_ASM is required by MBEDTLS_AESCE_C , MBEDTLS_AESNI_C and MBEDTLS_PADLOCK_C but all of them are commented. I'll try to comment MBEDTLS_HAVE_ASM too.

tom-cosgrove-arm commented 1 year ago

Your configuration keeps MBEDTLS_HAVE_ASM enabled, but our code base doesn't have any assembly for your platform, right?

This shouldn't be a problem as I understand it, since MBEDTLS_HAVE_ASM means the compiler supports asm(), but if we don't have assembly code for the target architecture we should fall back to a C implementation

gilles-peskine-arm commented 1 year ago

MBEDTLS_HAVE_ASM generically enables support for asm, but then there are specific #ifdef for specific CPU architectures. I don't think we have any assembly code for your CPU, but it's better to check to be sure. The relevant file is bn_mul.h, where we have assembly for intel and arm but also for some embedded platforms like microblaze and tricore.

chemandante commented 1 year ago

I found the root of the problem - I think, it was a some kind of compiler fault. When I disable all optimizations all works fine. Unfortunately, speed also significantly decreases.

Core of that fault was in incorrect transferring the 'carry' during computation of limbs in mbedtls_ecp_mod_p256_raw. I don't know the exact reason of this behavior, but there are cases where the processor loses 32 MSB's of int64_t.

So I've wrote alternative version of this reduction modulo P that exploits the same math, but uses nested loops and table of indexes instead of hand-written script (INIT, ADD, SUB, LAST and so on). New function (named mbedtls_ecp_mod_p256_raw_alt) works fine and fast on TI's C6000 core.

I put it in a gist here - https://gist.github.com/chemandante/e0f54eea2ce62154cfc18866759d1ba0

Please, tell me, should I make a commit with it through a pull request, or there are no needs in that func.

Or I can write alternative highly optimized version especially for TI TMS320C66x DSP core (guarded by #ifdef's)

gilles-peskine-arm commented 1 year ago

So basically a mini state machine that runs a specialized byte code. I actually tried that a while ago to see if we could save code size. Unfortunately it hurt performance significantly, and I didn't have an idea of how to fix that, so I didn't take it further.

gilles-peskine-arm commented 1 year ago

Since the goal here is to work around a compiler bug on a relatively “exotic” platform, I don't think we'll accept a non-negligible impact on code size or performance on other platforms.

minosgalanakis commented 1 year ago

That being said, raising a PR as a method of opening a design discussion, is the best suited approach. As @gilles-peskine-arm it is not guaranteed that it will be merged. If you do please ping me and I will label it appropriately.

If that is not a lot of trouble would you mind sharing the version of TI C6000 compiler that you are seeing that issue with? And would it be possible to check if that behaviour occurs with the latest version? I would like to add a comment at least to notify other potential community users.

chemandante commented 1 year ago

Thank you for answers. TI C6000 CGT version 8.3.12. Here is the link to downloads. Of course, I will try newer versions as soon as they appear. So, I think this issue can be closed at this time as 'not planned'

Yu0224 commented 1 month ago

I found the root of the problem - I think, it was a some kind of compiler fault. When I disable all optimizations all works fine. Unfortunately, speed also significantly decreases.

Core of that fault was in incorrect transferring the 'carry' during computation of limbs in mbedtls_ecp_mod_p256_raw. I don't know the exact reason of this behavior, but there are cases where the processor loses 32 MSB's of int64_t.

So I've wrote alternative version of this reduction modulo P that exploits the same math, but uses nested loops and table of indexes instead of hand-written script (INIT, ADD, SUB, LAST and so on). New function (named mbedtls_ecp_mod_p256_raw_alt) works fine and fast on TI's C6000 core.

I put it in a gist here - https://gist.github.com/chemandante/e0f54eea2ce62154cfc18866759d1ba0

Please, tell me, should I make a commit with it through a pull request, or there are no needs in that func.

Or I can write alternative highly optimized version especially for TI TMS320C66x DSP core (guarded by #ifdef's)

Thanks for sharing your problem, I am facing the same problem in embedded system. Debugging tls in embedded system can't turn on the optimization level, otherwise it will lead to problematic secret key calculation.

minosgalanakis commented 1 month ago

@Yu0224 could you please share some information on the compiler and platform you are using?

Also could you confirm if the patch created by @chemandante works for you?

Yu0224 commented 1 month ago

@Yu0224 could you please share some information on the compiler and platform you are using?

Also could you confirm if the patch created by @chemandante works for you?

@minosgalanakis The HighTec compiler is used and the embedded platform is the MCU: Infineon TC3xx.

The problem I'm having doesn't require a code patch, just turning off the code optimization in the compiler itself.

The strange thing is: with the same compiler, I didn't encounter this problem when porting mbedtls-2.28.8, but I encountered this problem when porting mbedtls-3.36.x.

Mbed-TLS / mbedtls