Open chemandante opened 1 year ago
Hi, and thank you for your contribution.
We are currently into the proccess of migrating the bignum interface. While I am looking into this issue could you please try to test it on the development branch, which has PR7230 merged?
Hi! Exactly the same (i.e. incorrect) result after using commit https://github.com/Mbed-TLS/mbedtls/commit/fffd6d9ded2f2a57cedd1b97268af706c4a5f730
So following up on this.
I have pushed a simple test program which is meant to test the output using this pair and I have so far failed to reproduce it.
The idea is the following:
As you have confirmed and as I also noticed the multiplication of X * X
is correct. So when the issue occurs that happens in the modural reduction stage.
I am assuming that in your configuration the grp->modp == NULL
so the mbedtls_mpi_mod_mpi
is called. But for sanity reasons I have tested the reduction method, when grp->modp != NULL
Could you please confirm the following:
grp->modp
pointing at null?Test mbedtls_mpi_mul_mpi #99 Issue7658 Data
fails mbedtls_mpi_mul_mod
with mbedtls_ecp_mod_p256_raw
?Tip. You can only run this test instead of the whole suite using:
mkdir test_dir && cd test_dir
cmake ../
make -j8 && tests/test_suite_bignum.misc "--verbose" tests/test_suite_bignum.misc.datax
First of all, Everything I wrote below I tried on your commit (https://github.com/Mbed-TLS/mbedtls/commit/9cb43045de8ae4f3239491548db0998cab1c0565)
MBEDTLS_ECP_NIST_OPTIM
there is no any optimization, grp->modp == NULL
, so the mbedtls_mpi_mod_mpi
is called and the result is correct.MBEDTLS_ECP_NIST_OPTIM
grp->modp
points to ecp_mod_p256
and result is wrong.mbedtls_mpi_mul_mod
-> ecp_modp
(through define MOD_MUL
) -> ecp_mod_p256
(through grp->modp
) -> mbedtls_ecp_mod_p256_raw
.ecp_mod_p256
X * X
is correct, but after this call X * X mod P
became wrong. I check this using own mbedtls_printf
to write values of mpi
before and after call to ecp_mod_p256
.Perhaps the different behavior is due to the fact that I use a different processor (but this is a 32-bit DSP and I use verified native compiler)
@chemandante Would you mind sharing why you can't run the test suite? Is it because you can't pass the .datax
file, or are there other problems? We'd like to make it possible to run the test suites on all platforms, but we don't know exactly what the problems are on various platforms.
@gilles-peskine-arm I use embedded system on own-developed mainboard with TI C6000-series processor on it with TI's own compiler and realtime OS (SYS/BIOS, https://www.ti.com/tool/SYSBIOS). There is no way to run make/cmake, even more Python or Perl script. Maybe I'm certainly wrong.
But I can run mbedtls_ecp_self_test
So, I run mbedtls_ecp_self_test
with 'first available curve' that is given by mbedtls_ecp_curve_list
(which in my case is SECP521R1) the test runs smoothly. Argument verbose == 1
during call to test. The result in console is:
ECP SW test #1 (constant op_count, base point G):
passed
ECP SW test #2 (constant op_count, other point):
passed
Then I force the curve in test to be SECP256R1. Default behavior of test is
MBEDTLS_MPI_CHK(mbedtls_ecp_group_load(&grp, mbedtls_ecp_curve_list()->grp_id));
I change it to:
MBEDTLS_MPI_CHK(mbedtls_ecp_group_load(&grp, MBEDTLS_ECP_DP_SECP256R1));
and I get:
ECP SW test #1 (constant op_count, base point G):
Unexpected error, return code = FFFFB380
This error is "invalid public or private key", which is actually caused by the subject of this issue. Error given by call tree mbedtls_ecp_mul
-> mbedtls_ecp_mul_restartable
-> ecp_mul_restartable_internal
-> mbedtls_ecp_check_pubkey
-> ecp_check_pubkey_sw
. In ecp_check_pubkey_sw
variables YY
and RHS
have different values and that gives MBEDTLS_ERR_ECP_INVALID_KEY
error.
I found exact location of the problem using __LINE__
macro.
Thanks for coming back. It helps to try things on your setup since we cannot always reproduce every enviroment.
I have tested the mbedtls_ecp_mod_p256_raw
using our python framework and the suspect N, P pair and it did not seem to be an issue. I will however push a new commit on the branch that shows how to do that just in case.
While I am looking at it more closely, could you please try the following on your hardware:
mbedtls_ecp_mod_p256_raw
reduction.With the new interface we are moving to fixed width, and performing extra rounds of subtractions in each curves function if needed ( depending on carrys ). I just want to rule out that the highlighted code does not mess with a correct result.
I've made this test in another way - I've put some printfs in different places of ecp_modp
here and got the console output. There are four N's
: N1
- before call to mbedtls_ecp_mod_p256_raw
, N2
- right after the call, N3
after first while
reduction and N4
at final. (grp->modp
points to right function in memory)
grp->modp = c228b40
N1: 18EE0D07C8DCD20B9CE0AED20040731334D2A0659B3CDAC48B9F77ED85221580F323BDCF3862CA176DBAFDA0BB50A6DA6C3354D552AEC89C16ECDB42ABD2F479 (len = 129)
N2: 55DF5D5850F47BAC8214913A979369FE498A9023A412B5E0BEDD2CFC21C3ED91
N3: 55DF5D5850F47BAC8214913A979369FE498A9023A412B5E0BEDD2CFC21C3ED91
N4: 55DF5D5850F47BAC8214913A979369FE498A9023A412B5E0BEDD2CFC21C3ED91
As it can be seen, wrong result appears right after call to mbedtls_ecp_mod_p256_raw
Can you please share your exact C code? We don't have your processor, but when we try to reproduce the problem, having your code would be less difficulty.
@gilles-peskine-arm, off course I can. I made a commit with exact version I use to get last tests. Just run mbedtls_ecp_self_test
- it gives an error after the call to mbedtls_ecp_mul
at line 3586.
Meanwhile I will try to do step by step analysis of mbedtls_ecp_mod_p256_raw
because it's obvious that this function behaves differently in our systems.
Your configuration keeps MBEDTLS_HAVE_ASM
enabled, but our code base doesn't have any assembly for your platform, right?
Yes, I have completely different ASM language. It's strange, but there are no any errors. May be because MBEDTLS_HAVE_ASM
is required by MBEDTLS_AESCE_C
, MBEDTLS_AESNI_C
and MBEDTLS_PADLOCK_C
but all of them are commented. I'll try to comment MBEDTLS_HAVE_ASM
too.
Your configuration keeps
MBEDTLS_HAVE_ASM
enabled, but our code base doesn't have any assembly for your platform, right?
This shouldn't be a problem as I understand it, since MBEDTLS_HAVE_ASM
means the compiler supports asm()
, but if we don't have assembly code for the target architecture we should fall back to a C implementation
MBEDTLS_HAVE_ASM
generically enables support for asm
, but then there are specific #ifdef
for specific CPU architectures. I don't think we have any assembly code for your CPU, but it's better to check to be sure. The relevant file is bn_mul.h
, where we have assembly for intel and arm but also for some embedded platforms like microblaze and tricore.
I found the root of the problem - I think, it was a some kind of compiler fault. When I disable all optimizations all works fine. Unfortunately, speed also significantly decreases.
Core of that fault was in incorrect transferring the 'carry' during computation of limbs in mbedtls_ecp_mod_p256_raw
. I don't know the exact reason of this behavior, but there are cases where the processor loses 32 MSB's of int64_t
.
So I've wrote alternative version of this reduction modulo P that exploits the same math, but uses nested loops and table of indexes instead of hand-written script (INIT, ADD, SUB, LAST and so on). New function (named mbedtls_ecp_mod_p256_raw_alt
) works fine and fast on TI's C6000 core.
I put it in a gist here - https://gist.github.com/chemandante/e0f54eea2ce62154cfc18866759d1ba0
Please, tell me, should I make a commit with it through a pull request, or there are no needs in that func.
Or I can write alternative highly optimized version especially for TI TMS320C66x DSP core (guarded by #ifdef
's)
So basically a mini state machine that runs a specialized byte code. I actually tried that a while ago to see if we could save code size. Unfortunately it hurt performance significantly, and I didn't have an idea of how to fix that, so I didn't take it further.
Since the goal here is to work around a compiler bug on a relatively “exotic” platform, I don't think we'll accept a non-negligible impact on code size or performance on other platforms.
That being said, raising a PR as a method of opening a design discussion, is the best suited approach. As @gilles-peskine-arm it is not guaranteed that it will be merged. If you do please ping me and I will label it appropriately.
If that is not a lot of trouble would you mind sharing the version of TI C6000 compiler that you are seeing that issue with? And would it be possible to check if that behaviour occurs with the latest version? I would like to add a comment at least to notify other potential community users.
Thank you for answers. TI C6000 CGT version 8.3.12. Here is the link to downloads. Of course, I will try newer versions as soon as they appear. So, I think this issue can be closed at this time as 'not planned'
I found the root of the problem - I think, it was a some kind of compiler fault. When I disable all optimizations all works fine. Unfortunately, speed also significantly decreases.
Core of that fault was in incorrect transferring the 'carry' during computation of limbs in
mbedtls_ecp_mod_p256_raw
. I don't know the exact reason of this behavior, but there are cases where the processor loses 32 MSB's ofint64_t
.So I've wrote alternative version of this reduction modulo P that exploits the same math, but uses nested loops and table of indexes instead of hand-written script (INIT, ADD, SUB, LAST and so on). New function (named
mbedtls_ecp_mod_p256_raw_alt
) works fine and fast on TI's C6000 core.I put it in a gist here - https://gist.github.com/chemandante/e0f54eea2ce62154cfc18866759d1ba0
Please, tell me, should I make a commit with it through a pull request, or there are no needs in that func.
Or I can write alternative highly optimized version especially for TI TMS320C66x DSP core (guarded by
#ifdef
's)
Thanks for sharing your problem, I am facing the same problem in embedded system. Debugging tls in embedded system can't turn on the optimization level, otherwise it will lead to problematic secret key calculation.
@Yu0224 could you please share some information on the compiler and platform you are using?
Also could you confirm if the patch created by @chemandante works for you?
@Yu0224 could you please share some information on the compiler and platform you are using?
Also could you confirm if the patch created by @chemandante works for you?
@minosgalanakis The HighTec compiler is used and the embedded platform is the MCU: Infineon TC3xx.
The problem I'm having doesn't require a code patch, just turning off the code optimization in the compiler itself.
The strange thing is: with the same compiler, I didn't encounter this problem when porting mbedtls-2.28.8, but I encountered this problem when porting mbedtls-3.36.x.
Summary
mbedtls_mpi_mul_mod() gives incorrect result for given values.
Using curve MBEDTLS_ECP_DP_SECP256R1, I need to calculate Y = X * X mod P using
mbedtls_mpi_mul_mod()
(to be honest, this function is called frommbedtls_ecp_mul()
during call tombedtls_ecp_check_pubkey()
)My values are:
X = 0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5
P = 0xFFFFFFFF00000001000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFF
System information
Mbed TLS version (number or commit id): 3.4.0 (tagged release commit) Operating system and version: Compiler for DSP TI C6000 (TMS320C66x series processors) (https://www.ti.com/tool/C6000-CGT) Configuration (if not default, please attach
mbedtls_config.h
): mbedtls_config.zip Compiler and options (if you used a pre-built binary, please indicate how you obtained it): TI C6000 compiler Additional environment information:Expected behavior
Y = 0x55DF5D5850F47BAD82149139979369FE498A9022A412B5E0BEDD2CFC21C3ED91
Actual behavior
differs only in three digits
Steps to reproduce
Just call
mbedtls_mpi_mul_mod
with these valuesAdditional information
This problem disappears when I comment MBEDTLS_ECP_NIST_OPTIM macro in mbedTLS 3.4.0, but in mbedTLS 3.3.0 it works fine even when MBEDTLS_ECP_NIST_OPTIM remains uncommented.
I think, the root of the problem is in
mbedtls_ecp_mod_p256_raw()
, but I can't find exact solution.P.S.
X * X
calculates correctly. The problem appears after reduction modulo P.