ARMmbed / mbed-os

Arm Mbed OS is a platform operating system designed for the internet of things
https://mbed.com
Other
4.67k stars 2.98k forks source link

Occasional hard falt when using the mbedtls entropy fucntion in parrallel threads (K64F) #9513

Closed adamelhakham closed 5 years ago

adamelhakham commented 5 years ago

Description

I am part of the group that develops the Pelion Device Management Client. We use mbed-os with KVSTORE and in our testing we do a lot of seeding of a DRBG, which calls the mbedtls mbedtls_entropy_func function, which calls the mbedtls_hardware_poll function to generate random entropy from the K64F hardware. In addition, the Pelion device management client runs a thread that every number of seconds tries to generate random data from the mbedtls_hardware_poll API. The problem is that the mbedtls_hardware_poll API is not thread safe and we occationally get the following hard fault:

++ MbedOS Fault Handler ++

FaultType: HardFault

Context:
R0   : 20008CF5
R1   : 00000005
R2   : 40029004
R3   : 00010000
R4   : 00000030
R5   : 0000001B
R6   : 20008CF5
R7   : 40029000
R8   : 20008CDC
R9   : 00000000
R10  : 00000000
R11  : 00000000
R12  : 00000010
SP   : 20008CA8
LR   : 0001CC0B
PC   : 0001CBDC
xPSR : 41000000
PSP  : 20008C88
MSP  : 2002FFB8
CPUID: 410FC241
HFSR : 40000000
MMFSR: 00000000
BFSR : 00000082
UFSR : 00000000
DFSR : 00000008
AFSR : 00000000
BFAR : 40029004
Mode : Thread
Priv : Privileged
Stack: PSP

-- MbedOS Fault Handler --

++ MbedOS Error Info ++
Error Status: 0x80FF013D Code: 317 Module: 255
Error Message: Fault exception
Location: 0x16441
Error Value: 0x1CBDC
Current Thread: 20008d60  Id: 0x20008D88 Entry: 0x83CD StackSize: 0x11B8 StackMem: 0x20007B80 SP: 0x2002FF10
For more info, visit: https://armmbed.github.io/mbedos-error/?error=0x80FF013D
-- MbedOS Error Info --

After disassembling the code, program counter points to the following instruction (the disasse:

0001cbb4 <trng_get_byte>:
   1cbb4:   2100        movs    r1, #0
   1cbb6:   7001        strb    r1, [r0, #0]
   1cbb8:   f249 0204   movw    r2, #36868  ; 0x9004
   1cbbc:   f2c4 0202   movt    r2, #16386  ; 0x4002
   1cbc0:   e00c        b.n 1cbdc <trng_get_byte+0x28>
   1cbc2:   6893        ldr r3, [r2, #8]
   1cbc4:   f003 0301   and.w   r3, r3, #1
   1cbc8:   fa03 fc01   lsl.w   ip, r3, r1
   1cbcc:   7803        ldrb    r3, [r0, #0]
   1cbce:   ea43 030c   orr.w   r3, r3, ip
   1cbd2:   7003        strb    r3, [r0, #0]
   1cbd4:   3101        adds    r1, #1
   1cbd6:   2908        cmp r1, #8
   1cbd8:   bf08        it  eq
   1cbda:   4770        bxeq    lr
   *PC POINTS HERE* -> 1cbdc:   6813        ldr r3, [r2, #0]
   1cbde:   f413 4f7f   tst.w   r3, #65280  ; 0xff00
   1cbe2:   d0fb        beq.n   1cbdc <trng_get_byte+0x28>
   1cbe4:   e7ed        b.n 1cbc2 <trng_get_byte+0xe>

from looking at the following file: targets/TARGET_Freescale/TARGET_MCUXpresso_MCUS/TARGET_MCU_K64F/trng_api.c I see that the trng_get_byte() API polls the TRNG status register and when it is set, the function reads from the output register. Now, it is possible for 2 threads to read that the status register has been set and then one thread will read the output register, while the next thread will try to read from the output register while the status register has been set back to 0 (after the first thread read), which according to the K64F reference manual, will raise a hardware exception. Not sure that that is in fact what is happening but is possible.

Shouldn't the trng_get_byte function protect from that situation?

Reproducing Our issue can easily be reproduced by creating two threads that simply call mbedtls_hardware_poll(). The dump I added was from the ARMC compiler but the same happens with GCC ARM.

Issue request type

[ ] Question
[ ] Enhancement
[x ] Bug
jenia81 commented 5 years ago

@avolinski @sbutcher-arm @Patater The above issue that @adamelhakham raised is blocking Pelion Client entropy injection feature. Can someone from Device team can look on this?

ciarmcom commented 5 years ago

Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-803

0xc0170 commented 5 years ago

@avolinski @sbutcher-arm @Patater

Any review? Should mbedtls_hardware_poll be protected in the application? or is this K64F issue?

Patater commented 5 years ago

Mbed OS HAL is not reentrant, IIRC. This means there is no expectation that trng_get_bytes() would be safe from multiple threads.

The Mbed TLS library provides MBEDTLS_THREADING_C, which can do some automatic synchronization, but it's limited. In general, performing more than one operation at a time on a single Mbed TLS context is not possible and those operations should be serialized by the application. For the parts of the library that are shared (like the entropy subsystem), indeed implementing MBEDTLS_THREADING_C for Mbed OS would help. This isn't implemented in Mbed OS today, so until then applications should serialize access to Mbed TLS.

CC @RonEld

kjbracey commented 5 years ago

mbedtls_hardware_poll is not an API that mbed TLS provides, but a system/porting API that it uses.

As such a couple of components have "piggybacked" onto this, by reasoning that "if this API is being provided by the system for mbed TLS, then we can use it too".

There's no realistic way we can ask applications to serialize access to Mbed TLS overall - there are too many independent pieces of code. So the shared components like this have to be protected somehow.

Given that there are "piggy-back" users aside from Mbed TLS itself already, I think the simplest thing to do is probably to ensure that mbedtls_hardware_poll uses a PlatformMutex to protect its access to trng_get_bytes. (And if there are any other alternative versions not using DEVICE_TRNG, they should also be thread-safe).

Still, MBEDTLS_THREADING_C may be worthwhile anyway - maybe there's other stuff that needs protection. But it wouldn't currently be an answer for the general "get entropy" issue, as there is no mbed TLS API to obtain random data from its entropy generators, so code does have to go around the library straight to mbedtls_hardware_poll.

adamelhakham commented 5 years ago

Given that there are "piggy-back" users aside from Mbed TLS itself already, I think the simplest thing to do is probably to ensure that mbedtls_hardware_poll uses a PlatformMutex to protect its access to trng_get_bytes. (And if there are any other alternative versions not using DEVICE_TRNG, they should also be thread-safe).

I agree. the mbedtls_hardware_poll API can be used both by mbedtls (which is used by mbed-os features such as kvstore) but can also be called directly by the application, if it needs access to the TRNG. This means that there is no way for the application to serialize application invocations of mbedtls_hardware_poll with OS invocations of mbedtls which invoke that same function. Unless maybe if mbedtls_hardware_poll is compiled with some weak symbol or something, and the application can provide its own implementation. But this does not seem reasonable to me..

kjbracey commented 5 years ago

I'll prepare a PR with this proposal.

kjbracey commented 5 years ago

One significant user of mbedtls_hardware_poll occurs in the Nanostack-derived middleware - randLIB.h is a high-quality pseudo-RNG for general non-crypto use, and it can accept randomisation being pushed in from various sources (eg radio noise, or MAC address).

And by default it will use mbedtls_hardware_poll at startup to get going. (This use predates the TRNG HAL - but we knew that if a system ever had some hardware TRNG, it must presumably be plumbed in to Mbed TLS there.)

0xc0170 commented 5 years ago

@adamelhakham Can you test PR 9532 ? it should fix this issue

kjbracey commented 5 years ago

One extra point on this - I note that the "entropy context" in Mbed TLS is not itself a shared resource, so each TLSSocketWrapper or other TLS session has its own entropy context.

That in turn means that MBEDTLS_THREADING_C would actually provide no protection here - each entropy context is independently calling mbedtls_hardware_poll with no overall system-level lock, just a per-context mutex.

As such, it does appear to be an undocumented porting requirement that mbedtls_hardware_poll be thread-safe, regardless of the setting of MBEDTLS_THREADING_C.

simonbutcher commented 5 years ago

I feel like I'm coming late to this, but as touched on above, whilst Mbed TLS supports multiple threads in its standalone form through MBEDTLS_THREADING_C, that is currently not supported in Mbed OS.

mbedtls_hardware_poll is not an API that mbed TLS provides, but a system/porting API that it uses.

As such a couple of components have "piggybacked" onto this, by reasoning that "if this API is being provided by the system for mbed TLS, then we can use it too".

That's correct, and it's really part of the porting layer. As such its on its own and has to provide its own resource management. Defining MBEDTLS_THREADING_C won't help anyone.

Given that there are "piggy-back" users aside from Mbed TLS itself already, I think the simplest thing to do is probably to ensure that mbedtls_hardware_poll uses a PlatformMutex to protect its access to trng_get_bytes. (And if there are any other alternative versions not using DEVICE_TRNG, they should also be thread-safe).

I think that's the best solution.

Still, MBEDTLS_THREADING_C may be worthwhile anyway - maybe there's other stuff that needs protection. But it wouldn't currently be an answer for the general "get entropy" issue, as there is no mbed TLS API to obtain random data from its entropy generators, so code does have to go around the library straight to mbedtls_hardware_poll.

That involves making MBEDTLS_THREADING_C supported in Mbed OS. We currently don't have time for that, and I don't think it's needed for this release or this bug (although it would be a good to add it in the near future).

adamelhakham commented 5 years ago

@0xc0170 @kjbracey-arm I tested it and our CI passes. Thanks!

adamelhakham commented 5 years ago

@0xc0170 @kjbracey-arm Will the fix be added to release 5.11.3?

cmonr commented 5 years ago

@adamelhakham 5.11.3 is about to be released. https://github.com/ARMmbed/mbed-os/pull/9532 will come in during 5.11.4