Mbed-TLS / mbedtls

An open source, portable, easy to use, readable and flexible TLS library, and reference implementation of the PSA Cryptography API. Releases are on a varying cadence, typically around 3 - 6 months between releases.
https://www.trustedfirmware.org/projects/mbed-tls/
Other
5.25k stars 2.56k forks source link

Support unaligned access in Armv8-A accelerated SHA2 #8447

Open tom-cosgrove-arm opened 11 months ago

tom-cosgrove-arm commented 11 months ago

When alignment check is enabled on AArch64 (e.g. in TF-A RMM) we see mbedtls_internal_sha256_process_many_a64_crypto() generates alignment fault exception for instruction

"ldr    q31, [x0]" @offset 18a24:

static size_t mbedtls_internal_sha256_process_many_a64_crypto(
    mbedtls_sha256_context *ctx, const uint8_t *msg, size_t len)
{
   18a00:   d503245f    bti c
   18a04:   d11e43ff    sub sp, sp, #0x790
   18a08:   f9000fe0    str x0, [sp, #24]
   18a0c:   f9000be1    str x1, [sp, #16]
   18a10:   f90007e2    str x2, [sp, #8]
    uint32x4_t abcd = vld1q_u32(&ctx->state[0]);
   18a14:   f9400fe0    ldr x0, [sp, #24]
   18a18:   91002000    add x0, x0, #0x8
   18a1c:   f90363e0    str x0, [sp, #1728]
__extension__ extern __inline uint32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1q_u32 (const uint32_t *__a)
{
  return __builtin_aarch64_ld1v4si_us (
   18a20:   f94363e0    ldr x0, [sp, #1728]
   18a24:   3dc0001f    ldr q31, [x0]

because X0 = 0x00000000fee2bb78 is not 16 bytes aligned. This is because of

18a18:  91002000    add x0, x0, #0x8

which is defined by "state" offset of 8:

typedef struct mbedtls_sha256_context {
    uint32_t MBEDTLS_PRIVATE(total)[2];          /*!< The number of Bytes processed.  */
    uint32_t MBEDTLS_PRIVATE(state)[8];          /*!< The intermediate digest state.  */
    unsigned char MBEDTLS_PRIVATE(buffer)[64];   /*!< The data block being processed. */
    int MBEDTLS_PRIVATE(is224);                  /*!< Determines which function to use:
                                                    0: Use SHA-256, or 1: Use SHA-224. */
} mbedtls_sha256_context;

To get this to work required aligning state and all variables of type uint32x4_t type in mbedtls_internal_sha256_process_many_a64_crypto() & defining aligned array to copy msg into it:

    uint8_t mes [4 * sizeof(uint32x4_t)] __attribute__ ((aligned (16)));
    (void)memcpy(mes, msg, 64);
        uint32x4_t sched0 __attribute__ ((aligned (16))) = (uint32x4_t) vld1q_u8(mes + 16 * 0);
        uint32x4_t sched1 __attribute__ ((aligned (16))) = (uint32x4_t) vld1q_u8(mes + 16 * 1);
        uint32x4_t sched2 __attribute__ ((aligned (16))) = (uint32x4_t) vld1q_u8(mes + 16 * 2);
        uint32x4_t sched3 __attribute__ ((aligned (16))) = (uint32x4_t) vld1q_u8(mes + 16 * 3);
AlexeiFedorov commented 11 months ago

It seems that GCC doesn't align variables of uint32x4_t type. If they are replaced with __uint128_t, then alignment will be set automatically.

daverodgman commented 11 months ago

https://gcc.gnu.org/bugzilla//show_bug.cgi?id=111555 looks relevant.