memcpy() warnings and constant time

osresearch commented 4 years ago

It looks like the modp256k1.c and modp256r1.c use memcpy(tmp, tmp, ...) to try to maintain a constant time implementation. This causes a warning with newer arm-none-eabi-gcc (I'm testing with 9.2.1 from the Ubuntu 20.04 repo):

modp256k1.c: In function 'modp256k1_add':
modp256k1.c:79:13: warning: passing argument 1 to restrict-qualified parameter aliases with argument 2 [-Wrestrict]
   79 |     memcpy (tmp, tmp, sizeof (bn256));
      |             ^~~  ~~~

The standard does not guarantee what will happen in this case:

DESCRIPTION The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.

Many libraries will check for overlap and fall back to memove(), which can short-circuit in the case of the src == dst. This would prevent the functions from actually being constant time.

szszszsz commented 4 years ago

Hi! Good catch! The solution should be easy enough as far as I see - just introduce another buffer to copy operation to both cases, which would force to always copy data. This would increase stack use by 32 bytes, which is affordable.

osresearch commented 4 years ago

And, nope, my pull request doesn't fix it. gcc is too smart for us.

void
modp256k1_add (bn256 *X, const bn256 *A, const bn256 *B)
{
  uint32_t cond;
  bn256 tmp[1];

  cond = (bn256_add (X, A, B) == 0);
  cond &= bn256_sub (tmp, X, P256K1);
  if (cond)
    /* No-carry AND borrow */
    memcpy (tmp, X, sizeof (bn256)); // constant time
  else
    memcpy (X, tmp, sizeof (bn256));
}

becomes:

00000000 <modp256k1_add>:
   0:   b530            push    {r4, r5, lr}
   2:   b089            sub     sp, #36 ; 0x24
   4:   4604            mov     r4, r0
   6:   f7ff fffe       bl      0 <bn256_add>
   a:   4621            mov     r1, r4
   c:   fab0 f580       clz     r5, r0
  10:   4a0a            ldr     r2, [pc, #40]   ; (3c <modp256k1_add+0x3c>)
  12:   4668            mov     r0, sp
  14:   f7ff fffe       bl      0 <bn256_sub>
  18:   096d            lsrs    r5, r5, #5
  1a:   4205            tst     r5, r0  // if (cond)
  1c:   d10b            bne.n   36 <modp256k1_add+0x36> // goto early return without mempcy
  1e:   466a            mov     r2, sp // else actually do a memcpy
  20:   4623            mov     r3, r4
  22:   ad08            add     r5, sp, #32
  24:   4614            mov     r4, r2
  26:   cc03            ldmia   r4!, {r0, r1}
  28:   42ac            cmp     r4, r5
  2a:   6018            str     r0, [r3, #0]
  2c:   6059            str     r1, [r3, #4]
  2e:   4622            mov     r2, r4
  30:   f103 0308       add.w   r3, r3, #8
  34:   d1f6            bne.n   24 <modp256k1_add+0x24>
  36:   b009            add     sp, #36 ; 0x24
  38:   bd30            pop     {r4, r5, pc}
  3a:   bf00            nop
  3c:   00000000        .word   0x00000000

Since tmp doesn't live on, the entire mempcy() is discarded.

~~When in doubt, add more volatile?~~ gcc sees through the volatile tricks, too.

szszszsz commented 4 years ago

How about this one below? Looks like -Os does not strip it away. Obviously less performant due to the additional copy. In the example I might have used wrong compiler type, but should work in principle.

We need to use own copying routines, otherwise the volatility is stripped away.

Executed in Compiler Explorer:

https://godbolt.org/z/br5h4W (cleaned up)
https://godbolt.org/z/YcjnG7 (initial)

Something like this:

void nk_memcpy_bn256(volatile bn256 *dest, volatile bn256 *src){
    for(uint8_t i=0;i<BN256_WORDS;i++){
        dest->word[i] = src->word[i];
    }
}

void
modp256k1_add3 (bn256 *X, const bn256 *A, const bn256 *B)
{
  uint32_t cond;
  volatile bn256 tmp[1];
  volatile bn256 tmp2[1];

  cond = (bn256_add (X, A, B) == 0);
  cond &= bn256_sub (tmp, X, P256K1);
  if (cond){
    /* No-carry AND borrow */
    nk_memcpy_bn256 (tmp2, tmp);
    nk_memcpy_bn256 (tmp, tmp2);
    }
  else{
    nk_memcpy_bn256 (tmp2, tmp);
    nk_memcpy_bn256 (X, tmp2);
    }
}

osresearch commented 4 years ago

This gets closer to constant time than before (the memory copy always happens), although there are two extra instructions on one path: https://github.com/Nitrokey/nitrokey-start-firmware/pull/54/commits/64d695726576e7bb5ebbc75033024283e52fc16b

szszszsz commented 4 years ago

Could this be treated like POD? I missed that, interesting. Regarding your implementation, how about this change? (updated)

void bn256_copy_if (int do_it, bn256 *dest, bn256 *src)
{
    static volatile bn256 fake_dest[1];

    if (do_it)
        *dest = *src;
    else
        *fake_dest = *src;
}

Edit: https://godbolt.org/z/ozxY9b - looks like temp variable cannot be on function-local stack, otherwise its handled differently. Edit: https://godbolt.org/z/oab597 - this one looks promising

Nitrokey / nitrokey-start-firmware

memcpy() warnings and constant time #53