ElementsProject / libwally-core

Useful primitives for wallets
Other
281 stars 135 forks source link

si_signo=SIGBUS si_code=BUS_ADRALN with linux kernel 6.1.21-v8+ aarch64 #396

Closed jmastr closed 1 year ago

jmastr commented 1 year ago

Hi there,

after upgrading my Rasbian Bullseye to latest stable, I get a reproducible Bus error, when running:

$ make check

on a fresh checkout/autogen.sh/configure:

$ git clone https://github.com/ElementsProject/libwally-core.git -b release_0.8.6
$ cd libwally-core/
$ git submodule update --init --recursive
$ ./tools/autogen.sh
$ ./configure --enable-debug --enable-export-all --enable-swig-python
$ make
$ make  check                                                                   

Took me two days to figure out that this happened due to upgrading linux-libc-dev.

I downgraded:

To version 1.20230106-1_armhf and make check works again.

jgriffiths commented 1 year ago

Hi, thanks for reporting!

If you run make check V=1 it should tell you which check is causing the bus error. If possible could you run that check under gdb and report the stack trace when it occurs?

Also could you try with the current master or at least v0.8.9?

jmastr commented 1 year ago

Sure.

not working setup:

$ dpkg -l|grep 1.20230405-1
ii  linux-libc-dev:armhf                 1:1.20230405-1                   armhf        Linux support headers for userspace development
ii  raspberrypi-bootloader               1:1.20230405-1                   armhf        Raspberry Pi bootloader
ii  raspberrypi-kernel                   1:1.20230405-1                   armhf        Raspberry Pi bootloader
ii  vcdbg                                1:1.20230405-1                   armhf        Tool for debugging VideoCore
$ uname -a
Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
jgriffiths commented 1 year ago

Great, thanks! looks like the descriptor test is failing with some kind of alignment issue. From your wally source directory, you can run the test under gdb using LD_LIBRARY_PATH=./src/.libs/ gdb src/.libs/test_descriptor. If you run it under gdb, can you paste the where output when it crashes?

jmastr commented 1 year ago
(gdb) run
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Program received signal SIGBUS, Bus error.
0xf7dfb518 in ripemd160_done (ctx=0xfffef170, ctx@entry=0xf7dfb580 <ripemd160+60>, res=res@entry=0x24193) at ./ccan/ccan/endian/endian.h:237
237             return CPU_TO_LE32(native);
jmastr commented 1 year ago
(gdb) where
#0  0xf7dfb518 in ripemd160_done (ctx=0xfffef170, ctx@entry=0xf7dfb580 <ripemd160+60>, res=res@entry=0x24193) at ./ccan/ccan/endian/endian.h:237
#1  0xf7dfb580 in ripemd160 (ripemd=ripemd@entry=0x24193, p=0xfffef170, p@entry=0xfffef1d0, size=2096, size@entry=32) at ccan/ccan/crypto/ripemd160/ripemd160.c:346
#2  0xf7de0c40 in wally_hash160 (bytes=bytes@entry=0xfffef215 "D\002", bytes_len=<optimized out>, bytes_out=bytes_out@entry=0x24193 "\367\330\327\331", <incomplete sequence \367>, len=len@entry=20) at internal.c:305
#3  0xf7ddedd4 in generate_pk_h (written=0xfffef338, script_len=520, script=0x24190 "v\251\024\367\330\327\331", <incomplete sequence \367>, node=0x24430, ctx=<optimized out>) at descriptor.c:1076
#4  generate_pkh (ctx=<optimized out>, node=0x24430, script=0x24190 "v\251\024\367\330\327\331", <incomplete sequence \367>, script_len=520, written=0xfffef338, written@entry=0xfffef330) at descriptor.c:1138
#5  0xf7ddc130 in generate_script (ctx=ctx@entry=0xfffef40c, node=node@entry=0x24430, script=0x24190 "v\251\024\367\330\327\331", <incomplete sequence \367>, script@entry=0x0, script_len=520, script_len@entry=0, written=written@entry=0xfffef490) at descriptor.c:1796
#6  0xf7ddf78c in node_generate_script (written=0xfffef490, len=0, bytes_out=0x0, index=0, depth=<optimized out>, ctx=0xfffef404) at descriptor.c:2426
#7  wally_descriptor_to_script (descriptor=0x243a0, depth=<optimized out>, index=0, variant=0, multi_index=<optimized out>, multi_index@entry=0, child_num=<optimized out>, child_num@entry=0, flags=<optimized out>, flags@entry=0, bytes_out=<optimized out>, bytes_out@entry=0x24190 "v\251\024\367\330\327\331", <incomplete sequence \367>, len=<optimized out>, 
    len@entry=520, written=<optimized out>, written@entry=0xfffef488) at descriptor.c:2518
#8  0x0001085c in check_descriptor_to_script (test=<optimized out>) at ctest/test_descriptor.c:1739
#9  main () at ctest/test_descriptor.c:1831
jgriffiths commented 1 year ago

Awesome, thanks. Can you confirm https://github.com/ElementsProject/libwally-core/pull/397 fixes this?

jmastr commented 1 year ago

No, unfortunately same error:

(gdb) run
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Program received signal SIGBUS, Bus error.
0xf7dfb510 in ripemd160_done (ctx=0xfffef178, ctx@entry=0xf7dfb578 <ripemd160+60>, res=res@entry=0x24193) at ./ccan/ccan/endian/endian.h:237
237             return CPU_TO_LE32(native);
(gdb) where
#0  0xf7dfb510 in ripemd160_done (ctx=0xfffef178, ctx@entry=0xf7dfb578 <ripemd160+60>, res=res@entry=0x24193) at ./ccan/ccan/endian/endian.h:237
#1  0xf7dfb578 in ripemd160 (ripemd=ripemd@entry=0x24193, p=0xfffef178, p@entry=0xfffef1d8, size=2096, size@entry=32) at ccan/ccan/crypto/ripemd160/ripemd160.c:346
#2  0xf7de0c3c in wally_hash160 (
    bytes=bytes@entry=0xfffef214 "\317)\361\270w\377\064!\002\306\004\177\224A\355}m0E@n\225\300|\330\\w\216K\214\357<\247\253\254\t\271\\p\236\345\276VEhc\t\306\346sm\275\223\224\a\ȃC\323\317)\361\270w\377\064\016,\262\322Y\317d\374\367\060D\002", 
    bytes_len=<optimized out>, bytes_out=bytes_out@entry=0x24193 "\367\330\327\331", <incomplete sequence \367>, len=len@entry=20) at internal.c:305
#3  0xf7ddedd0 in generate_pk_h (written=0xfffef338, script_len=520, script=0x24190 "v\251\024\367\330\327\331", <incomplete sequence \367>, node=0x24430, ctx=<optimized out>) at descriptor.c:1077
#4  generate_pkh (ctx=<optimized out>, node=0x24430, script=0x24190 "v\251\024\367\330\327\331", <incomplete sequence \367>, script_len=520, written=0xfffef338, written@entry=0xfffef330) at descriptor.c:1139
#5  0xf7ddc130 in generate_script (ctx=ctx@entry=0xfffef40c, node=node@entry=0x24430, script=0x24190 "v\251\024\367\330\327\331", <incomplete sequence \367>, script@entry=0x0, script_len=520, script_len@entry=0, written=written@entry=0xfffef490)
    at descriptor.c:1797
#6  0xf7ddf788 in node_generate_script (written=0xfffef490, len=0, bytes_out=0x0, index=0, depth=<optimized out>, ctx=0xfffef404) at descriptor.c:2427
#7  wally_descriptor_to_script (descriptor=0x243a0, depth=<optimized out>, index=0, variant=0, multi_index=<optimized out>, multi_index@entry=0, child_num=<optimized out>, child_num@entry=0, flags=<optimized out>, flags@entry=0, 
    bytes_out=<optimized out>, bytes_out@entry=0x24190 "v\251\024\367\330\327\331", <incomplete sequence \367>, len=<optimized out>, len@entry=520, written=<optimized out>, written@entry=0xfffef488) at descriptor.c:2519
#8  0x0001085c in check_descriptor_to_script (test=<optimized out>) at ctest/test_descriptor.c:1739
#9  main () at ctest/test_descriptor.c:1831
jmastr commented 1 year ago
$ git status 
On branch descriptor_align
Your branch is up to date with 'origin/descriptor_align'.

nothing to commit, working tree clean
jgriffiths commented 1 year ago

hmm, what does grep HAVE_UNALIGNED_ACCESS src/config.h show you?

jmastr commented 1 year ago
$ grep HAVE_UNALIGNED_ACCESS src/config.h
#define HAVE_UNALIGNED_ACCESS 1
jgriffiths commented 1 year ago

Thanks. seems like configure has incorrectly assumed you can access unaligned memory when in fact you cant. The configure check for that is:

int main(void){static int a[2];return *((int*)(((char*)a)+1)) != 0;}

If you compile and run that: gcc <cflags> foo.c && ./a.out does it crash?

jmastr commented 1 year ago

which <cflags> shall I set? A plain:

$ gcc foo.c && ./a.out
$ echo $?
0

generated no output, so no crash.

jmastr commented 1 year ago

I mean, with kernel 5.15.x it works on the Pi, with 6.1.y it does not. Maybe a bug in the kernel? Just a wild guess...

jgriffiths commented 1 year ago

Hmm, whether or not the CPU can access words on unaligned boundaries is a CPU feature, and shouldn't be affected by the kernel (unless its configurable on arm64?).

edit: It turns out is is configurable on later arm arches

Can you try running c-lightnings configurator check for unaligned access?

#include <string.h>
int main(int argc, char *argv[]) {
    char pad[sizeof(int *) * 1];
    memcpy(pad, argv[0], sizeof(pad));
    int *x = (int *)pad, *y = (int *)(pad + 1);
    return *x == *y;
}
jgriffiths commented 1 year ago

The other interesting question is whether HAVE_UNALIGNED_ACCESS gets set correctly on the other kernel version, and whether the compiled a.out crashes on the other kernel too.

jmastr commented 1 year ago

5.15.84:

$ cat foo.c 
#include <string.h>
int main(int argc, char *argv[]) {
    char pad[sizeof(int *) * 1];
    memcpy(pad, argv[0], sizeof(pad));
    int *x = (int *)pad, *y = (int *)(pad + 1);
    return *x == *y;
}
$ gcc foo.c && ./a.out 
$ echo $?
0

6.1.21:

$ uname -a
Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
$ cat foo.c 
#include <string.h>
int main(int argc, char *argv[]) {
    char pad[sizeof(int *) * 1];
    memcpy(pad, argv[0], sizeof(pad));
    int *x = (int *)pad, *y = (int *)(pad + 1);
    return *x == *y;
}
$ gcc foo.c && ./a.out 
$ echo $?
0
jmastr commented 1 year ago

5.15.84:

$ grep HAVE_UNALIGNED_ACCESS src/config.h
#define HAVE_UNALIGNED_ACCESS 1
$ cat foo.c 
int main(void){static int a[2];return *((int*)(((char*)a)+1)) != 0;}
$ gcc foo.c && ./a.out 
$ echo $?
0
jgriffiths commented 1 year ago

Thanks, I'll take this away for a bit and do some local investigation. It seems you do have unaligned access yet you are still able to get a bus error out of this code, very strange.

jgriffiths commented 1 year ago

Hi @jmastr some more things to try:

1) Do the tests pass if you pass --enable-debug when configuring?

2) The following diff will disable aligned access, I'm interested to know if this fixes your crash. Please apply it on top of the updated https://github.com/ElementsProject/libwally-core/pull/397.

diff --git a/src/ccan_config.h b/src/ccan_config.h
index d74aa5d5..86c0a6f6 100644
--- a/src/ccan_config.h
+++ b/src/ccan_config.h
@@ -29,7 +29,7 @@
 #endif

 #if HAVE_UNALIGNED_ACCESS
-#define alignment_ok(p, n) 1
+#define alignment_ok(p, n) 0
 #else
 #define alignment_ok(p, n) ((size_t)(p) % (n) == 0)
 #endif

Also I'm wondering if your compiler version is different between the 2 kernel versions?

I am assuming that whats happening is the architecture supports unaligned loads and stores e.g. moving a value to/from unaligned addresses ands register. In hash160, the load that is crashing is used to convert the endianness of the result, using a macro which a smart compiler could easily convert to a byteswap cpu instruction.

However https://developer.arm.com/documentation/ddi0290/g/unaligned-and-mixed-endian-data-access-support/unaligned-access-support/armv6-unaligned-data-access-restrictions states "Swap and synchronization primitives, multiple-word or coprocessor access produce an alignment fault regardless of the setting of the A bit". If the compiler has swapped the instruction and its called on an unaligned address then this would explain the fault.

If that is the cause, then the fix will be to undefine HAVE_UNALIGNED_ACCESS for arm I think.

jmastr commented 1 year ago

Hey @jgriffiths

  1. Please have a look at my first comment. I currently use for my builds:

    $ ./configure --enable-debug --enable-export-all --enable-swig-python

    so no they don't. They do if I remove --enable-swig-python though...

  2. Change of src/ccan_config.h on top of origin/descriptor_align:

    $ git diff origin/master 
    diff --git a/src/ccan_config.h b/src/ccan_config.h
    index d74aa5d..86c0a6f 100644
    --- a/src/ccan_config.h
    +++ b/src/ccan_config.h
    @@ -29,7 +29,7 @@
    #endif
    
    #if HAVE_UNALIGNED_ACCESS
    -#define alignment_ok(p, n) 1
    +#define alignment_ok(p, n) 0
    #else
    #define alignment_ok(p, n) ((size_t)(p) % (n) == 0)
    #endif
    diff --git a/src/descriptor.c b/src/descriptor.c
    index a79c5c4..3a1a2f0 100644
    --- a/src/descriptor.c
    +++ b/src/descriptor.c
    @@ -1062,18 +1062,19 @@ static int generate_pk_k(ms_ctx *ctx, ms_node *node,
    static int generate_pk_h(ms_ctx *ctx, ms_node *node,
                          unsigned char *script, size_t script_len, size_t *written)
    {
    -    unsigned char buff[1 + EC_PUBLIC_KEY_UNCOMPRESSED_LEN];
    +    /* Note 4 instead of 1 here to align the data to hash to 32 bits */
    +    unsigned char buff[4 + EC_PUBLIC_KEY_UNCOMPRESSED_LEN];
     int ret = WALLY_OK;
    
     if (script_len >= WALLY_SCRIPTPUBKEY_P2PKH_LEN - 1) {
    -        ret = generate_pk_k(ctx, node, buff, sizeof(buff), written);
    +        ret = generate_pk_k(ctx, node, buff+3, sizeof(buff)-3, written);
         if (ret == WALLY_OK) {
             if (node->child->flags & NF_IS_XONLY)
                 return WALLY_EINVAL;
             script[0] = OP_DUP;
             script[1] = OP_HASH160;
             script[2] = HASH160_LEN;
    -            ret = wally_hash160(&buff[1], *written - 1, script + 3, HASH160_LEN);
    +            ret = wally_hash160(&buff[4], *written - 1, script + 3, HASH160_LEN);
             script[3 + HASH160_LEN] = OP_EQUALVERIFY;
         }
     }
    
    $ uname -a
    Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
    $ make check
    ...
    OK
    .....
    ----------------------------------------------------------------------
    Ran 5 tests in 0.002s

OK make[3]: Leaving directory '/home/rddl/tmp/libwally-core/src' make[2]: Leaving directory '/home/rddl/tmp/libwally-core/src' make[1]: Leaving directory '/home/rddl/tmp/libwally-core/src' make[1]: Entering directory '/home/rddl/tmp/libwally-core' make[1]: Nothing to be done for 'check-am'. make[1]: Leaving directory '/home/rddl/tmp/libwally-core'


That worked! :heavy_check_mark: 

3. Compiler versions:

$ uname -a Linux raspberrypi 5.15.84-v7l+ #1613 SMP Thu Jan 5 12:01:26 GMT 2023 armv7l GNU/Linux $ gcc --version gcc (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110 Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and:

$ uname -a Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux $ gcc --version gcc (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110 Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



The thing that I just noticed. The `5.15.84-v7l+` recognizes the processor as `armv7l`, which is 32 bit. After the upgrade to `6.1.21-v8+` it gets recognized as `aarch64`, which is 64 bit. It is the same physical cpu.
jmastr commented 1 year ago

The tests also pass by directly applying either:

#if HAVE_UNALIGNED_ACCESS
#define alignment_ok(p, n) 0
#endif

or:

#if HAVE_UNALIGNED_ACCESS
#define alignment_ok(p, n) ((size_t)(p) % (n) == 0)
#endif

to master. So from my side there is no need for:

commit 7510fe33cc403ec3b950558cdd00b48305b27710
Author: Jon Griffiths <jon_p_griffiths@yahoo.com>
Date:   Fri May 19 23:36:48 2023 +1200

    descriptor: align pk_h data for hash160

    Fixes a crash on platforms which dont support unaligned access. Thanks
    @jmastr for the bug report.

But I also don't know if it is necessary somewhere else.

jgriffiths commented 1 year ago

@jmastr Many thanks for helping to debug this.

As suspected, arm64 unaligned access support is more limited than wally requires given the compiler is free to substitute simple loads and stores with more complex instructions. I've updated https://github.com/ElementsProject/libwally-core/pull/397 to cater for this, would you mind testing it?

For coverage purposes could you add --enable-elements to your build/test run, and try with and without --enable-debug as well? Thanks!

The thing that I just noticed. The 5.15.84-v7l+ recognizes the processor as armv7l, which is 32 bit. After the upgrade to 6.1.21-v8+ it gets recognized as aarch64, which is 64 bit. It is the same physical cpu.

This seems like an odd change to happen just by a kernel upgrade, given that the processor ends up configured differently. If wally built on the old kernel runs without crashing, and then crashes after the kernel is upgraded, I don't see how that can be seen as anything other than a (typically verboten) user space regression.

jmastr commented 1 year ago

@jgriffiths Almost, with:

$ ./configure --enable-debug --enable-export-all --enable-swig-python --enable-elements
$ make check
...
...
PASS: test_bech32
PASS: test_psbt
PASS: test_psbt_limits
PASS: test_clear
PASS: test_coinselection
PASS: test_tx
../tools/build-aux/test-driver: line 109:  8305 Bus error               "$@" > $log_file 2>&1
FAIL: test_descriptor
PASS: test_elements_tx
============================================================================
Testsuite summary for libwallycore 0.8.9
============================================================================
# TOTAL: 8
# PASS:  7
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See src/test-suite.log
============================================================================
make[4]: *** [Makefile:1867: test-suite.log] Error 1
make[4]: Leaving directory '/home/rddl/tmp/libwally-core/src'
make[3]: *** [Makefile:1975: check-TESTS] Error 2
make[3]: Leaving directory '/home/rddl/tmp/libwally-core/src'
make[2]: *** [Makefile:2123: check-am] Error 2
make[2]: Leaving directory '/home/rddl/tmp/libwally-core/src'
make[1]: *** [Makefile:1759: check-recursive] Error 1
make[1]: Leaving directory '/home/rddl/tmp/libwally-core/src'
make: *** [Makefile:433: check-recursive] Error 1

Now in test-driver

jmastr commented 1 year ago

with $ ./configure --enable-export-all --enable-swig-python --enable-elements it looks basically as above.

jgriffiths commented 1 year ago

@jmastr Thanks, thats progress. Do you mind using gdb on test_tx to determine where the bus error is coming from there? Same procedure as test_descriptor previously.

jmastr commented 1 year ago

FAIL: test_descriptor

It's in test_descriptor again

It's simpler than that. On a fresh checkout with configure and everything I get:

$ grep HAVE_UNALIGNED_ACCESS src/config.h
#define HAVE_UNALIGNED_ACCESS 1

So the:

#if defined(HAVE_UNALIGNED_ACCESS) && defined(__aarch64__)
#undef HAVE_UNALIGNED_ACCESS
#define HAVE_UNALIGNED_ACCESS 0
#endif

does not get called properly.

jgriffiths commented 1 year ago

hmm, __aarch64__ can't be the correct define for you.

Can you run echo | gcc -dM -E - and paste the output here? If you have clang then running the same command for clang would also be helpful, thanks!

jmastr commented 1 year ago
$ echo | gcc -dM -E - | sort > gcc.dMe.txt

gcc.dMe.txt

jgriffiths commented 1 year ago

Thanks, your compiler is building for 32 bit arm rather than 64, so I've swapped the check to 32 bit. Can you please try again with the updated PR?

jmastr commented 1 year ago

@jgriffiths works with and without enable-debug and on both 5.15.84-v7l+ and 6.1.21-v8+ :+1:

Thanks for your effort!

jgriffiths commented 1 year ago

@jmastr Excellent! Many thank for reporting and helping to debug this issue. I've merged the fix and will be making a new release shortly.

Rather than backporting the fix I suggest you stay on master/use the latest release once its made, as there are a large number of bug fixes and some security fixes made since the 0.8.6 version that you were initially testing.