apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0

nxflat build and run-time errors #3737

Open a-lunev opened 3 years ago

a-lunev commented 3 years ago

Hello. Unfortunately, I cannot build the eagle100:nxflat and eagle100:thttpd configurations. As I understand from the README and history files in the NuttX repo, there are some unresolved issues with newer gcc versions, and gcc 4.3.3 was the last version that still worked for NXFLAT mode. I tried to build the NXFLAT Toolchain based on gcc 4.3.3 and binutils 2.19.1, but I'm still experiencing NuttX build errors.

Steps to reproduce:

$ mkdir TEST_ROOT
$ git clone https://github.com/apache/incubator-nuttx.git TEST_ROOT/nuttx
$ git clone https://github.com/apache/incubator-nuttx-apps TEST_ROOT/apps
$ cd TEST_ROOT/nuttx
$ ./tools/configure.sh -l eagle100:nxflat

Build NXFLAT Toolchain:

$ git clone https://bitbucket.org/nuttx/buildroot.git TEST_ROOT/buildroot/buildroot
$ cd TEST_ROOT/buildroot/buildroot
$ cp configs/cortexm3-defconfig-nxflat .config
$ make oldconfig
$ make menuconfig
activate the following options:
Toolchain Options -> Build GCC cross-compiler
Toolchain Options -> Build C++ compiler
$ make

Build NuttX:

$ cd TEST_ROOT/nuttx
$ make CROSSDEV=TEST_ROOT/buildroot/buildroot/build_arm_nofpu/staging_dir/bin/arm-nuttx-elf- \
       MKNXFLAT=TEST_ROOT/buildroot/buildroot/build_arm_nofpu/staging_dir/bin/mknxflat \
       LDNXFLAT=TEST_ROOT/buildroot/buildroot/build_arm_nofpu/staging_dir/bin/ldnxflat

There are multiple build errors. The first portion is as follows:

make[5]: Entering directory '.../TEST_ROOT/apps/examples/nxflat/tests/errno'
CC: errno.c
LD: errno.o
MK: errno.r1
AS: errno-thunk.S
LD: errno-thunk.o
LD: errno.r2
INPUT SECTIONS:
SECT LOW      HIGH     SIZE
TEXT 00000000 0000018a 0000018a
DATA 00000000 00000028 00000028
BSS  00000028 00000028 00000000
ERROR -- Symbol in GOT32 relocation is in TEXT
ERROR --   At addr 00000064 to sym '.LC0' [0000010c]
ERROR -- Symbol in GOT32 relocation is in TEXT
ERROR --   At addr 00000068 to sym '.LC1' [00000124]
ERROR -- Symbol in GOT32 relocation is in TEXT
ERROR --   At addr 0000006c to sym 'g_nonexistent' [000000fc]
ERROR -- Symbol in GOT32 relocation is in TEXT
ERROR --   At addr 00000070 to sym '.LC2' [0000013c]
ERROR -- Symbol in GOT32 relocation is in TEXT
ERROR --   At addr 00000074 to sym '.LC3' [0000013e]
ERROR -- Symbol in GOT32 relocation is in TEXT
ERROR --   At addr 00000078 to sym '.LC4' [00000165]
Entry symbol "main": 00000024 in section ".text"

Could you please tell me if I'm doing something wrong, or which https://bitbucket.org/nuttx/buildroot.git SHA-1 (including which gcc and binutils versions) and which NuttX SHA-1 are compatible with each other, so that NuttX works with NXFLAT enabled?

btashton commented 3 years ago

I was actually investigating some stuff related to nxflat over the weekend to try some thoughts I had on improving the shared module story.

You should be able to use modern gcc as outlined here: https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=139629508#content/view/139630111

Specifically, include this flag: -mno-pic-data-is-text-relative

I have confirmed that the flag does what we expect, but I did not go through the rest of the process.

patacongo commented 3 years ago

There is a workaround for this problem noted here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=139630111. The workaround is to use a newer GCC option, -mno-pic-data-is-text-relative. That option restores the original behavior of the older GCC tools. The README file should be updated to reflect this workaround.

This issue (but not the workaround) is discussed in https://cwiki.apache.org/confluence/display/NUTTX/NxFlat as well.

patacongo commented 3 years ago

I was actually investigating some stuff related to nxflat over the weekend to try some thoughts I had on improving the shared module story.

I have already implemented full, MMU-less shared library support in a binary format that I call XFLAT. You can see that code at http://xflat.sourceforge.net/ (I haven't touched that in years). The Sourceforge code is still under CVS! There is a GIT version here: https://bitbucket.org/patacongo/xflat/src/master/

I created NxFLAT as a stripped-down version of XFLAT with no shared library support but with a smaller footprint suitable for the kind of MCUs that NuttX originally targeted.

The objectives of NuttX have changed over the years. Originally, it was intended to be a tiny RTOS with a size comparable to other tiny RTOSs like FreeRTOS and ChibiOS, but still supporting mostly POSIX OS interfaces. So a lot of corners were cut in the original designs to keep the size to a minimum. That objective has morphed over the years: now we aim to be a small (but not tiny) Linux work-alike. Very different concept.

btashton commented 3 years ago

Not to derail this issue, but what I actually want to be able to do is support loading ELF files on some of these smaller chips without an MMU, while keeping memory usage down by not having to include a bunch of copies of libc.

Some of the compiler flags to help make this possible do not seem to exist outside of ARM, but there was some recent interest in adding support for RISCV gcc.

I'll read more of the xflat docs / design.

But back to your question, @a-lunev: please do try the flag we suggested. I'm motivated to help remove any roadblocks you run into, and maybe I'll spend some time updating the docs.

patacongo commented 3 years ago

Not to derail this issue, but what I actually want to be able to do is support loading ELF files on some of these smaller chips without an MMU, while keeping memory usage down by not having to include a bunch of copies of libc.

New thread created.

The way that has been done in the past was to add the libc functions to the base FLASH symbol table. That symbol table draws the libc functions into the base FLASH image. Then there is one copy of the libc functions in base FLASH and no libc functions in the ELF module. The ELF module is linked to the libc functions just as it is linked to the OS interface functions.

This will accomplish the size decrease you are looking for.

This is why the files libs/libc/libc.csv and math.csv exist:  To create C library symbol tables using apps/tools/mksymtab.sh
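
A minimal sketch of that idea, assuming a simplified name/address table. The names struct export_entry, g_exports, and find_export are hypothetical and are not the NuttX definitions; the real tables are generated from the .csv files by mksymtab.sh.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Generic function pointer type used to store any exported function. */

typedef void (*func_t)(void);

/* Hypothetical, simplified symbol table entry: a symbol name and the
 * address of the function in the base image.
 */

struct export_entry
{
  const char *name;
  func_t value;
};

/* Taking the address of each libc function here is what pulls a single
 * copy of it into the base image.
 */

static const struct export_entry g_exports[] =
{
  { "printf", (func_t)printf },
  { "strlen", (func_t)strlen },
};

#define NEXPORTS (sizeof(g_exports) / sizeof(g_exports[0]))

/* Resolve an import by name, the way a module loader would when binding
 * a module to the base image.  Returns NULL if the name is not exported.
 */

static func_t find_export(const char *name)
{
  size_t i;

  assert(name != NULL);

  for (i = 0; i < NEXPORTS; i++)
    {
      if (strcmp(g_exports[i].name, name) == 0)
        {
          return g_exports[i].value;
        }
    }

  return NULL;
}
```

A module that imports "strlen" would then receive the address stored in the table instead of carrying its own copy of the function.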

There was some documentation for doing this somewhere, but I can't remember where now.

a-lunev commented 3 years ago

Initially (before creating the current issue #3737) I tried to test NXFLAT mode based on the lm3s6965-ek:qemu-flat config because I do not physically have an eagle100 board. I was able to build NuttX without build errors for lm3s6965-ek:qemu-flat using gcc 7.4.0 and binutils 2.28.1 (e7659eb89e1e7c8729d4cb526117c862d9511922 of https://bitbucket.org/nuttx/buildroot.git). I've attached my custom defconfig file with NXFLAT enabled. However, when I ran the resulting binary on QEMU, it produced the following output:

Registering romdisk
Mounting ROMFS filesystem at target=/mnt/romfs with source=/dev/ram0

****************************************************************************
* Executing errno
****************************************************************************

ERROR: exec(errno) failed: 2

****************************************************************************
* Executing hello
****************************************************************************

ERROR: exec(hello) failed: 2

****************************************************************************
* Executing struct
****************************************************************************

ERROR: exec(struct) failed: 2
End-of-Test.. Exit-ing

Therefore, I supposed that the execution errors might be caused by the new gcc version, since NXFLAT support was reported as broken at least since gcc 4.6.3. So I tried gcc 4.3.3, but in my case it produced build errors even on the eagle100:nxflat and eagle100:thttpd configurations (I created the current issue #3737 at that point). I tried eagle100:nxflat and eagle100:thttpd because "Furthermore, NXFLAT has only been tested on the Eagle-100 LMS6918 Cortex-M3 board" is written here: https://cwiki.apache.org/confluence/display/NUTTX/NxFlat

Now I've tested your suggestion concerning the -mno-pic-data-is-text-relative flag using gcc 7.4.0, but I have not noticed any difference. lm3s6965-ek:qemu-flat builds without errors regardless of whether the flag is present in the Make.defs file of the lm3s6965-ek directory, and the execution errors also appear regardless of the flag.

Steps to reproduce:

$ mkdir TEST_ROOT
$ git clone https://github.com/apache/incubator-nuttx.git TEST_ROOT/nuttx
$ git clone https://github.com/apache/incubator-nuttx-apps TEST_ROOT/apps
replace TEST_ROOT/nuttx/boards/arm/tiva/lm3s6965-ek/configs/qemu-flat/defconfig file by my attached one
$ cd TEST_ROOT/nuttx
$ ./tools/configure.sh -l lm3s6965-ek:qemu-flat

Build NXFLAT Toolchain:

$ git clone https://bitbucket.org/nuttx/buildroot.git TEST_ROOT/buildroot
$ cd TEST_ROOT/buildroot
$ cp configs/cortexm3-eabi-defconfig-7.4.0 .config
$ make oldconfig
$ make

Build NuttX:

$ cd TEST_ROOT/nuttx
$ make CROSSDEV=TEST_ROOT/buildroot/build_arm_nofpu/staging_dir/bin/arm-nuttx-eabi- \
       MKNXFLAT=TEST_ROOT/buildroot/build_arm_nofpu/staging_dir/bin/mknxflat \
       LDNXFLAT=TEST_ROOT/buildroot/build_arm_nofpu/staging_dir/bin/ldnxflat

Run on QEMU:

qemu-system-arm -semihosting \
                    -M lm3s6965evb \
                    -netdev user,id=user0 \
                    -nic user,id=user0 \
                    -serial mon:stdio \
                    -kernel TEST_ROOT/nuttx/nuttx.bin

defconfig.txt

patacongo commented 3 years ago

ERROR: exec(errno) failed: 2

You can see the meaning of the error in include/errno.h:

#define ENOENT              2
#define ENOENT_STR          "No such file or directory"

Which suggests that there is something wrong with your file system or file search PATH. The ELF loader should only return the error 2 if the executable file cannot be found.

That error is printed by nxflat_main.c:

          errmsg("ERROR: exec(%s) failed: %d\n", dirlist[i], errno);

Do you have a PATH variable set up in the environment? NO, it is not defined in your defconfig file. So the following should be using the absolute path /mnt/romfs/errno. That should work provided that /dev/ram0 is valid.

#ifdef CONFIG_LIB_ENVPATH
      filename = dirlist[i];
#else
      snprintf(fullpath, 128, "%s/%s", MOUNTPT, dirlist[i]);
      filename = fullpath;
#endif
      /* ... */
      args[0] = NULL;
      ret = exec(filename, args, g_nxflat_exports, g_nxflat_nexports);

a-lunev commented 3 years ago

Hi @patacongo,

The file system (romfs) and file search PATH are good indeed. The error code ("ERROR: exec(errno) failed: 2") is confusing and does not expose the real cause. This is not "No such file or directory" cause.

I've enabled debug logs and the real cause has been exposed:

testheader: 
****************************************************************************
* Executing errno
****************************************************************************

errno
load_absmodule: Loading /mnt/romfs/errno
nxflat_loadbinary: Loading file: /mnt/romfs/errno
nxflat_init: filename: /mnt/romfs/errno loadinfo: 0x20003d50
nxflat_read: Read 36 bytes from offset 0
nxflat_dumploadinfo: LOAD_INFO:
nxflat_dumploadinfo:   ISPACE:
nxflat_dumploadinfo:     ispace:       00000000
nxflat_dumploadinfo:     entryoffs:    000000a4
nxflat_dumploadinfo:     isize:        000001ba
nxflat_dumploadinfo:   DSPACE:
nxflat_dumploadinfo:     dspace:       00000000
nxflat_dumploadinfo:     datasize:     00000040
nxflat_dumploadinfo:     bsssize:      00000000
nxflat_dumploadinfo:       (pad):      00000000
nxflat_dumploadinfo:     stacksize:    00000800
nxflat_dumploadinfo:     dsize:        00000040
nxflat_dumploadinfo:   RELOCS:
nxflat_dumploadinfo:     relocstart:   000001fa
nxflat_dumploadinfo:     reloccount:   11
nxflat_dumploadinfo:   HANDLES:
nxflat_dumploadinfo:     filfd:        3
nxflat_load: Mapped ISpace (442 bytes) at 00019d54
nxflat_load: Allocated DSpace (108 bytes) at 0x200040c0
nxflat_read: Read 108 bytes from offset 442
nxflat_load: TEXT: 00019d54 Entry point offset: 000000a4 Data offset: 000001ba
nxflat_dumploadinfo: LOAD_INFO:
nxflat_dumploadinfo:   ISPACE:
nxflat_dumploadinfo:     ispace:       00019d54
nxflat_dumploadinfo:     entryoffs:    000000a4
nxflat_dumploadinfo:     isize:        000001ba
nxflat_dumploadinfo:   DSPACE:
nxflat_dumploadinfo:     dspace:       200040b0
nxflat_dumploadinfo:       crefs:      1
nxflat_dumploadinfo:       region:     200040c0
nxflat_dumploadinfo:     datasize:     00000040
nxflat_dumploadinfo:     bsssize:      00000000
nxflat_dumploadinfo:       (pad):      0000002c
nxflat_dumploadinfo:     stacksize:    00000800
nxflat_dumploadinfo:     dsize:        0000006c
nxflat_dumploadinfo:   RELOCS:
nxflat_dumploadinfo:     relocstart:   000001fa
nxflat_dumploadinfo:     reloccount:   11
nxflat_dumploadinfo:   HANDLES:
nxflat_dumploadinfo:     filfd:        3
nxflat_bindimports: Imports offset: 000001d2 nimports: 5
nxflat_bindimports: Import[0] (0x200040d8) offset: 00000000 func: 00000000
nxflat_bindimports: Exported symbol "__errno" not found
nxflat_loadbinary: Failed to bind symbols program binary: -2
exec_spawn: ERROR: Failed to load program 'errno': -2
nxflat_main: ERROR: exec(errno) failed: 2

As it turned out, symtab.c is generated with an empty g_nxflat_exports array. The Makefile invokes the $(APPDIR)/tools/mksymtab.sh script, which in turn invokes the host "nm", and that fails with:

nm: errno: file format not recognized
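
The misleading error code can be illustrated with a small sketch. It assumes a loader that reports an unresolved import as -ENOENT, which matches the -2 in the log above; with an empty export table, every import fails with the same code that a missing file would produce. The struct export_s and bind_imports names are hypothetical stand-ins, not the NuttX loader code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical export table entry (name plus address). */

struct export_s
{
  const char *name;
  const void *addr;
};

/* Try to resolve every imported name against the export table.
 * Returns 0 on success or -ENOENT for the first unresolved import.
 */

static int bind_imports(const char *const *imports, size_t nimports,
                        const struct export_s *exports, size_t nexports)
{
  size_t i;
  size_t j;

  assert(imports != NULL || nimports == 0);
  assert(exports != NULL || nexports == 0);

  for (i = 0; i < nimports; i++)
    {
      int found = 0;

      for (j = 0; j < nexports; j++)
        {
          if (strcmp(imports[i], exports[j].name) == 0)
            {
              found = 1;
              break;
            }
        }

      if (!found)
        {
          /* Same numeric code as "No such file or directory" */

          return -ENOENT;
        }
    }

  return 0;
}
```

Calling bind_imports() with nexports equal to zero fails on the first import, which is the situation the debug log shows for "__errno".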

As I understand, nxflat example was broken somewhere in 2020. I was able to find that nuttx-8.2 still worked correctly concerning this particular issue (the symbol table is generated successfully, and nxflat example normally works in nuttx-8.2). However, nuttx-9.1.0 and any newer state in nuttx repo are already broken.

a-lunev commented 3 years ago

I narrowed the range of commits: nuttx ac18fc0216f81f1893b3c5349433136917e352db (15-Apr-2020) apps 404b330c25567923de8434e34dd1dbe8ccf59b8b (27-Feb-2020). Symbol table is still normally created and nxflat example works.

nuttx 9b87732b4708c44de525eefec1fd8a9bfc6c1181 (01-Jun-2020) apps deaa6c5b7bf8445b4a300691525f60aa506be0d7 (20-May-2020) nxflat example does not work

nuttx 2af72cc589aec0a01f73333496bf41a95389c2f4 (04-Jun-2020) apps 2c924f657fd17bb6a8e3b809a2b61c2539ecba52 (04-Jun-2020) nxflat example does not work

I tried about 10 more commits (pairs) in the middle of the range, however there were different build errors.

However, I see that commit d03ff1bde61cb6c2f0e96a5e014077909c700d75 (https://github.com/apache/incubator-nuttx-apps) made the main change to how the symbol table for the nxflat example is created. It seems the issue appeared in exactly that commit.

a-lunev commented 3 years ago

Finally, I found the exact commit that broke the symbol table creation for nxflat example: f16a765ccaa9395250423c4498a9e31aac5a558d

I've tried the following correction on top of the current master branch and it fixed the issue:

diff --git a/examples/nxflat/tests/Makefile b/examples/nxflat/tests/Makefile
index 7640a5d0..28c977fa 100644
--- a/examples/nxflat/tests/Makefile
+++ b/examples/nxflat/tests/Makefile
@@ -90,7 +90,7 @@ $(DIRLIST_SRC): install
 # Create the exported symbol table list from the derived *-thunk.S files

 $(SYMTAB_SRC): install
-       $(Q) $(APPDIR)/tools/mksymtab.sh $(ROMFS_DIR) g_nxflat >$@.tmp
+       $(Q) $(APPDIR)/tools/mksymtab.sh $(TESTS_DIR) g_nxflat >$@.tmp
        $(Q) $(call TESTANDREPLACEFILE, $@.tmp, $@)

xiaoxiang781216 commented 3 years ago

@a-lunev could you provide a patch to fix the typo in examples/nxflat? BTW, it would be great to remove the lines below once you fix the build break:

https://github.com/apache/incubator-nuttx/blob/15b99d1f4b25a7b0f010a8f118d130428363b6c2/tools/ci/testlist/arm-13.dat#L6-L7
https://github.com/apache/incubator-nuttx/blob/15b99d1f4b25a7b0f010a8f118d130428363b6c2/tools/ci/testlist/all.dat#L2-L3

Then we can catch the build issue automatically in the future.

a-lunev commented 3 years ago

Hi @xiaoxiang781216,

So far I've provided the patch to fix the symbol table creation for nxflat example, and added configuration for lm3s6965-ek board to test nxflat on QEMU.

Concerning automatic testing, it seems the NXFLAT Toolchain is absent from the nuttx/tools/ci scripts. So it is first necessary to add building and deploying the Toolchain to those scripts.

a-lunev commented 3 years ago

Hi @btashton and @patacongo,

After the symbol table creation was fixed for nxflat example, I tested -mno-pic-data-is-text-relative flag again and now it really helped, thank you! However, there is a hard fault for the "struct" test ("errno" and "hello" tests work well):

$ qemu-system-arm -semihosting -M lm3s6965evb -netdev user,id=user0 -nic user,id=user0 -serial mon:stdio -kernel nuttx.bin
Registering romdisk
Mounting ROMFS filesystem at target=/mnt/romfs with source=/dev/ram0

****************************************************************************
* Executing errno
****************************************************************************

Wait a bit for test completion
Hello, World on stdout
Hello, World on stderr
We failed to open "aflav-sautga-ay!" errno is 2

****************************************************************************
* Executing hello
****************************************************************************

Wait a bit for test completion
Getting ready to say "Hello, world"

Hello, world!
It has been said.

argc    = 1
argv    = 0x0x20005130
argv[0] = (0x0x20005138) "<noname>"
argv[1] = 0x0
Goodbye, world!

****************************************************************************
* Executing struct
****************************************************************************

Wait a bit for test completion
Calling getstruct()
getstruct returned 0x20004db0
  n = 42 (vs 42) PASS
  pn = 0x20004da4 (vs 0x20004da4) PASS
 *pn = 87 (vs 87) PASS
  ps = 0xde5c (vs 0xde5c) PASS
  ps->n = 117 (vs 117) PASS
  pf = 0xdcec (vs 0xdcec) PASS
Calling mystruct->pf()
arm_hardfault: PANIC!!! Hard fault: 40000000
up_assert: Assertion failed at file:armv7-m/arm_hardfault.c line: 135
up_registerdump: R0: 00000017 0000df4b 00007fff 0000dcec 20004db0 0000dcec 00000000 00000000
up_registerdump: R8: 00000000 00000000 20004d60 00000000 0000a0cf 20006218 0000ddfd 0000dcec
up_registerdump: xPSR: 60000000 PRIMASK: 00000000 CONTROL: 00000000
up_registerdump: EXC_RETURN: fffffff9
up_dumpstate: sp:         20006150
up_dumpstate: stack base: 20005a58
up_dumpstate: stack size: 000007e8
up_stackdump: 20006140: 20005a58 20004e20 20002338 0000498f 00000000 00000000 00000000 0000a0cf
up_stackdump: 20006160: 20006218 0000ddfd 0000dcec 00004085 00000000 0000dcea 0000e60b 200061cc
up_stackdump: 20006180: 00000000 00000000 00000000 000036dd 0000d35c 00000e59 00000e19 20002338
up_stackdump: 200061a0: 00000003 00001137 00000004 00000c99 00000000 200061cc 0000dcec 00000000
up_stackdump: 200061c0: 00000000 00000295 20004fd8 20006218 00000000 20004db0 0000dcec 00000000
up_stackdump: 200061e0: 00000000 00000000 00000000 20004d60 00000000 fffffff9 00000017 0000df4b
up_stackdump: 20006200: 00007fff 0000dcec 0000a0cf 0000ddfd 0000dcec 60000000 0000dd21 20004e20
up_stackdump: 20006220: 0000dd21 000038dd 00000000 00001517 00000000 00000000 00000000 00000000

Do you have any idea what the cause may be?

patacongo commented 3 years ago

Do you have any idea what the cause may be?

No, it is impossible to interpret the hardfault dump without also having the ELF file with the addresses. Without the ELF it is just meaningless numbers. See https://cwiki.apache.org/confluence/display/NUTTX/Analyzing+Cortex-M+Hardfaults

Because printf() uses buffered I/O we don't really know where it failed. The last data buffered by printf() was lost. There was probably output after "Calling mystruct->pf()" that we do not see. It appears that the failure occurred when mystruct->pf() was called. However, it is also likely that mystruct->pf() returned and the crash occurred when the test exited. This happens often if the stack is too small. We can't tell from this data.

[Actually, the printf() buffer should have been flushed by the '\n' at the end of the output, but the data in the serial Tx buffer should still have been lost. Same result.]

You will need to analyze the hardfault dump per the steps at the above link, and/or single step through the mystruct->pf() call, and/or add a lot more printf() WITH fflush() calls.
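
As a sketch of the last suggestion, a small helper can print and flush in one step so that each message leaves the C library buffer before the next statement runs. The trace name is hypothetical. As noted above, data still sitting in the serial Tx buffer can be lost even after the flush.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Print a formatted message and flush the stream immediately.
 * Returns the character count from vfprintf(), or a negative value
 * on an output error.
 */

static int trace(FILE *stream, const char *fmt, ...)
{
  va_list ap;
  int ret;

  assert(stream != NULL && fmt != NULL);

  va_start(ap, fmt);
  ret = vfprintf(stream, fmt, ap);
  va_end(ap);

  /* Push the data out of the stdio buffer right away */

  fflush(stream);
  return ret;
}
```

In the test this could be used as trace(stdout, "before pf()\n") and trace(stdout, "after pf()\n") around the suspect call, to see which message is the last one to appear.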

a-lunev commented 3 years ago

Hi @patacongo,

It fails exactly here (struct_main.c, line 97), on the attempt to call the function: mystruct->pf();

As you can see from the previous log

  pf = 0xdcec (vs 0xdcec) PASS
Calling mystruct->pf()
arm_hardfault: PANIC!!! Hard fault: 40000000

the called address is even. However, in Thumb mode it must be odd. I tried to force it to be odd (by | 1) and the hard fault was fixed.
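
The bit 0 convention can be sketched in a few lines of C. On Thumb-only cores such as the Cortex-M3, bit 0 of an address used for an indirect call selects the instruction set, so it has to be 1. The helper names below are hypothetical and only illustrate the arithmetic of the "| 1" experiment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the address is usable as a Thumb call target (bit 0 set). */

static bool is_thumb_target(uintptr_t addr)
{
  return (addr & 1u) != 0;
}

/* Turn the address of a Thumb instruction into a callable target.
 * This is the "| 1" workaround described above.
 */

static uintptr_t to_thumb_target(uintptr_t addr)
{
  return addr | 1u;
}

/* Recover the address of the first instruction from a call target. */

static uintptr_t thumb_code_address(uintptr_t target)
{
  assert(is_thumb_target(target));
  return target & ~(uintptr_t)1;
}
```

With the value from the log, is_thumb_target(0xdcec) is false and to_thumb_target(0xdcec) gives 0xdced.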

If I understand correctly, the compiler/linker should have initialized "pf" field of "struct struct_s dummy" to an odd address automatically, however it did not.

Do you have any idea what's wrong?

This is the detailed log how "struct" example was built:

...
TEST_ROOT/buildroot/build_arm_nofpu/staging_dir/bin/arm-nuttx-eabi-gcc -c -fpic -msingle-pic-base -mpic-register=r10 -mno-pic-data-is-text-relative -fno-builtin -Wall -Wstrict-prototypes -Wshadow -Wundef -Os -fno-strict-aliasing -fno-strength-reduce -fomit-frame-pointer -mcpu=cortex-m3 -mthumb -mfloat-abi=soft -isystem "TEST_ROOT/nuttx/include" -D__NuttX__ -D__KERNEL__  -pipe -I "TEST_ROOT/apps/include" struct_main.c -o struct_main.o
CC: struct_dummy.c
TEST_ROOT/buildroot/build_arm_nofpu/staging_dir/bin/arm-nuttx-eabi-gcc -c -fpic -msingle-pic-base -mpic-register=r10 -mno-pic-data-is-text-relative -fno-builtin -Wall -Wstrict-prototypes -Wshadow -Wundef -Os -fno-strict-aliasing -fno-strength-reduce -fomit-frame-pointer -mcpu=cortex-m3 -mthumb -mfloat-abi=soft -isystem "TEST_ROOT/nuttx/include" -D__NuttX__ -D__KERNEL__  -pipe -I "TEST_ROOT/apps/include" struct_dummy.c -o struct_dummy.o
LD: struct_main.o
TEST_ROOT/buildroot/build_arm_nofpu/staging_dir/bin/arm-nuttx-eabi-ld -r -d -warn-common -o struct.r1 struct_main.o struct_dummy.o
MK: struct.r1
TEST_ROOT/buildroot/build_arm_nofpu/staging_dir/bin/mknxflat -o struct-thunk.S struct.r1
AS: struct-thunk.S
TEST_ROOT/buildroot/build_arm_nofpu/staging_dir/bin/arm-nuttx-eabi-gcc -c -fpic -msingle-pic-base -mpic-register=r10 -mno-pic-data-is-text-relative -fno-builtin -Wall -Wstrict-prototypes -Wshadow -Wundef -Os -fno-strict-aliasing -fno-strength-reduce -fomit-frame-pointer -mcpu=cortex-m3 -mthumb -mfloat-abi=soft -isystem "TEST_ROOT/nuttx/include" -D__NuttX__ -D__KERNEL__  -pipe -I "TEST_ROOT/apps/include" struct-thunk.S -o struct-thunk.o
LD: struct-thunk.o
TEST_ROOT/buildroot/build_arm_nofpu/staging_dir/bin/arm-nuttx-eabi-ld -r -d -warn-common -T TEST_ROOT/nuttx/binfmt/libnxflat/gnu-nxflat-pcrel.ld -no-check-sections -o struct.r2 struct_main.o struct_dummy.o struct-thunk.o
LD: struct.r2
TEST_ROOT/buildroot/build_arm_nofpu/staging_dir/bin/ldnxflat -e main -s 2048 -o struct struct.r2
INPUT SECTIONS:
SECT LOW      HIGH     SIZE
TEXT 00000000 0000026a 0000026a
DATA 00000000 0000001c 0000001c
BSS  0000001c 0000001c 00000000
Entry symbol "main": 00000058 in section ".text"
...

patacongo commented 3 years ago

It fails exactly here (struct_main.c, line 97), on the attempt to call the function: mystruct->pf();

As you can see from the previous log

  pf = 0xdcec (vs 0xdcec) PASS
Calling mystruct->pf()
arm_hardfault: PANIC!!! Hard fault: 40000000

the called address is even. However, in Thumb mode it must be odd. I tried to force it to be odd (by | 1) and the hard fault was fixed.

If I understand correctly, the compiler should have initialized "pf" field of "struct struct_s dummy" to an odd address automatically, however it did not.

Yes, at some point bit 0 should have been set by the compiler before the call.  This used to work and I can't explain why it should be failing in this case.  It is really pretty generic C code.  The only purpose of this test is to assure that a structure is initialized properly.

It does seem like a compiler issue.  Could it believe that dummyfunc() is an ARM (vs Thumb2) function?

Sorry.  I'm no help on this one.

patacongo commented 3 years ago

One thing I do in cases where I want to see what the compiler is doing is to add -save-temps to the GCC command line. I do this:

  1. Build with V=1 so that I can see the full compile command line,
  2. Add -save-temps to the command line, and
  3. Re-compile using the modified compiler command

That will leave a .i and a .s file in addition to the .o file. The .s has the generated assembly language. I am not sure if it will tell you anything new or not. We already know that the value saved in the structure did not have bit 0 set.

patacongo commented 3 years ago

It does seem like a compiler issue.  Could it believe that dummyfunc() is an ARM (vs Thumb2) function?

I am not sure how the ARM-Thumb interworking is handled.  But the compiler cannot really know if a function address is an ARM or a Thumb2 address.  That cannot really be known until the files are linked via ld, right?

In this case, there is a partial link using ld to produce a struct.r2 but the final link is not performed by LD, but by ldnxflat.

a-lunev commented 3 years ago

I am not sure how the ARM-Thumb interworking is handled. But the compiler cannot really know if a function address is an ARM or a Thumb2 address. That cannot really be known until the files are linked via ld, right?

In this case, there is a partial link using ld to produce a struct.r2 but the final link is not performed by LD, but by ldnxflat.

Yes, I also think it should be done in some final phase, after all address manipulations. In the GNU Toolchain I think it's done by the linker. In the NXFLAT Toolchain I suppose the final phase might be in ldnxflat. However, I'm not sure. I have not analyzed the source code or architecture of either Toolchain.

patacongo commented 3 years ago

In case of NXFLAT Toolchain I suppose the final phase might be in ldnxflat. However, I'm not sure. I've not analyzed the source code / architecture of the both Toolchains.

Nothing like that is done in ldnxflat. It doesn't know anything about ARM. If any address fix-ups are done, they would have to have been done when struct.r2 was linked. It might be interesting to build the nxflat_main.c and nxflat_dummy.c files using "normal" CFLAGS and see if there is a difference.

That same struct test case is used with ELF modules too. See apps/examples/elf/tests/struct. It also uses 'ld' to produce a partial link but works fine. That says that there is probably nothing wrong with the tools or the procedure. I think we are missing something else.

btashton commented 3 years ago

Looking at the relocations in r1, would we not expect to see an R_ARM_THM_GOT_BREL12 instead of R_ARM_GOT_BREL?

❯ readelf -r  ../apps/examples/nxflat/tests/struct/struct.r1 

Relocation section '.rel.text' at offset 0x700 contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000006  0000241e R_ARM_THM_JUMP24  00000000   printf
0000000c  0000111a R_ARM_GOT_BREL    00000000   .LC0
00000018  0000291a R_ARM_GOT_BREL    00000000   dummy

Relocation section '.rel.text.startup' at offset 0x718 contains 26 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000008  0000240a R_ARM_THM_CALL    00000000   printf
0000000c  0000270a R_ARM_THM_CALL    00000011   getstruct
0000001a  0000240a R_ARM_THM_CALL    00000000   printf
00000032  0000240a R_ARM_THM_CALL    00000000   printf
00000050  0000240a R_ARM_THM_CALL    00000000   printf
0000006e  0000240a R_ARM_THM_CALL    00000000   printf
0000008c  0000240a R_ARM_THM_CALL    00000000   printf
000000a6  0000240a R_ARM_THM_CALL    00000000   printf
000000c4  0000240a R_ARM_THM_CALL    00000000   printf
000000d4  0000240a R_ARM_THM_CALL    00000000   printf
000000e2  0000240a R_ARM_THM_CALL    00000000   printf
00000100  0000141a R_ARM_GOT_BREL    00000022   .LC3
00000104  0000151a R_ARM_GOT_BREL    00000037   .LC4
00000108  0000131a R_ARM_GOT_BREL    0000001d   .LC2
0000010c  0000161a R_ARM_GOT_BREL    0000004e   .LC5
00000110  0000251a R_ARM_GOT_BREL    00000000   dummy_scalar
00000114  0000171a R_ARM_GOT_BREL    00000063   .LC6
00000118  0000181a R_ARM_GOT_BREL    00000079   .LC7
0000011c  0000231a R_ARM_GOT_BREL    00000000   dummy_struct
00000120  0000191a R_ARM_GOT_BREL    0000008f   .LC8
00000124  00001a1a R_ARM_GOT_BREL    000000a5   .LC9
00000128  0000121a R_ARM_GOT_BREL    00000018   .LC1
0000012c  0000261a R_ARM_GOT_BREL    00000001   dummyfunc
00000130  00001b1a R_ARM_GOT_BREL    000000be   .LC10
00000134  00001c1a R_ARM_GOT_BREL    000000d4   .LC11
00000138  00001d1a R_ARM_GOT_BREL    000000ec   .LC12

Relocation section '.rel.data.rel.ro' at offset 0x7e8 contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000004  00002502 R_ARM_ABS32       00000000   dummy_scalar
00000008  00002302 R_ARM_ABS32       00000000   dummy_struct
0000000c  00002602 R_ARM_ABS32       00000001   dummyfunc

btashton commented 3 years ago

Ah I take that back, but it does look like when r2 was linked we moved to the even offset address

❯ readelf -r  ../apps/examples/nxflat/tests/struct/struct.r2

Relocation section '.rel.text' at offset 0x8f4 contains 30 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000006  0000281e R_ARM_THM_JUMP24  00000025   printf
0000000c  00000d1a R_ARM_GOT_BREL    00000174   .LC0
00000018  0000371a R_ARM_GOT_BREL    0000000c   dummy
00000030  00002218 R_ARM_GOTOFF32    00000004   __dyninfo0000
0000003c  0000280a R_ARM_THM_CALL    00000025   printf
00000040  0000330a R_ARM_THM_CALL    00000011   getstruct
0000004e  0000280a R_ARM_THM_CALL    00000025   printf
00000066  0000280a R_ARM_THM_CALL    00000025   printf
00000084  0000280a R_ARM_THM_CALL    00000025   printf
000000a2  0000280a R_ARM_THM_CALL    00000025   printf
000000c0  0000280a R_ARM_THM_CALL    00000025   printf
000000da  0000280a R_ARM_THM_CALL    00000025   printf
000000f8  0000280a R_ARM_THM_CALL    00000025   printf
00000108  0000280a R_ARM_THM_CALL    00000025   printf
00000116  0000280a R_ARM_THM_CALL    00000025   printf
00000134  0000101a R_ARM_GOT_BREL    00000196   .LC3
00000138  0000111a R_ARM_GOT_BREL    000001ab   .LC4
0000013c  00000f1a R_ARM_GOT_BREL    00000191   .LC2
00000140  0000121a R_ARM_GOT_BREL    000001c2   .LC5
00000144  0000291a R_ARM_GOT_BREL    00000000   dummy_scalar
00000148  0000131a R_ARM_GOT_BREL    000001d7   .LC6
0000014c  0000141a R_ARM_GOT_BREL    000001ed   .LC7
00000150  0000271a R_ARM_GOT_BREL    00000170   dummy_struct
00000154  0000151a R_ARM_GOT_BREL    00000203   .LC8
00000158  0000161a R_ARM_GOT_BREL    00000219   .LC9
0000015c  00000e1a R_ARM_GOT_BREL    0000018c   .LC1
00000160  0000301a R_ARM_GOT_BREL    00000001   dummyfunc
00000164  0000171a R_ARM_GOT_BREL    00000232   .LC10
00000168  0000181a R_ARM_GOT_BREL    00000248   .LC11
0000016c  0000191a R_ARM_GOT_BREL    00000260   .LC12

Relocation section '.rel.data' at offset 0x9e4 contains 4 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000004  00000102 R_ARM_ABS32       00000000   .text
00000010  00002902 R_ARM_ABS32       00000000   dummy_scalar
00000014  00002702 R_ARM_ABS32       00000170   dummy_struct
00000018  00003002 R_ARM_ABS32       00000001   dummyfunc

patacongo commented 3 years ago

00000018 00003002 R_ARM_ABS32 00000001 dummyfunc

This should be all it takes to set bit 0 in the address of dummyfunc:

case R_ARM_ABS32:
  {
    *(uint32_t *)addr += sym->st_value;
  }
  break;

btashton commented 3 years ago

There must be an issue with the sym value, as we should already be handling that.

#ifdef ARCH_BIG_ENDIAN
  saved = temp = (int32_t) nxflat_swap32(*target);
#else
  saved = temp = *target;
#endif
  /* Mask  and sign extend */

  temp &= how_to->src_mask;
  temp <<= (32 - how_to->bitsize);
  temp >>= (32 - how_to->bitsize);

  /* Offset */

  temp += (sym_value + rel_section->vma) >> how_to->rightshift;

  /* Mask upper bits from rollover */

  temp &= how_to->dst_mask;

  /* Replace data that was masked */

  temp |= saved & (~how_to->dst_mask);

And from the very verbose debug output of ldnxflat:

rel 3  : sym [           dummyfunc] s_addr @ 00000018 val 00000000-00000000 rel 00000000 how R_ARM_ABS32
Performing ABS32 link at addr 00000018 [00000000] to sym 'dummyfunc' [00000000]
  Original location 0xee9088 is 00000000 rsh 0  sz 2 bit 32 rel 0 smask ffffffff dmask ffffffff off 0 
  Modified location: 00000000
Sym section .text is CODE
Symbol 'dummyfunc' lies in I-Space
relocs[3]: type: 0 offset: 0000005c
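
The masking arithmetic quoted above can be worked through with the values from this log (smask and dmask of ffffffff, bit 32, rsh 0). The sketch below is a standalone re-implementation for illustration only, not the ldnxflat source; the struct and function names are hypothetical. With a symbol value of 0, a section VMA of 0, and a stored addend of 0, the result is 0, so nothing in this path sets bit 0 unless the symbol value or an explicit Thumb adjustment supplies it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relocation "howto" fields used above. */
struct howto_sketch
{
  uint32_t src_mask;   /* Bits of the stored word that hold the addend */
  uint32_t dst_mask;   /* Bits of the stored word that get rewritten */
  unsigned bitsize;    /* Width of the relocated field, 1..32 */
  unsigned rightshift; /* Right shift applied to the computed value */
};

static uint32_t relocate_word(uint32_t saved, uint32_t sym_value,
                              uint32_t vma, const struct howto_sketch *h)
{
  /* Mask and sign extend the addend (done in unsigned arithmetic) */

  uint32_t addend = saved & h->src_mask;
  if (h->bitsize < 32)
    {
      uint32_t sign = UINT32_C(1) << (h->bitsize - 1);
      addend &= (sign << 1) - 1;
      addend  = (addend ^ sign) - sign;
    }

  /* Offset */

  uint32_t temp = addend + ((sym_value + vma) >> h->rightshift);

  /* Mask upper bits from rollover */

  temp &= h->dst_mask;

  /* Replace data that was masked */

  temp |= saved & ~h->dst_mask;
  return temp;
}

/* Self-check: the ABS32 case from the log, plus a narrow 8-bit field. */
static void check_relocate_word(void)
{
  const struct howto_sketch abs32 = { 0xffffffffu, 0xffffffffu, 32, 0 };
  const struct howto_sketch byte8 = { 0x000000ffu, 0x000000ffu, 8, 0 };

  /* Values from the log: stored word 0, symbol value 0, VMA 0 */
  assert(relocate_word(0, 0, 0, &abs32) == 0);

  /* Illustrative even code address; bit 0 has to be OR-ed in separately */
  assert((relocate_word(0, 0xdae0u, 0, &abs32) | 1u) == 0xdae1u);

  /* Addend 0xfe is -2 in 8 bits; -2 + 0x10 = 0x0e, upper bits preserved */
  assert(relocate_word(0x123456feu, 0x10u, 0, &byte8) == 0x1234560eu);
}
```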

btashton commented 3 years ago

Ok I was able to get this to work, but there seems to be an issue with how we identify symbols as being Thumb functions in ldnxflat. This fails because st_info=0x18 so it is only annotated as STT_FUNC.

if ((((elf_symbol_type *)rel_sym)->internal_elf_sym.st_info & 0x0f) == STT_ARM_TFUNC)

When I hacked these checks to just look for STT_FUNC, I was able to make the test pass. I am digging in to try and understand why the symbols do not have STT_ARM_TFUNC.

ABCDF
Registering romdisk
Mounting ROMFS filesystem at target=/mnt/romfs with source=/dev/ram0

****************************************************************************
* Executing errno
****************************************************************************

Wait a bit for test completion
Hello, World on stdout
Hello, World on stderr
We failed to open "aflav-sautga-ay!" errno is 2

****************************************************************************
* Executing hello
****************************************************************************

Wait a bit for test completion
Getting ready to say "Hello, world"

Hello, world!
It has been said.

argc    = 1
argv    = 0x0x20005130
argv[0] = (0x0x20005138) "<noname>"
argv[1] = 0x0
Goodbye, world!

****************************************************************************
* Executing struct
****************************************************************************

Wait a bit for test completion
Calling getstruct()
getstruct returned 0x20004db0
  n = 42 (vs 42) PASS
  pn = 0x20004da4 (vs 0x20004da4) PASS
 *pn = 87 (vs 87) PASS
  ps = 0xdc50 (vs 0xdc50) PASS
  ps->n = 117 (vs 117) PASS
  pf = 0xdae1 (vs 0xdae1) PASS
Calling mystruct->pf()
In dummyfunc() -- PASS
Exit-ing
End-of-Test.. Exit-ing

patacongo commented 3 years ago

Ok I was able to get this to work, but there seems to be an issue with how we identify symbols as being Thumb functions in ldnxflat. This fails because st_info=0x18 so it is only annotated as STT_FUNC.

Interesting. But it raises more questions. None of the supported ARMv7-M parts do so now, but it is possible that they could support both ARM and Thumb instruction sets (hence the need for a distinction between the two). ARMv7-A certainly does support both ISAs. If Thumb functions are labeled STT_FUNC, then I don't see how that could work.

None of this is unique to NxFLAT. The only place NxFLAT has an effect is in binding to base FLASH code. So why does apps/examples/elf/tests/struct not show the same problem? Something is different. I think you have identified the root cause of the problem, but this doesn't feel like the fix.

btashton commented 3 years ago

Ok I think I have tracked the issue down. Apparently we have not been using the correct way of determining whether the branch type is Thumb, and should not be relying on STT_ARM_TFUNC. New macros for accessing this information were added in 2016 (and before that, ARM_SYM_BRANCH_TYPE had long been the correct macro):

#define NUM_ENUM_ARM_ST_BRANCH_TYPE_BITS 2
#define ENUM_ARM_ST_BRANCH_TYPE_BITMASK \
  ((1 << NUM_ENUM_ARM_ST_BRANCH_TYPE_BITS) - 1)

#define ARM_GET_SYM_BRANCH_TYPE(STI) \
  ((enum arm_st_branch_type) ((STI) & ENUM_ARM_ST_BRANCH_TYPE_BITMASK))

      is_thumb =
        ((ARM_GET_SYM_BRANCH_TYPE (isym->st_target_internal)
          == ST_BRANCH_TO_THUMB) || type == STT_ARM_16BIT);

vs

((isym->st_info & 0x0f) == STT_ARM_TFUNC || (isym->st_info & 0x0f) == STT_ARM_16BIT)

I will clean this up and provide a patch.
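
A minimal, self-contained sketch of the difference between the two checks follows. It does not include the binutils headers: the mock struct only mirrors the two fields discussed above, and the numeric values of the constants are reproduced from memory of the ELF and binutils definitions, so treat them as assumptions and check them against the headers in your tree. The example st_info value is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values; the real definitions live in the ELF/binutils headers. */
#define STT_FUNC          2
#define STT_ARM_TFUNC     13   /* Assumed equal to STT_LOPROC */
#define STT_ARM_16BIT     15   /* Assumed equal to STT_HIPROC */
#define BRANCH_TYPE_MASK  0x3  /* Two branch-type bits, as in the macros above */

enum arm_st_branch_type_sketch
{
  SK_BRANCH_TO_ARM = 0,        /* Assumed value of ST_BRANCH_TO_ARM */
  SK_BRANCH_TO_THUMB = 1       /* Assumed value of ST_BRANCH_TO_THUMB */
};

/* Mock symbol: only the two fields relevant to this discussion. */
struct mock_sym
{
  unsigned char st_info;            /* Binding in high nibble, type in low */
  unsigned char st_target_internal; /* Branch type in the low two bits */
};

/* Old check: relies on the symbol type alone. */
static bool old_is_thumb(const struct mock_sym *s)
{
  unsigned type = s->st_info & 0x0f;
  return type == STT_ARM_TFUNC || type == STT_ARM_16BIT;
}

/* New check: consults the branch type recorded alongside the symbol. */
static bool new_is_thumb(const struct mock_sym *s)
{
  unsigned type   = s->st_info & 0x0f;
  unsigned branch = s->st_target_internal & BRANCH_TYPE_MASK;
  return branch == SK_BRANCH_TO_THUMB || type == STT_ARM_16BIT;
}

/* Self-check: a global STT_FUNC symbol whose branch type says Thumb. */
static void check_thumb_detection(void)
{
  struct mock_sym thumb_func = { (1 << 4) | STT_FUNC, SK_BRANCH_TO_THUMB };
  struct mock_sym arm_func   = { (1 << 4) | STT_FUNC, SK_BRANCH_TO_ARM };

  assert(!old_is_thumb(&thumb_func)); /* Old check misses it */
  assert(new_is_thumb(&thumb_func));  /* New check catches it */
  assert(!new_is_thumb(&arm_func));
}
```

Under these assumptions, a Thumb function typed as plain STT_FUNC is invisible to the old test but is detected through the branch-type bits.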

btashton commented 3 years ago

@a-lunev can you try this patch? It should also apply to the buildroot project without much issue. https://github.com/btashton/nxflat/pull/2

a-lunev commented 3 years ago

Hi @btashton,

I applied the patch against e7659eb89e1e7c8729d4cb526117c862d9511922 of https://bitbucket.org/nuttx/buildroot.git and tried to build it using config/cortexm3-eabi-defconfig-7.4.0.

I added #include "config.h" line to eliminate the following error:

make[1]: Entering directory 'TEST_ROOT/buildroot/toolchain/nxflat'
gcc -c -Wall -I. -I TEST_ROOT/buildroot/toolchain_build_arm_nofpu/binutils-2.28.1-build/bfd -I TEST_ROOT/buildroot/toolchain_build_arm_nofpu/binutils-2.28.1/include -o ldnxflat.o ldnxflat.c
In file included from ldnxflat.c:79:
TEST_ROOT/buildroot/toolchain_build_arm_nofpu/binutils-2.28.1-build/bfd/bfd.h:35:2: error: #error config.h must be included before this header
 #error config.h must be included before this header
  ^~~~~

Then I tested lm3s6965-ek:qemu-nxflat (from https://github.com/apache/incubator-nuttx/pull/3763). The issue with hard fault is resolved. Thank you!

Concerning NXFLAT Toolchain repo, will the new place be https://github.com/btashton/nxflat or stay https://bitbucket.org/nuttx/buildroot.git ?

btashton commented 3 years ago

Hi @btashton,

I applied the patch against e7659eb89e1e7c8729d4cb526117c862d9511922 of https://bitbucket.org/nuttx/buildroot.git and tried to build it using config/cortexm3-eabi-defconfig-7.4.0.

I added #include "config.h" line to eliminate the following error:

make[1]: Entering directory 'TEST_ROOT/buildroot/toolchain/nxflat'
gcc -c -Wall -I. -I TEST_ROOT/buildroot/toolchain_build_arm_nofpu/binutils-2.28.1-build/bfd -I TEST_ROOT/buildroot/toolchain_build_arm_nofpu/binutils-2.28.1/include -o ldnxflat.o ldnxflat.c
In file included from ldnxflat.c:79:
TEST_ROOT/buildroot/toolchain_build_arm_nofpu/binutils-2.28.1-build/bfd/bfd.h:35:2: error: #error config.h must be included before this header
 #error config.h must be included before this header
  ^~~~~

Yes, that was patched differently in the buildroot repo. It expects that you provide the PACKAGE and PACKAGE_VERSION defines or include config.h. Those defines are normally set by whatever is consuming the headers, so I provide them in the CFLAGS in the makefile.