apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0

Kernel Build Apps #12356

Open MainframeReboot opened 5 months ago

MainframeReboot commented 5 months ago

Hi,

I have NuttX starting up in kernel build but it fails to load the init ELF file and as such, nothing really happens:

image

I have read various documentation regarding building kernel apps, which led me down the road of performing a make export in the nuttx directory followed by a make import in the apps directory. This produces an apps/bin folder, but the files in it are all over 4MB in size and the resulting boot_romfsimg.h file is over 40MB. Does this sound right? The flat build outputs a binary that is <200kb in size, so I fail to understand why the kernel build generates files this large.

Is there an elegant way to do this? Any documentation that explains the proper way to build and run the NSH library in kernel mode?

acassis commented 5 months ago

Hi @MainframeReboot please enable Debug Scheduler Info and Debug FileSystem Info, it will report more info showing why it is failing. Look at sched/init/nx_bringup.c line 379, probably it is failing here:

  ret = exec_spawn(CONFIG_INIT_FILEPATH, argv, NULL,
                   CONFIG_INIT_SYMTAB, CONFIG_INIT_NEXPORTS, NULL, &attr);
MainframeReboot commented 5 months ago

Hi @MainframeReboot please enable Debug Scheduler Info and Debug FileSystem Info, it will report more info showing why it is failing. Look at sched/init/nx_bringup.c line 379, probably it is failing here:

  ret = exec_spawn(CONFIG_INIT_FILEPATH, argv, NULL,
                   CONFIG_INIT_SYMTAB, CONFIG_INIT_NEXPORTS, NULL, &attr);

Yes, this is the line it's failing at.

I understand this is due to the file system needing to be mounted but I am trying to understand if there's a way to compile the nsh library into the kernel or if I have to perform a make export and then a make import inside the ../apps folder, followed by rebuilding NuttX with the generated romfs header file using the mkromfsimg.sh tool. If so, can you comment on why the size of the generated boot_romfsimg.h file is so large? Is there a way to minimize the size of this?

acassis commented 5 months ago

I don't know if I understood correctly what you want to do and what you are doing, but let me highlight some points: in kernel build mode NuttX is very similar to Linux, which means it needs to mount the userspace to execute the "init" application.

I think you only need to use "make export" when you are building external ELFs, to be loaded later on, when the system is already running. For example, to create an ELF program to be loaded from an SD card or USB thumb drive.

Maybe @xiaoxiang781216 @patacongo @lupyuen or others could help me understand what you want to do.

lupyuen commented 5 months ago

can you comment on why the size of the generated boot_romfsimg.h file is so large? Is there a way to minimize the size of this?

@MainframeReboot I believe NuttX PolarFire Icicle is producing Relocatable ELF Binaries that are not Fully-Linked (which contains Relocation Symbols)? There's a patch for QEMU RISC-V that produces Fully-Linked ELF Binaries, maybe this will help:

https://github.com/lupyuen/quickjs-nuttx#full-linking-for-nuttx-apps

The patch is here: https://github.com/apache/nuttx/pull/11524

patacongo commented 5 months ago

Maybe @xiaoxiang781216 @patacongo @lupyuen or others could help me understand what you want to do.

To use the kernel builds, you need to have two blobs: (1) the kernel binary, and (2) the root file system. The root file system must contain an executable called "init" that is started by the kernel during initialization. The root file system can be provided via external media such as an SD card, or in-memory as a ROMFS file system.
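As a rough illustration of the in-memory option, the sketch below sanity-checks a ROMFS blob before it is handed to the kernel. It assumes the standard ROMFS on-disk layout (an 8-byte `-rom1fs-` magic followed by a big-endian 32-bit image size); `romfs_image_size` is a hypothetical helper written for this example, not a NuttX API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the image size recorded in a ROMFS
 * superblock, or 0 if the buffer does not start with the ROMFS magic.
 */

uint32_t romfs_image_size(const uint8_t *img, size_t len)
{
  assert(img != NULL);

  /* The superblock is 16 bytes: magic, size, checksum. */

  if (len < 16 || memcmp(img, "-rom1fs-", 8) != 0)
    {
      return 0;
    }

  /* Bytes 8..11 hold the full image size as a big-endian word. */

  return ((uint32_t)img[8] << 24) | ((uint32_t)img[9] << 16) |
         ((uint32_t)img[10] << 8) | (uint32_t)img[11];
}
```

Comparing the returned size with the length of the generated array is a cheap way to catch a truncated or mismatched image before debugging the mount itself.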

There are two separate builds. If you build a KERNEL build configuration, you get only the kernel blob. To build the ROMFS init file system you have to invent something else. There are some simple examples in the tree. There used to be a README.txt file describing how to do the ROMFS build, but it has been removed or, perhaps, moved to another location.

Yes, using the export files can be helpful.

Here is some discussion: https://github.com/apache/nuttx/blob/nuttx-8.1/boards/arm/sama5/sama5d4-ek/README.txt#L4186

@lupyuen does this all of the time and probably has a more user friendly solution.

patacongo commented 5 months ago

... I am trying to understand if there's a way to compile the nsh library into the kernel or if I have to perform a make export and then a make import inside the ../apps folder, followed by rebuilding NuttX with the generated romfs header file using the mkromfsimg.sh tool. ...

No, that is not possible. NSH uses only user space OS interfaces. These cannot (or at least should not) be used inside of the OS.

Also, each process lives inside a protected address space. If NSH were in the kernel, it could not interact with anything in user space and vice versa.

If so, can you comment on why the size of the generated boot_romfsimg.h file is so large? Is there a way to minimize the size of this?

That is mostly because there are no shared libraries. As a consequence, a lot of C library code must be duplicated in each process. There is a lot of room for improvement for KERNEL build tools to make things cleaner. Wouldn't it be nice if there were an elf-nuttx- toolchain that could build efficient ELF modules as simply as GLIBC and GCC build Linux processes?

You can reduce the amount of C library code in the module by reducing the size of the symbol table that draws the code into the link.

MainframeReboot commented 5 months ago

Hi All,

Thank you for your replies and the direction, I've definitely learned a lot.

I have gone down the road of exporting NuttX, importing it into the ../apps folder, making the apps and then exporting the romfs header. I then rebuild NuttX with this romfs header.

I can confirm from the debug messages that the file system is being mounted, no issues here. However, I do seem to run into problems when it attempts to load the program /bin/init. The issues appear to occur at up_relocateadd with the error up_relocateadd: ERROR: PCREL_HI20 at c0000022 bad:ffffffff40001000, followed by elf_relocateadd: ERROR: Section 2 reloc 0: Relocation failed: -22.

I'm not quite sure what this means other than my address environments are incorrectly set?

acassis commented 5 months ago

Hi @MainframeReboot I'm happy to know that you are making progress with your testing! (Suggestion: if you have a blog, create a post documenting the path you are following, it could help more people in the future, even your future self).

This error message is coming from here:

          if (!_valid_hi20_imm(imm_hi))
            {
              berr("ERROR: %s at %08" PRIxPTR " bad:%08lx\n",
                   _get_rname(relotype), addr, imm_hi << 12);

              return -EINVAL;
            }

I don't know much about RV64 arch, but I think you are putting the userspace init ELF at wrong position, maybe the function explanation rings a bell:

 * Name: _valid_hi20_imm
 *
 * Description:
 *   Check that any XX_HI20 relocation has a valid upper 20-bit immediate.
 *   Note that this test is not necessary for RV32 targets, the problem is
 *   related to RV64 sign extension.
 *
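To make the sign-extension problem concrete, here is a minimal sketch (not the NuttX implementation) of the range check behind a HI20/LO12 relocation pair on RV64. The helper names are hypothetical. The value printed in the log above, ffffffff40001000, is imm_hi << 12; read as a signed 64-bit number it is -0xBFFFF000, which lies outside the roughly ±2 GiB that a 20-bit upper immediate can reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Floor division by 4096 that avoids shifting negative values. */

static int64_t floor_div_4k(int64_t value)
{
  int64_t quotient = value / 4096;

  if (value % 4096 < 0)
    {
      quotient--;
    }

  return quotient;
}

/* Low 12 bits of the split, as a signed value in [-2048, 2047]. */

int64_t lo12_of(int64_t offset)
{
  int64_t lo = offset - floor_div_4k(offset + 0x800) * 4096;

  assert(lo >= -2048 && lo <= 2047);
  return lo;
}

/* True if the upper part fits the signed 20-bit immediate of
 * auipc/lui, i.e. the offset is reachable from the current PC.
 */

bool hi20_fits(int64_t offset)
{
  int64_t hi = floor_div_4k(offset + 0x800);

  return hi >= -(INT64_C(1) << 19) && hi < (INT64_C(1) << 19);
}
```

With these helpers, `hi20_fits(-INT64_C(0xbffff000))` is false, which matches the -EINVAL (-22) reported by the loader: the relocation target is too far from the code at 0xc0000022 to be encoded.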
MainframeReboot commented 5 months ago

Thanks for your reply @acassis. I can definitely look into a blog post of my findings once I get this to work!

I took a look, and I am not sure where the init ELF is supposed to go. I built the apps, generated a boot_romfsimg.h header file and then recompiled NuttX using the header. I have since tried another approach to try and remove the relocation part out of the equation.

I followed the link that @lupyuen posted above and have been successful in generating fully linked applications that are significantly smaller in size.

I should mention that before I performed the full link of the apps I did some research into the address environments and the MMU. While doing this I stumbled upon another article written by @lupyuen covering the MMU: https://lupyuen.codeberg.page/articles/mmu.html.

Using what I learned in this article, I set my address environment as follows:

Then I configured the gnu-elf.ld script to match these values before I built the apps. This enabled the init module to be loaded but ultimately resulted in a segmentation fault:

image

I am still attempting to get the default icicle knsh config to run with minor tweaks based on articles I read as well as other RISC-V configs. I have noticed that the icicle knsh config has CONFIG_ARCH_VMA_MAPPING enabled as well as addresses set for CONFIG_ARCH_SHM_VBASE and CONFIG_ARCH_KMAP_VBASE, something I haven't seen in other knsh profiles. I will look more into this next.

In the meantime, if anyone has any other suggestions I could try, I would greatly appreciate it.

acassis commented 5 months ago

Nice findings @MainframeReboot ! Seems like finally the code is loading, but an exception Instruction page fault (MCAUSE = 0x0c) is happening. Maybe @lupyuen has some tips to track this issue because I saw it in his PureScript dump parser.

I noted that you are doing the board initialization in the AppBringUp thread, did you try avoiding it by disabling CONFIG_BOARD_LATE_INITIALIZE? I don't think it will solve the root cause, but at least it may have a different effect.

Regarding the blog, I suggest you write down each logical decision and testing you are doing, this will help you later when you write the post. Remember: when you arrive at your destination you won't remember all the trees you saw along the way.

patacongo commented 5 months ago

I have noticed that the icicle knsh config has CONFIG_ARCH_VMA_MAPPING enabled as well as addresses set for CONFIG_ARCH_SHM_VBASE and CONFIG_ARCH_KMAP_VBASE, something I haven't seen in other knsh profiles.

Careful. There are several knsh or kostest configurations that are used in the PROTECTED build. That PROTECTED build is for MCUs that don't have MMUs, but rather MPUs and, hence, don't support virtual addressing. Make sure that you only look at configurations that have CONFIG_BUILD_KERNEL=y

lupyuen commented 5 months ago

Hi @MainframeReboot: The RISC-V Exception looks interesting:

EXCEPTION: Instruction page fault. 
MCAUSE: 0x0C 
EPC: 0x00 
MTVAL: 0x00
Segmentation fault in PID 4: /bin/init

EPC says that the NSH Shell is trying to execute the code at Address 0 and failing? This is very odd. It's possible that the Stack Size is too small (8KB):

binfmt_dumpmodule:
stacksize: 8192

Could you increase the Stack Size to 64KB? ("Default task_spawn Stack Size" and "Thread Local Storage")

https://lupyuen.github.io/articles/quickjs#nuttx-stack-is-full-of-quickjs

It's also possible that Full Linking of NSH Shell messed up the code addresses.

I wonder why NuttX doesn't show the full Crash Dump, it might show us the code that's trying to execute Address 0. Hmmm...
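For readers decoding a dump like the one above by hand, the sketch below maps the synchronous exception codes defined by the RISC-V privileged specification to names. The function names are made up for this example; a supervisor-mode kernel would read the same codes from scause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* On RV64 the top bit of mcause flags an interrupt, not an exception. */

#define MCAUSE_INTERRUPT_BIT (UINT64_C(1) << 63)

/* Hypothetical helper: name of a synchronous exception code. */

const char *mcause_exception_name(uint64_t mcause)
{
  assert((mcause & MCAUSE_INTERRUPT_BIT) == 0);

  switch (mcause)
    {
      case 0:  return "Instruction address misaligned";
      case 1:  return "Instruction access fault";
      case 2:  return "Illegal instruction";
      case 3:  return "Breakpoint";
      case 4:  return "Load address misaligned";
      case 5:  return "Load access fault";
      case 6:  return "Store/AMO address misaligned";
      case 7:  return "Store/AMO access fault";
      case 8:  return "Environment call from U-mode";
      case 9:  return "Environment call from S-mode";
      case 11: return "Environment call from M-mode";
      case 12: return "Instruction page fault";
      case 13: return "Load page fault";
      case 15: return "Store/AMO page fault";
      default: return "Reserved or custom";
    }
}

/* Page faults point at the MMU mapping rather than at the code itself. */

bool mcause_is_page_fault(uint64_t mcause)
{
  return mcause == 12 || mcause == 13 || mcause == 15;
}
```

Code 0x0C with EPC and MTVAL both zero therefore reads as "the CPU tried to fetch an instruction from virtual address 0 and found no usable mapping there".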

patacongo commented 5 months ago

EPC says that the NSH Shell is trying to execute the code at Address 0 and failing? This is very odd.

Is zero a valid address in this architecture? In ARM architectures, reset vectors are configurable but often are at virtual address zero and must never be re-mapped.

I suspect this exception occurs immediately on startup of /bin/init. Is zero the correct process startup address in the link? The startup address should be in the ELF header. Is the MMU configured properly to execute from address zero?

Sorry... more questions than answers.

MainframeReboot commented 5 months ago

EPC says that the NSH Shell is trying to execute the code at Address 0 and failing? This is very odd.

Is zero a valid address in this architecture? In ARM architectures, reset vectors are configurable but often are at virtual address zero and must never be re-mapped.

I suspect this exception occurs immediately on startup of /bin/init. Is zero the correct process startup address in the link? The startup address should be in the ELF header. Is the MMU configured properly to execute from address zero?

Sorry... more questions than answers.

Great questions, they're very helpful in pointing me in directions I wouldn't have looked otherwise.

I took a peek at the output while building the apps and noticed that the linker prints out: warning: cannot find entry symbol __start; defaulting to 0x00000000C0000000 for every single app that is built. Dumping the init ELF shows that __start is listed as *UND* and set to 0x00000000 in the symbol table. The ELF header does list the entry point address as 0xC0000000:

image

I will look into the MMU configuration next but I'm confused why it's attempting to run from address 0 if the entry point in the header is listed as 0xC0000000.
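One way to double-check the entry point without relying on a dump tool is to read e_entry straight from the file header. The sketch below assumes a little-endian ELF64 file, where e_entry sits at byte offset 24; `elf64_read_entry` is a hypothetical helper, not part of the NuttX binfmt code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* e_entry follows e_ident[16], e_type, e_machine and e_version. */

#define ELF64_ENTRY_OFFSET 24
#define ELF64_HEADER_SIZE  64

/* Hypothetical helper: extracts e_entry from a little-endian ELF64
 * header. Returns 0 on success, -1 if the buffer is not such a header.
 */

int elf64_read_entry(const uint8_t *hdr, size_t len, uint64_t *entry)
{
  uint64_t value = 0;
  int i;

  assert(hdr != NULL && entry != NULL);

  if (len < ELF64_HEADER_SIZE || memcmp(hdr, "\177ELF", 4) != 0)
    {
      return -1;
    }

  /* e_ident[4] == 2 is ELFCLASS64, e_ident[5] == 1 is ELFDATA2LSB. */

  if (hdr[4] != 2 || hdr[5] != 1)
    {
      return -1;
    }

  /* Assemble the 8-byte little-endian field, most significant first. */

  for (i = 7; i >= 0; i--)
    {
      value = (value << 8) | hdr[ELF64_ENTRY_OFFSET + i];
    }

  *entry = value;
  return 0;
}
```

If this returns 0xC0000000 while the undefined __start symbol resolves to 0, the header and the symbol table disagree, which is consistent with the linker warning quoted above.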

MainframeReboot commented 5 months ago

Hi @MainframeReboot: The RISC-V Exception looks interesting:

EXCEPTION: Instruction page fault. 
MCAUSE: 0x0C 
EPC: 0x00 
MTVAL: 0x00
Segmentation fault in PID 4: /bin/init

EPC says that the NSH Shell is trying to execute the code at Address 0 and failing? This is very odd. It's possible that the Stack Size is too small (8KB):

binfmt_dumpmodule:
stacksize: 8192

Could you increase the Stack Size to 64KB? ("Default task_spawn Stack Size" and "Thread Local Storage")

https://lupyuen.github.io/articles/quickjs#nuttx-stack-is-full-of-quickjs

It's also possible that Full Linking of NSH Shell messed up the code addresses.

I wonder why NuttX doesn't show the full Crash Dump, it might show us the code that's trying to execute Address 0. Hmmm...

Thanks for the suggestion, I will give this a try and let you know the result. One thing I am noticing right away is that CONFIG_TLS_ALIGNED is not enabled at all in this config so CONFIG_TLS_LOG2_MAXSTACK also does not exist. I will enable this and retry before attempting the other increases.

patacongo commented 5 months ago

I will look into the MMU configuration next but I'm confused why it's attempting to run from address 0 if the entry point in the header is listed as 0xC0000000.

I don't know enough about RISC-V to be much help. For ELF modules, __start is defined in crt0.c. crt0 must be linked into all ELF modules.

src/common/crt0.c: * Name: __start
src/common/crt0.c:void __start(int argc, char *argv[])

src/common/Make.defs:STARTUP_OBJS = crt0$(OBJEXT)
src/Makefile:crt0$(OBJEXT): %$(OBJEXT): %.c

[crt0 is where shared library support would be implemented someday. C++ needs a crt1 and crtN to handle constructors and destructors.]

But, perhaps, RISC-V follows a different model??? I think you need to make sure that crt0 is being built and linked into the ELF module.

One possibility is that crt0 with __start is not being included in the ELF module link. That would be the case if nothing references it. It can be forced into the link with arguments on the link command line. In the RISC-V Makefile you will find:

./Makefile:  LDENTRY      ?= -Wl,--entry=__start
./Makefile:  LDENTRY      ?= --entry=__start

That will do the job if crt0 is built and included in the link.

MainframeReboot commented 5 months ago

Thank you all so much for your help. The recent suggestions pointed me down the right path and I can confirm that init now loads:

image

To get this to work, I took a look at the crt0.c source file mentioned by @patacongo as well as the linker scripts in the ../apps folder. I noticed that the apps linker files do include crt0.o during linking, but the entry point __start does not match the NuttX function in crt0.c. Within NuttX this function appears to have been changed to a single underscore, which is why it could not be found.

I also took the advice from @lupyuen and looked at the thread local storage configuration. As mentioned in my previous reply this wasn't set at all, so I added CONFIG_TLS_ALIGNED=y as well as CONFIG_TLS_LOG2_MAXSTACK=16 to my .config file. I also increased the stack size to 64KB by setting CONFIG_POSIX_SPAWN_DEFAULT_STACKSIZE=65536 as per his instructions.
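The two numbers above are related: 2^16 is 65536. The sketch below shows the arithmetic, under the assumption that with aligned TLS each stack is aligned to 2^CONFIG_TLS_LOG2_MAXSTACK so that the base of the stack can be recovered by masking the stack pointer. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if a requested stack size fits the aligned-TLS limit. */

bool stack_fits_tls_limit(size_t stacksize, unsigned log2_maxstack)
{
  assert(log2_maxstack < sizeof(size_t) * 8);
  return stacksize <= ((size_t)1 << log2_maxstack);
}

/* Base of the aligned stack region that contains sp (assumption:
 * the region is aligned to its own power-of-two size).
 */

uintptr_t aligned_stack_base(uintptr_t sp, unsigned log2_maxstack)
{
  uintptr_t mask;

  assert(log2_maxstack < sizeof(uintptr_t) * 8);
  mask = ((uintptr_t)1 << log2_maxstack) - 1;
  return sp & ~mask;
}
```

Under that assumption a 64KB stack is the largest that fits a log2 value of 16, so raising the stack size further would also require raising CONFIG_TLS_LOG2_MAXSTACK.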

@acassis, As I don't yet have a blog, I will document the other changes I had to make as the knsh defconfig for the mpfs icicle kit required additional modifications in order to work (not in chronological order):

First, we can optionally set a name for our image, otherwise one will be created dynamically

set-name: 'PolarFire-SoC-HSS::nuttxbsp_kernel_payload'

Next, we'll define the entry point addresses for each hart, as follows:

hart-entry-points: {u54_1: '0x80000000', u54_2: '0x80000000', u54_3: '0x80000000', u54_4: '0x80000000'}

payloads:
  nuttxbsp.bin: {exec-addr: '0x80000000', owner-hart: u54_1, priv-mode: prv_s, skip-opensbi: true}


After all of that, NuttX kernel build runs on the PolarFire Icicle kit. Although I am running into an issue getting `helloxx` to run. The app `hello` works but I get a segmentation fault when running `helloxx`:

![image](https://github.com/apache/nuttx/assets/168458700/69a00b42-c1d6-438e-9468-c062cc8c6a8a)

I noticed `crt0.c` has preprocessor definitions for CXX, so I will look into this as well as other C++ related configuration options, as I might be missing something on that front.

Thank you again @patacongo, @lupyuen, @acassis and @pussuw for your support on getting this to work.
pussuw commented 5 months ago

A good candidate for that page fault would be C++ ctors/dtors trying to execute with kernel privileges. The same crt0 file should handle those per process, however I have seen places where the ctor/dtor code is executed in binfmt (there is a kconfig that controls this I believe, don't remember the name though). You should check for that.

RISC-V quite smartly prohibits executing user mapped code segments with raised privileges. This is done unconditionally and it cannot be bypassed (not by design, nor by accident).
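The rule can be sketched as a small predicate over a page table entry. The bit positions (V, R, W, X, U) are those of the RISC-V Sv39/Sv48 PTE format; the function itself is a simplified model written for this example and ignores the A/D bits and physical memory protection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_V (UINT64_C(1) << 0)  /* Valid */
#define PTE_R (UINT64_C(1) << 1)  /* Readable */
#define PTE_W (UINT64_C(1) << 2)  /* Writable */
#define PTE_X (UINT64_C(1) << 3)  /* Executable */
#define PTE_U (UINT64_C(1) << 4)  /* Accessible to user mode */

enum priv_mode
{
  PRIV_USER,
  PRIV_SUPERVISOR
};

/* Simplified model: may code running in 'mode' fetch instructions
 * from a page mapped by 'pte'?
 */

bool can_fetch(uint64_t pte, enum priv_mode mode)
{
  bool user_page = (pte & PTE_U) != 0;

  assert(mode == PRIV_USER || mode == PRIV_SUPERVISOR);

  if ((pte & PTE_V) == 0 || (pte & PTE_X) == 0)
    {
      return false;
    }

  /* User code needs U=1. Supervisor code needs U=0: the SUM bit only
   * relaxes loads and stores, never instruction fetches.
   */

  return mode == PRIV_USER ? user_page : !user_page;
}
```

So a constructor that lives in a user-mapped text page raises an instruction page fault if the kernel tries to call it directly, which is the failure mode described above.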

patacongo commented 5 months ago

I have seen places where the ctor/dtor code is executed in binfmt

Constructors and destructors should run in crti and crtN, respectively, for general Unix compatibility. crti should support init and crtN should support fini.

Lots more detail if you care about this: https://gcc.gnu.org/onlinedocs/gccint/Initialization.html

As the OP mentioned, this is currently done with conditional logic in crt0. That is technically okay but will likely confuse people who are used to thinking about things in the GCC/GLIBC way.

There is a long-open issue #1265 (and related #1263). Perhaps this is only a problem in PROTECTED mode, which has a single set of ctors and dtors for the whole user-space blob. KERNEL mode uses only loadable ELF modules, each with their own ctors and dtors.

crt0.c exists only for armv7-a, arm64, and risc-v, all of which can support the KERNEL build.

So crt0 with C++ ELF modules should never be used in PROTECTED mode right now or else you would get doubly constructed static classes.

I have seen places where the ctor/dtor code is executed in binfmt (there is a kconfig that controls this I believe, don't remember the name though)

That is CONFIG_BINFMT_CONSTRUCTORS. I think that should be disabled in KERNEL mode to let the ELF module call its own constructors and destructors in the correct context.

This seems awkward and prone to errors to me.

acassis commented 5 months ago

Congratulations @MainframeReboot !!! I'm glad to know you got it working.

Since you don't have a blog, I have a better suggestion:

Please submit a guide, "Running NuttX in kernel mode (with MMU support) on Microchip PolarFire Icicle board", so it could be included here: https://nuttx.apache.org/docs/latest/guides/index.html

Just create it inside nuttx/Documentation/guides/ so it will be available as reference for everybody using this board.

MainframeReboot commented 5 months ago

I have seen places where the ctor/dtor code is executed in binfmt

Constructors and destructors should run in crti and crtN, respectively, for general Unix compatibility. crti should support init and crtN should support fini.

Lots more detail if you care about this: https://gcc.gnu.org/onlinedocs/gccint/Initialization.html

As the OP mentioned, this is currently done with conditional logic in crt0. That is technically okay but will likely confuse people who are used to thinking about things in the GCC/GLIBC way.

There is a long-open issue #1265 (and related #1263). Perhaps this is only a problem in PROTECTED mode, which has a single set of ctors and dtors for the whole user-space blob. KERNEL mode uses only loadable ELF modules, each with their own ctors and dtors.

crt0.c exists only for armv7-a, arm64, and risc-v, all of which can support the KERNEL build.

So crt0 with C++ ELF modules should never be used in PROTECTED mode right now or else you would get doubly constructed static classes.

I have seen places where the ctor/dtor code is executed in binfmt (there is a kconfig that controls this I believe, don't remember the name though)

That is CONFIG_BINFMT_CONSTRUCTORS. I think that should be disabled in KERNEL mode to let the ELF module call its own constructors and destructors in the correct context.

This seems awkward and prone to errors to me.

I naively enabled CONFIG_BINFMT_CONSTRUCTORS as the prompt told me that it enables "C++ Static Constructor Support" so it seemed like something I should enable. I have turned this off.

However, that alone did not solve the issue. I had to remove the calls exec_ctors() and atexit(exec_dtors) from crt0.c in order to not crash on ELF load. After doing this, helloxx starts up without issue, although there still appear to be some issues with static constructors despite CONFIG_HAVE_CXXINITIALIZE being enabled. I am assuming this goes back to crt0 not being used with C++ ELF modules. Perhaps I need the ctor/dtor calls inside my ELF files instead; I can try something like this.

Regardless, I can live with a NuttX kernel build that has no C++ static constructor support given everything else appears to be working flawlessly.

patacongo commented 5 months ago

However, that alone did not solve the issue. I had to remove the calls exec_ctors() and atexit(exec_dtors) from crt0.c in order to not crash on ELF load.

exec_ctors() is really simple, but it depends on a table of constructor addresses created by the build logic. These are defined by binfmt/libelf/gnu-elf.ld or similar:

/* Linker defined symbols to .ctors and .dtors */

extern initializer_t _sctors[];
extern initializer_t _ectors[];
extern initializer_t _sdtors[];
extern initializer_t _edtors[];

Check whatever linker script you use to build the elf modules. Check that those symbols exist in the module.

These define a table of constructor and destructor addresses. Constructors start at _sctors and end at _ectors. Make sure that there are constructors in the table.
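A host-side sketch of that table walk is shown below. It is not the NuttX exec_ctors() source: the linker-provided _sctors/_ectors symbols are replaced by an ordinary array (`demo_ctors`), and the logging helpers exist only so the call order can be observed.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*initializer_t)(void);

/* Records which initializers ran, in order (demo only). */

char g_log[8];
size_t g_log_len;

static void record(char tag)
{
  if (g_log_len < sizeof(g_log))
    {
      g_log[g_log_len++] = tag;
    }
}

static void ctor_a(void) { record('a'); }
static void ctor_b(void) { record('b'); }

/* Stand-in for the table the linker script places between _sctors
 * and _ectors. The NULL entry models an empty slot.
 */

const initializer_t demo_ctors[] =
{
  ctor_a, NULL, ctor_b
};

/* Calls every non-NULL entry in [start, end) in ascending order and
 * returns how many were called.
 */

size_t run_initializers(const initializer_t *start,
                        const initializer_t *end)
{
  const initializer_t *entry;
  size_t called = 0;

  assert(start != NULL && end != NULL && start <= end);

  for (entry = start; entry < end; entry++)
    {
      if (*entry != NULL)
        {
          (*entry)();
          called++;
        }
    }

  return called;
}
```

If the start and end symbols are equal, the loop body never runs and no constructor is called, so an empty table fails silently. A table whose entries point at unmapped or non-executable addresses faults on the first call instead.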

Issues with atexit() may be like those of #1263. I don't know the state of that issue. The function called by atexit() calls the destructors. So more likely that is the same issue as with the list of destructors.

Regardless, I can live with NuttX kernel build that has no C++ static constructor support given everything else appears to be working flawlessly.

But you shouldn't have to!

pussuw commented 5 months ago

I fixed #1263 a while back (years back), so that should not be an issue here. However CONFIG_BINFMT_CONSTRUCTORS can cause such issues.

I use C++ almost exclusively with BUILD_KERNEL, it works just fine. There is something odd about your build environment still, hope you can find it!

MainframeReboot commented 5 months ago

However, that alone did not solve the issue. I had to remove the calls exec_ctors() and atexit(exec_dtors) from crt0.c in order to not crash on ELF load.

exec_ctors() is really simple, but it depends on a table of constructor addresses created by the build logic. These are defined by binfmt/libelf/gnu-elf.ld or similar:

/* Linker defined symbols to .ctors and .dtors */

extern initializer_t _sctors[];
extern initializer_t _ectors[];
extern initializer_t _sdtors[];
extern initializer_t _edtors[];

Check whatever linker script you use to build the elf modules. Check that those symbols exist in the module.

These define a table of constructor and destructor addresses. Constructors start at _sctors and end at _ectors. Make sure that there are constructors in the table.

Issues with atexit() may be like those of #1263. I don't know the state of that issue. The function called by atexit() calls the destructors. So more likely that is the same issue as with the list of destructors.

Regardless, I can live with NuttX kernel build that has no C++ static constructor support given everything else appears to be working flawlessly.

But you shouldn't have to!

I decided not to settle and poked around more to see if I can get static constructor support to work but no luck so far.

I have taken a look and CONFIG_BINFMT_CONSTRUCTORS is not set in my config anymore.

The segmentation fault appears to happen inside the function exec_ctors() the moment (*ctor)() is executed. This is called inside __start() that is provided by crt0.o.

I dumped helloxx and examined the symbol table to make sure .ctors was there and it is:

image

My linker script is based on the linker script from the Fully Linking for NuttX apps article by @lupyuen and looks as follows:

SECTIONS
{
  . = 0xC0000000;
  .text :
    {
      _stext = . ;
      *(.text)
      *(.text.*)
      *(.gnu.warning)
      *(.stub)
      *(.glue_7)
      *(.glue_7t)
      *(.jcr)

      /* C++ support:  The .init and .fini sections contain specific logic
       * to manage static constructors and destructors.
       */

      *(.gnu.linkonce.t.*)
      KEEP(*(.init))             /* Old ABI */
      KEEP(*(.fini))             /* Old ABI */
      _etext = . ;
    }

  .rodata :
    {
      _srodata = . ;
      *(.rodata)
      *(.rodata1)
      *(.rodata.*)
      *(.gnu.linkonce.r*)
      _erodata = . ;
    }

  . = 0xC0400000;
  .data :
    {
      _sdata = . ;
      *(.data)
      *(.data1)
      *(.data.*)
      *(.gnu.linkonce.d*)
      . = ALIGN(4);
      _edata = . ;
    }

  /* C++ support. For each global and static local C++ object,
   * GCC creates a small subroutine to construct the object. Pointers
   * to these routines (not the routines themselves) are stored as
   * simple, linear arrays in the .ctors section of the object file.
   * Similarly, pointers to global/static destructor routines are
   * stored in .dtors.
   */

  .ctors :
    {
      _sctors = . ;
      KEEP(*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
      KEEP(*(.init_array .ctors))
      _ectors = . ;
    }

  .dtors :
    {
      _sdtors = . ;
      KEEP (*(.dtors))       /* Old ABI:  Unallocated */
      KEEP (*(.fini_array))  /* New ABI:  Allocated */
      KEEP (*(SORT(.fini_array.*)))
      _edtors = . ;
    }

  .bss :
    {
      _sbss = . ;
      *(.bss)
      *(.bss.*)
      *(.sbss)
      *(.sbss.*)
      *(.gnu.linkonce.b*)
      *(COMMON)
      _ebss = . ;
    }

  /* Thread local storage support */
  .tdata : {
      _stdata = ABSOLUTE(.);
      KEEP (*(.tdata .tdata.* .gnu.linkonce.td.*));
      _etdata = ABSOLUTE(.);
  }

  .tbss : {
      _stbss = ABSOLUTE(.);
      KEEP (*(.tbss .tbss.* .gnu.linkonce.tb.* .tcommon));
      _etbss = ABSOLUTE(.);
  }

    /* Stabs debugging sections.    */

    .stab 0 : { *(.stab) }
    .stabstr 0 : { *(.stabstr) }
    .stab.excl 0 : { *(.stab.excl) }
    .stab.exclstr 0 : { *(.stab.exclstr) }
    .stab.index 0 : { *(.stab.index) }
    .stab.indexstr 0 : { *(.stab.indexstr) }
    .comment 0 : { *(.comment) }
    .debug_abbrev 0 : { *(.debug_abbrev) }
    .debug_info 0 : { *(.debug_info) }
    .debug_line 0 : { *(.debug_line) }
    .debug_pubnames 0 : { *(.debug_pubnames) }
    .debug_aranges 0 : { *(.debug_aranges) }
}

I can confirm the initial address ctor points to is 0xC0400020 which matches the symbol table so I'm not quite sure what I'm missing now...

MainframeReboot commented 5 months ago

@patacongo

I sat down and RTFM to fully understand the differences between crt0, crt1, crti, crtbegin, crtn and crtend. This has helped me immensely and I have managed to get static global instances to run on NuttX kernel by making the necessary changes to NuttX.

I will explain my patches below but the executive summary is use crt0 for C, use crt1 for C++ while also manually linking crti, crtbegin, crtn and crtend for C++ applications only.

crt0.c

The first modification was to remove C++ logic from crt0. This means that all of the .ctors/.dtors arrays at the top (_sctors, _sdtors etc.) are removed, along with the calls to exec_ctors and atexit(exec_dtors). The result is a crt0.c file that looks like this (some comments stripped for size):

/****************************************************************************
 * arch/risc-v/src/common/crt0.c
 ****************************************************************************/
#include <nuttx/arch.h>
#include <nuttx/addrenv.h>
#include <nuttx/compiler.h>
#include <nuttx/config.h>

#include <sys/types.h>
#include <syscall.h>
#include <stdlib.h>
#include <stdio.h>

#include "riscv_internal.h"

#ifdef CONFIG_BUILD_KERNEL

int main(int argc, char *argv[]);

static void sig_trampoline(void) naked_function;
static void sig_trampoline(void)
{
  __asm__ __volatile__
  (
    " addi sp, sp, -" STACK_FRAME_SIZE "\n"   /* Save ra on the stack */
    REGSTORE " ra, 0(sp)\n"
    " mv   t0, a0\n"        /* t0=sighand */
    " mv   a0, a1\n"        /* a0=signo */
    " mv   a1, a2\n"        /* a1=info */
    " mv   a2, a3\n"        /* a2=ucontext */
    " jalr t0\n"            /* Call the signal handler (modifies ra) */
    REGLOAD " ra, 0(sp)\n"  /* Recover ra in sp */
    " addi sp, sp, " STACK_FRAME_SIZE "\n"
    " li   a0, %0\n"        /* SYS_signal_handler_return */
    " ecall\n"              /* Return from the SYSCALL */
    " nop\n"
    :
    : "i" (SYS_signal_handler_return)
    :
  );
}

void __start(int argc, char *argv[])
{
  int ret;

  /* Initialize the reserved area at the beginning of the .bss/.data region
   * that is visible to the RTOS.
   */

  ARCH_DATA_RESERVE->ar_sigtramp = (addrenv_sigtramp_t)sig_trampoline;

  /******************************************************************
  *  Do NOT include C++ constructor/destructor calls in this file.
  *  This file is for C applications only. Refer to crt1.c for C++.
  *******************************************************************/

  /* Call the main() entry point passing argc and argv. */
  ret = main(argc, argv);

  /* Call exit() if/when the main() returns */

  exit(ret);
}

#endif /* CONFIG_BUILD_KERNEL */

crt1.c

All C++ constructor/destructor logic was moved to the new file crt1.c. Note that this is different from the original C++ logic in crt0.c. I have learned, during my research, that .ctors and .dtors are the "old" way of doing things and that the new recommended way is to use .preinit_array, .init_array and .fini_array. The RISC-V toolchain that I use (xPack 13.2.0) supports the new way, so I have decided to follow the new convention. From my reading of old RISC-V NuttX issues, it appears the NuttX team has moved on to xPack 13.2.0 for testing, so it should work with other RISC-V boards as well. Here is the new crt1.c file:

/****************************************************************************
 * arch/risc-v/src/common/crt1.c
 ****************************************************************************/

#include <nuttx/arch.h>
#include <nuttx/addrenv.h>
#include <nuttx/compiler.h>
#include <nuttx/config.h>

#include <sys/types.h>
#include <syscall.h>
#include <stdlib.h>

#include "riscv_internal.h"

#ifdef CONFIG_BUILD_KERNEL

int main(int argc, char *argv[]);

static void sig_trampoline(void) naked_function;
static void sig_trampoline(void)
{
  __asm__ __volatile__
  (
    " addi sp, sp, -" STACK_FRAME_SIZE "\n"   /* Save ra on the stack */
    REGSTORE " ra, 0(sp)\n"
    " mv   t0, a0\n"        /* t0=sighand */
    " mv   a0, a1\n"        /* a0=signo */
    " mv   a1, a2\n"        /* a1=info */
    " mv   a2, a3\n"        /* a2=ucontext */
    " jalr t0\n"            /* Call the signal handler (modifies ra) */
    REGLOAD " ra, 0(sp)\n"  /* Recover ra in sp */
    " addi sp, sp, " STACK_FRAME_SIZE "\n"
    " li   a0, %0\n"        /* SYS_signal_handler_return */
    " ecall\n"              /* Return from the SYSCALL */
    " nop\n"
    :
    : "i" (SYS_signal_handler_return)
    :
  );
}

/****************************************************************************
 * Public Data
 ****************************************************************************/

/*
    Linker defined symbols to .preinit_array, .init_array and .fini_array.

    .ctors and .dtors are not used by RISC-V.
 */
extern initializer_t __preinit_array_start[];
extern initializer_t __preinit_array_end[];
extern initializer_t __init_array_start[];
extern initializer_t __init_array_end[];
extern initializer_t __fini_array_start[];
extern initializer_t __fini_array_end[];

/****************************************************************************
 * Private Functions
 ****************************************************************************/

#ifdef CONFIG_HAVE_CXX

/****************************************************************************
 * Name: exec_preinit
 *
 * Description:
 *   Calls startup functions prior to main entry point
 *
 ****************************************************************************/
static void exec_preinit(void)
{
  initializer_t *preinit;

  for(preinit = __preinit_array_start; preinit < __preinit_array_end; ++preinit)
  {
    initializer_t initializer = *preinit;

    if (initializer)
    {
      initializer();
    }
  }
}

/****************************************************************************
 * Name: exec_init
 *
 * Description:
 *   Calls static constructors prior to main entry point
 *
 ****************************************************************************/
static void exec_init(void)
{
  initializer_t *init;

  for(init = __init_array_start; init < __init_array_end; ++init)
  {
    initializer_t initializer = *init;

    if (initializer)
    {
      initializer();
    }
  }
}

/****************************************************************************
 * Name: exec_fini
 *
 * Description:
 *   Calls static destructors using atexit
 *
 ****************************************************************************/
static void exec_fini(void)
{
  initializer_t *fini;

  for(fini = __fini_array_start; fini < __fini_array_end; ++fini)
  {
    initializer_t initializer = *fini;

    if (initializer)
    {
      initializer();
    }
  }
}
#endif

void __start(int argc, char *argv[])
{
  int ret;

  /* Initialize the reserved area at the beginning of the .bss/.data region
   * that is visible to the RTOS.
   */

  ARCH_DATA_RESERVE->ar_sigtramp = (addrenv_sigtramp_t)sig_trampoline;

#ifdef CONFIG_HAVE_CXX
  /* Call preinit functions */
  exec_preinit();

  /* Call C++ constructors */
  exec_init();

  /* Set up so that C++ destructors are called on task exit */
  atexit(exec_fini);
#endif

  /* Call the main() entry point passing argc and argv. */

  ret = main(argc, argv);

  /* Call exit() if/when the main() returns */

  exit(ret);
}

#endif /* CONFIG_BUILD_KERNEL */
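
The three exec_* helpers above share one pattern: walk a linker-delimited table of function pointers and call each non-NULL entry. The sketch below reproduces that pattern on a host machine, with ordinary arrays standing in for the linker symbols. The names run_forward, run_reverse, demo_init, demo_fini, walk_demo and the trace buffer are made up for illustration and are not part of NuttX. It also shows one thing that may be worth checking in exec_fini(): the usual ELF convention (followed by glibc and newlib) is to run .fini_array entries from last to first so that teardown mirrors setup, whereas the listing above walks forward.

```c
#include <stddef.h>

typedef void (*initializer_t)(void);

/* Trace buffer so the call order can be inspected afterwards. */

static int g_trace[4];
static size_t g_ntrace;

static void record(int id)
{
  if (g_ntrace < sizeof(g_trace) / sizeof(g_trace[0]))
    {
      g_trace[g_ntrace++] = id;
    }
}

static void ctor_a(void) { record(1); }
static void ctor_b(void) { record(2); }
static void dtor_a(void) { record(-1); }
static void dtor_b(void) { record(-2); }

/* Hypothetical stand-ins for the tables the linker script would provide. */

static initializer_t demo_init[] = { ctor_a, NULL, ctor_b };
static initializer_t demo_fini[] = { dtor_a, dtor_b };

/* Same shape as exec_preinit()/exec_init(): first entry to last. */

static void run_forward(initializer_t *start, initializer_t *end)
{
  initializer_t *entry;

  for (entry = start; entry < end; ++entry)
    {
      if (*entry != NULL)
        {
          (*entry)();
        }
    }
}

/* Reverse walk, as conventionally used for .fini_array: last to first. */

static void run_reverse(initializer_t *start, initializer_t *end)
{
  while (end > start)
    {
      initializer_t fn = *--end;

      if (fn != NULL)
        {
          fn();
        }
    }
}

/* Runs both walks and returns how many calls were recorded. */

size_t walk_demo(void)
{
  g_ntrace = 0;
  run_forward(demo_init, demo_init + 3);
  run_reverse(demo_fini, demo_fini + 2);
  return g_ntrace;
}
```

After walk_demo() the trace reads 1, 2, -2, -1: the NULL slot is skipped, and the second "destructor" runs before the first.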

gnu-elf.ld

In order to make sure the arrays in the crt1.c file are populated, the linker script was updated to match:

/****************************************************************************
 * boards/risc-v/mpfs/icicle/scripts/gnu-elf.ld
 ****************************************************************************/

SECTIONS
{
  .text 0xC0000000 :
    {
      _stext = . ;
      *(.text)
      *(.text.*)
      *(.gnu.warning)
      *(.stub)
      *(.glue_7)
      *(.glue_7t)
      *(.jcr)

      /* C++ support:  The .init and .fini sections contain specific logic
       * to manage static constructors and destructors.
       */

      *(.gnu.linkonce.t.*)
      *(.init)      /* Old ABI */
      *(.fini)      /* Old ABI */
      _etext = . ;
    }

  .rodata :
    {
      _srodata = . ;
      *(.rodata)
      *(.rodata1)
      *(.rodata.*)
      *(.gnu.linkonce.r*)
      _erodata = . ;
    }

  .data 0xC0101000:
    {
      _sdata = . ;
      *(.data)
      *(.data1)
      *(.data.*)
      *(.gnu.linkonce.d*)
      . = ALIGN(4);
      _edata = . ;
    }

  /* C++ support. For each global and static local C++ object,
   * GCC creates a small subroutine to construct the object. Pointers
   * to these routines (not the routines themselves) are stored as
   * simple, linear arrays in the .ctors section of the object file.
   * Similarly, pointers to global/static destructor routines are
   * stored in .dtors.
   */
  .preinit_array :
    {
      PROVIDE(__preinit_array_start = .);
      KEEP(*(.preinit_array*))
      PROVIDE(__preinit_array_end = .);
    }

  .init_array :
   {
    PROVIDE(__init_array_start = .);
    KEEP(*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP(*(.init_array .ctors))
    PROVIDE(__init_array_end = .);
   }

  .fini_array :
   {
    PROVIDE(__fini_array_start = .);
    KEEP(*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*)))
    KEEP(*(.fini_array .dtors))
    PROVIDE(__fini_array_end = .);
   }

  /* Thread local storage support */
  .tdata :
    {
      _stdata = ABSOLUTE(.);
      KEEP (*(.tdata .tdata.* .gnu.linkonce.td.*));
      _etdata = ABSOLUTE(.);
    }

  .tbss :
    {
      _stbss = ABSOLUTE(.);
      KEEP (*(.tbss .tbss.* .gnu.linkonce.tb.* .tcommon));
      _etbss = ABSOLUTE(.);
    }

  .bss :
    {
      _sbss = . ;
      *(.bss)
      *(.bss.*)
      *(.sbss)
      *(.sbss.*)
      *(.gnu.linkonce.b*)
      *(COMMON)
      _ebss = . ;
    }

    /* Stabs debugging sections.    */

    .stab 0 : { *(.stab) }
    .stabstr 0 : { *(.stabstr) }
    .stab.excl 0 : { *(.stab.excl) }
    .stab.exclstr 0 : { *(.stab.exclstr) }
    .stab.index 0 : { *(.stab.index) }
    .stab.indexstr 0 : { *(.stab.indexstr) }
    .comment 0 : { *(.comment) }
    .debug_abbrev 0 : { *(.debug_abbrev) }
    .debug_info 0 : { *(.debug_info) }
    .debug_line 0 : { *(.debug_line) }
    .debug_pubnames 0 : { *(.debug_pubnames) }
    .debug_aranges 0 : { *(.debug_aranges) }
}
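
The .init_array rules in the script above collect pointers that the compiler emits for constructor-like functions, and SORT_BY_INIT_PRIORITY orders any prioritised entries. As a rough hosted illustration of what feeds those sections, the sketch below uses the constructor attribute, which is a GCC/Clang extension rather than standard C. The names init_first, init_second, note and startup_order are invented for this example, and the behaviour described assumes a typical ELF toolchain whose startup code walks .init_array before main().

```c
#include <stddef.h>

/* Records the order in which the startup code invoked the constructors. */

static int g_order[2];
static size_t g_count;

static void note(int id)
{
  if (g_count < sizeof(g_order) / sizeof(g_order[0]))
    {
      g_order[g_count++] = id;
    }
}

/* Declared "out of order" on purpose.  With GCC on ELF targets these
 * typically land in .init_array.00102 and .init_array.00101, and the
 * linker sorts them by priority (lower number runs first).
 */

__attribute__((constructor(102))) static void init_second(void)
{
  note(102);
}

__attribute__((constructor(101))) static void init_first(void)
{
  note(101);
}

/* Returns the id recorded at position idx, or -1 if nothing ran there. */

int startup_order(size_t idx)
{
  return idx < g_count ? g_order[idx] : -1;
}
```

If the startup code and linker script cooperate, startup_order(0) should report 101 and startup_order(1) should report 102 by the time main() is entered. A result of -1 would be the same symptom as the original problem in this thread: the table was never walked.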

Compilation & Linking

I had to modify how my kernel applications were compiled and linked in order to support the new crt0.c and crt1.c files. To do this, I modified the toolchain.cmake.export file under nuttx/tools. Below is a snippet of the modification:

file(GLOB CSTARTUP_OBJS ${NUTTX_PATH}/startup/*)
file(GLOB CXXSTARTUP_OBJS ${NUTTX_PATH}/startup/*)

add_compile_options($<$<COMPILE_LANGUAGE:C>:-nostdlib>)
add_compile_options($<$<COMPILE_LANGUAGE:CXX>:-nodefaultlibs$<SEMICOLON>-nostartfiles>)

set(CMAKE_C_LINK_EXECUTABLE
    "<CMAKE_LINKER> ${LDFLAGS} --entry=__start -T${LINKER_SCRIPT} <OBJECTS> ${CSTARTUP_OBJS} -o <TARGET> <LINK_LIBRARIES> -L${NUTTX_PATH}/libs --start-group ${LDLIBS} ${EXTRA_LIBS} --end-group"
)
set(CMAKE_CXX_LINK_EXECUTABLE
    "<CMAKE_LINKER> ${LDFLAGS} --entry=__start -T${LINKER_SCRIPT} <OBJECTS> ${CXXSTARTUP_OBJS} -o <TARGET> <LINK_LIBRARIES> -L${NUTTX_PATH}/libs --start-group ${LDLIBS} ${EXTRA_LIBS} --end-group"
)

The above modifications ensure that C++ is not linked with any standard libraries or toolchain start files (I link those manually later on), while C is. They also split the STARTUP_OBJS variable into CSTARTUP_OBJS and CXXSTARTUP_OBJS so that I can set one to crt0.o and the other to crt1.o. The modification to STARTUP_OBJS is done in arch/risc-v/src/common/Make.defs.

Inside the Makefile found within arch/risc-v/src, I added a section to ensure crt1 was compiled, right under crt0:

crt0$(OBJEXT): %$(OBJEXT): %.c
    $(call COMPILE, $<, $@)

crt1$(OBJEXT): %$(OBJEXT): %.c
    $(call COMPILE, $<, $@)

Additionally, I had to make sure crt1.o was exported when make export is run so that it appears under apps/import/startup. Inside the same Makefile found within arch/risc-v/src, I modified export_startup to contain CXXSTARTUP_OBJS:

export_startup: $(CSTARTUP_OBJS) $(CXXSTARTUP_OBJS)
ifneq ($(CSTARTUP_OBJS),)
    $(Q) if [ -d "$(EXPORT_DIR)/startup" ]; then \
        cp -f $(CSTARTUP_OBJS) "$(EXPORT_DIR)/startup/."; \
    else \
        echo "$(EXPORT_DIR)/startup does not exist"; \
        exit 1; \
    fi
endif
ifneq ($(CXXSTARTUP_OBJS),)
    $(Q) if [ -d "$(EXPORT_DIR)/startup" ]; then \
        cp -f $(CXXSTARTUP_OBJS) "$(EXPORT_DIR)/startup/."; \
    else \
        echo "$(EXPORT_DIR)/startup does not exist"; \
        exit 1; \
    fi
endif

Now that my crt0.o and crt1.o files are built and exported, I had to make sure they were used while building my apps. To do this, I modified apps/import/Make.defs to remove the default libraries and startup files from the C++ apps only:

ARCHCFLAGS += -fno-common -pipe
ARCHCXXFLAGS += -fno-common -nostdinc++ -pipe -nodefaultlibs -nostartfiles

As I am manually linking the C runtime files for C++, I also added some variables to the top of the file for these objects:

ARCHCRT0OBJ = $(call CONVERT_PATH,$(TOPDIR)$(DELIM)startup$(DELIM)crt0$(OBJEXT))
ARCHCRT1OBJ = $(call CONVERT_PATH,$(TOPDIR)$(DELIM)startup$(DELIM)crt1$(OBJEXT))
ARCHCRTIOBJ = $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=crti.o))
ARCHCRTBEGINOBJ = $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=crtbegin.o))
ARCHCRTENDOBJ = $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=crtend.o))
ARCHCRTNOBJ = $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=crtn.o))

The last modification involves the Application.mk file under /apps. Here, I updated the ELFLD function to perform a check of the incoming object and then call another function to perform the linking depending on whether it's a pure C application or a C++ application:

define ELFLDC
    $(info From ELFLDC: $1 $2)
    $(Q) $(LD) $(LDELFFLAGS) $(LDLIBPATH) $(ARCHCRT0OBJ) $1 $(LDSTARTGROUP) $(LDLIBS) $(LDENDGROUP) -o $2
endef

define ELFLDCXX
    $(info From ELFLDCXX: $1 $2)
    $(Q) $(LD) $(LDELFFLAGS) $(LDLIBPATH) $(ARCHCRT1OBJ) $(ARCHCRTIOBJ) $(ARCHCRTBEGINOBJ) $1 $(LDSTARTGROUP) $(LDLIBS) $(LDENDGROUP) $(ARCHCRTENDOBJ) $(ARCHCRTNOBJ) -o $2
endef

define ELFLD
    $(ECHO_BEGIN)"LD: $2 "
    $(if $(MAINCOBJ), $(call ELFLDC, $1, $2), $(call ELFLDCXX, $1, $2))
    $(ECHO_END)
endef

Now when I perform a make import, the C applications are linked to crt0.o while the C++ applications are linked to crt1.o as well as to crti.o, crtbegin.o, crtn.o and crtend.o. The result is C++ applications now work with static global instances:

image

An additional benefit is that C applications no longer carry empty .preinit_array/.init_array/.fini_array sections and no longer call exec_preinit, exec_init and atexit(exec_fini). There is no reason for them to, since they will never contain any constructors or destructors.

I apologize for this insanely long post, but I wanted to document this in case anyone else wants to use the NuttX kernel build with C++ apps and static global instances. I also apologize if my "hacks" are unsightly. I am not an expert at makefiles, so I am sure I broke a number of best practices while coming up with this solution. Now that I have a solution, I am hoping it can be critiqued and that someone can let me know whether this is an OK way of doing it. Nonetheless, I have learned a ton getting the NuttX kernel running on the Polarfire Icicle board and I couldn't have done it without your support, so thank you again!

acassis commented 5 months ago

@patacongo @MainframeReboot should this arch/risc-v/src/common/crt1.c be submitted for inclusion in NuttX mainline?

patacongo commented 5 months ago

I would think so. On one hand it should be functionally equivalent, but this is the canonically correct way to organize the logic and will also enable us to support the dynamic loader, ldso, and shared libraries in the future. @xiaoxiang781216 what do you think?

patacongo commented 5 months ago

I sat down and RTFM to fully understand the differences between crt0, crt1, crti, crtbegin, crtn and crtend.

I recall that crtbegin and crtend come from GCC, deriving from a file called crtstuff. I think these are essentially "libraries". I think they contain the configurable, compiler-specific parts for crt1 and crtn. crt1 and crtn are then provided by GLIBC for most architectures.

I looked into GLIBC and it does follow this clean breakdown for most architectures, but not all of them. I think we should not get too tied to either GCC or GLIBC.

MainframeReboot commented 5 months ago

Fair enough.

I do want to reiterate that switching from all logic in crt0.c to a crt0.c plus crt1.c approach did not solve my issue on its own. I had to modify how the apps were compiled and linked, or else it broke C applications, C++ applications, or sometimes both. That said, I understand this might only be relevant to my architecture and toolchain combination, so pushing the modifications out to common makefiles is most likely not desirable. I'm interested to hear ideas on how this can be properly implemented to support all architectures.