EtchedPixels / FUZIX

FuzixOS: Because Small Is Beautiful

armm4: fix a regression causing inability to initialize userspace on dk-tm4c129x #989

Closed pawosm-arm closed 1 year ago

pawosm-arm commented 1 year ago

Sadly, the RoHS solder eventually gave way and my bypass connection broke, causing damage beyond repair. But I've managed to squeeze my budget tightly enough to buy a new DK board, and with it I could prepare a patch that makes it boot again.

EtchedPixels commented 1 year ago

Well, that's covering up the real bug by setting the offset to 0, so I'm not sure it's the right solution. I will take a look when I have more time. At least this confirms that the bug is some tool mishandling the offset.

pawosm-arm commented 1 year ago

> Well, that's covering up the real bug by setting the offset to 0, so I'm not sure it's the right solution. I will take a look when I have more time. At least this confirms that the bug is some tool mishandling the offset.

IMO maintaining an offset on platforms that don't install a VDSO is somewhat redundant. I don't think there is an ultimate solution that fits all of the 32-bit platforms; for me, an ifdef is a good enough solution, even if not perfect.

pawosm-arm commented 1 year ago

I think we've reached a stalemate. Considering @davidgiven has contributed an armm0 platform, maybe he could suggest something?

davidgiven commented 1 year ago

I'm a bit out of touch --- the knowledge that Fuzix uses a VDSO at all is new to me! What's it for? It seems rather overkill for a system like Fuzix.

pawosm-arm commented 1 year ago

Dave, it seems some 32-bit architectures install them (e.g. Atari ST) and some do not (including dk-tm4c129x). My patch assumes that if a platform does not export install_vdso() then we don't bother. But this may look like sweeping problems under the carpet. Both the armm0 and armm4 platforms share the same linker script (Library/elfexe32.ld) that you've kindly contributed. Sadly, I'm not familiar with the syntax of those scripts, so I can't really tell whether they need to be modified if the assumption is that VDSOs can potentially be there.

davidgiven commented 1 year ago

Well, my intention with the ARM linker scripts is to produce as simple binaries as possible, with an absolute minimum of ELF stuff. So if they're trying to install a VDSO then something's probably wrong with the configuration. I would probably want to find out why the platforms that do use a VDSO are doing so, and potentially make them stop if they seem to be doing it accidentally.

That said, I last touched this a while ago, so it may be that there is a good reason --- you'd have to check.

EtchedPixels commented 1 year ago

Not really a VDSO. The first few words of the a.out binary after the header are used to pass additional info not in the a.out header (like stack size); the loader puts whatever the arch wants in that space to replace those info words. Some use it for syscalls (e.g. 68K, where you can't be sure some ROM hasn't nabbed half the trap calls) or for signal fixups.

The ARM setup for some reason seems to be ending up with a binary linked at 0 rather than at the offset, and this messes the relocations up.
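To illustrate why the link address matters here, the following is a minimal sketch of a load-time fixup, assuming the relocation table is simply a list of byte offsets of 32-bit little-endian address words. The names `apply_relocs`, `link_base` and `load_base` are hypothetical and this is not the actual Fuzix loader. Each listed word is adjusted by the difference between where the image was linked and where it is loaded.

```c
#include <stddef.h>
#include <stdint.h>

/* Fetch a 32-bit little-endian word from a byte buffer. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a 32-bit word into a byte buffer in little-endian order. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/*
 * Hypothetical loader step: move every listed address word from the
 * address the image was linked at to the address it is loaded at.
 * Returns 0 on success, -1 if an offset falls outside the image.
 */
static int apply_relocs(uint8_t *image, size_t len,
                        const uint32_t *relocs, size_t nrelocs,
                        uint32_t link_base, uint32_t load_base)
{
    uint32_t delta = load_base - link_base;
    size_t i;

    for (i = 0; i < nrelocs; i++) {
        uint32_t off = relocs[i];

        /* Reject offsets that would read or write past the image. */
        if (len < 4 || off > len - 4)
            return -1;
        store_le32(image + off, load_le32(image + off) + delta);
    }
    return 0;
}
```

In a scheme like this, if the tools and the loader disagree about `link_base` (for example, the binary is really linked at 0 while the loader assumes a non-zero offset), every adjusted word ends up wrong by exactly that difference.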

EtchedPixels commented 1 year ago

I finally found the old EK version of the board I have and plugged it in. Loading code with openocd/gdb seems to work, and I can then ^C it and it's wedged in eth_ifup

(gdb) where
#0  eth_ifup () at eth.c:902
#1  0x0000145e in ethdev_init () at eth.c:1163
#2  0x00000786 in device_init () at devices.c:50
#3  0x00003444 in fuzix_main () at start.c:401
#4  0x0000022c in start () at crt0.c:24

I need to work out where the serial output comes out as well, and find a 3v3 adapter

EtchedPixels commented 1 year ago

Played with this a bit more. The console works, but the DK SD has \CS on a pin the EK doesn't seem to break out, so it fails to find the SD card before bombing on the ethernet. Need to work out how to plumb a second \CS line and/or work out how to tell EK and DK apart

pawosm-arm commented 1 year ago

I don't know about the EK; it seems like a simpler version of the DK (namely, no LCD display on board). If it helps, I can arrange remote access to my DK board for you (with openocd, gdb and minicom, which should provide full control over the board) so you could experiment with that.

EtchedPixels commented 1 year ago

I had a look across the two doc sets. I don't see why the phy init is different but if I comment out the ethernet I can happily type at bootdev: etc. Have a 3v3 friendly SD breakout on order and will try moving the \CS GPIO but it looks like that should trivially make it work. If not I guess I'll have to figure out how to port the code to the nucleo64 board I have handy ;)

EtchedPixels commented 1 year ago

So I switched it to use the elf2flt link, put the expected 0x20 bytes lead into the crt0 and did some quick hacks to make it boot on the EK board. It now looks like it's relocating the right things when I compare with objdump output, starts init, prints stuff, runs /bin/sh which explodes complaining arg list too long, so still something wrong but it's looking better

EtchedPixels commented 1 year ago

So there seem to be a load of extra segments that need to go somewhere and I've no idea what all the arm toolchain requires. When linked I end up with

00000020 l    d  .text     00000000 .text
00002514 l    d  .dynsym   00000000 .dynsym
00002544 l    d  .dynstr   00000000 .dynstr
00002548 l    d  .hash     00000000 .hash
00002560 l    d  .rel.dyn  00000000 .rel.dyn
00002620 l    d  .data     00000000 .data
000026dc l    d  .dynamic  00000000 .dynamic
00002768 l    d  .bss      00000000 .bss

if I fix the tooling to just cope with the stuff after .data as padding then it gets further (into fsck) but I've simply got no idea what is supposed to be in all these sections and which are needed, not needed, dynamic or shared

pawosm-arm commented 1 year ago

> It now looks like it's relocating the right things when I compare with objdump output, starts init, prints stuff, runs /bin/sh which explodes complaining arg list too long, so still something wrong but it's looking better

Ah, I was about to write that I observed exactly that, but then I noticed you'd already written that. Here's the startup screen:

FUZIX version 0.4pre1
Copyright (c) 1988-2002 by H.F.Bower, D.Braun, S.Nitschke, H.Peraza
Copyright (c) 1997-2001 by Arcady Schekochikhin, Adriano C. R. da Cunha
Copyright (c) 2013-2015 Will Sowerbutts <will@sowerbutts.com>
Copyright (c) 2014-2023 Alan Cox <alan@etchedpixels.co.uk>
Devboot
256kB total RAM, 256kB available to processes (15 processes max)
Enabling interrupts ... ok.
SD drive 0: hda: hda1
bootdev: hda1
Mounting root fs (root_dev=1, ro): OK
Starting /init
init version 0.9.1
/etc/rc: Arg list too long

I remember I wrote some ugly hack years ago to enable passing args to a process, but I don't remember which project it was, hopefully not FUZIX. I'll look into it, hopefully tomorrow.

pawosm-arm commented 1 year ago

> So there seem to be a load of extra segments that need to go somewhere and I've no idea what all the arm toolchain requires. When linked I end up with
>
> 00000020 l    d  .text     00000000 .text
> 00002514 l    d  .dynsym   00000000 .dynsym
> 00002544 l    d  .dynstr   00000000 .dynstr
> 00002548 l    d  .hash     00000000 .hash
> 00002560 l    d  .rel.dyn  00000000 .rel.dyn
> 00002620 l    d  .data     00000000 .data
> 000026dc l    d  .dynamic  00000000 .dynamic
> 00002768 l    d  .bss      00000000 .bss
>
> if I fix the tooling to just cope with the stuff after .data as padding then it gets further (into fsck) but I've simply got no idea what is supposed to be in all these sections and which are needed, not needed, dynamic or shared

I was advised that those dynamic sections should not be here in the first place. Apparently, for this target, the -no-dynamic-linker flag is ignored in the presence of the -pie flag. And since the -pie linker flag should be matched with the -fpie -DPIE compiler flags, the following change would be needed:

diff --git a/Applications/rules.armm4 b/Applications/rules.armm4
index b93c08f2b..ec0d6e25c 100644
--- a/Applications/rules.armm4
+++ b/Applications/rules.armm4
@@ -4,10 +4,10 @@ ASM = arm-none-eabi-as
 AR = arm-none-eabi-ar
 STRIP = arm-none-eabi-strip
 LINKER = arm-none-eabi-ld
-CFLAGS = -mcpu=cortex-m4 -mtune=cortex-m4 -march=armv7e-m+nofp -mthumb -fpie -DPIE -ffunction-sections -fdata-sections -fno-strict-aliasing -fno-builtin -Wall -g -Os -isystem $(FUZIX_ROOT)/Library/include -isystem $(FUZIX_ROOT)/Library/include/armm4
+CFLAGS = -mcpu=cortex-m4 -mtune=cortex-m4 -march=armv7e-m+nofp -mthumb -ffunction-sections -fdata-sections -fno-strict-aliasing -fno-builtin -Wall -g -Os -isystem $(FUZIX_ROOT)/Library/include -isystem $(FUZIX_ROOT)/Library/include/armm4
 CFLAGS += -DNSOCKET=4

-LINKER_OPT = -L$(FUZIX_ROOT)/Library/libs -lc$(PLATFORM) -pie -static -no-dynamic-linker -z max-page-size=4
+LINKER_OPT = -L$(FUZIX_ROOT)/Library/libs -lc$(PLATFORM) -static -no-dynamic-linker -z max-page-size=4
 LIBGCCDIR = $(dir $(shell $(CC) $(CFLAGS) -print-libgcc-file-name))
 LINKER_OPT += -L$(LIBGCCDIR) -lgcc -T $(FUZIX_ROOT)/Library/elf2flt.ld --no-export-dynamic -Bstatic -no-dynamic-linker
 STRIP_OPT =
diff --git a/Library/libs/Makefile.armm4 b/Library/libs/Makefile.armm4
index 79bbf6c32..aad13332f 100644
--- a/Library/libs/Makefile.armm4
+++ b/Library/libs/Makefile.armm4
@@ -3,7 +3,7 @@ AR = arm-none-eabi-ar
 PLATFORM = armm4
 _PLATFORM = armm0
 export PLATFORM
-CC_OPT = -mcpu=cortex-m4 -mtune=cortex-m4 -march=armv7e-m+nofp -mthumb -fpie -DPIE -ffunction-sections -fdata-sections -fno-strict-aliasing -fno-builtin -Wall -g -Os -c -I../include -I../include/armm4
+CC_OPT = -mcpu=cortex-m4 -mtune=cortex-m4 -march=armv7e-m+nofp -mthumb -ffunction-sections -fdata-sections -fno-strict-aliasing -fno-builtin -Wall -g -Os -c -I../include -I../include/armm4
 ASM_OPT = -g -xassembler-with-cpp -DUSE_AOUT
 # copied in from kernel tree
 KRN_HEADERS = userstructs.h

After this change, those sections do go away, yet init causes an exception again, so we're back to square one.

EtchedPixels commented 1 year ago

Well no - we've eliminated some strange goings on that were clearly interacting with the mystery so that is progress. I'll make the changes and take a look when I get a moment. It's quite possible I've unfixed something else in trying to find this 8)

EtchedPixels commented 1 year ago

Had a chance to look into this a bit more. It seems with those settings the toolchain simply isn't generating any relocations at all. elf2aout was saying 0 relocations and objdump -r seems to agree.

Are all the ARM relocations we will see 32bit little endian? I am thinking it might be simpler to just accept that the GNU tools break all over the place in these situations and do it the traditional link-twice-and-compare way. I had to do the same to work around the state of the NS32K toolchain

pawosm-arm commented 1 year ago

Although the GNU toolchain supports -mlittle-endian and -mbig-endian for data, the code for Cortex-M should always be in little-endian format. Relocations are described in the Arm ELF spec, which is part of the ABI for the Arm Architecture, and relate to how an Arm linker binds address info for symbol references from different objects being linked: https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst GCC and other Arm compiler toolchains should comply with the ABI. There are sometimes binary incompatibilities, like the ones documented in "Binary Interoperability Between Toolchains - Application Note 487": https://developer.arm.com/documentation/dai0487/latest

EtchedPixels commented 1 year ago

What the spec says is kind of irrelevant if the tool chain doesn't bother outputting the relocations in the first place alas

pawosm-arm commented 1 year ago

> What the spec says is kind of irrelevant if the tool chain doesn't bother outputting the relocations in the first place alas

So after dropping -pie, the static linker was able to resolve all of them, so there was no reason to emit those. It looks like they can be kept anyway with -emit-relocs option, namely:

-LINKER_OPT = -L$(FUZIX_ROOT)/Library/libs -lc$(PLATFORM) -pie -static -no-dynamic-linker -z max-page-size=4
+LINKER_OPT = -L$(FUZIX_ROOT)/Library/libs -lc$(PLATFORM) -static -no-dynamic-linker -z max-page-size=4 -emit-relocs

Results in:

$ objdump -r init.debug |less

init.debug:     file format elf32-littlearm

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
0000002a R_ARM_THM_CALL    main
0000002e R_ARM_THM_CALL    exit
00000034 R_ARM_ABS32       __bss_start
00000038 R_ARM_ABS32       __bss_end
0000003c R_ARM_ABS32       environ
00000042 R_ARM_THM_JUMP11  _syscall_ret
00000046 R_ARM_THM_JUMP11  _syscall_ret
0000004a R_ARM_THM_JUMP11  _syscall_ret
0000004e R_ARM_THM_JUMP11  _syscall_ret
00000052 R_ARM_THM_JUMP11  _syscall_ret
00000056 R_ARM_THM_JUMP11  _syscall_ret

...and so on.

EtchedPixels commented 1 year ago

That's really not useful either. It's now generating ABS relocation records for stuff we will need to relocate, and stuff we won't.

At this point I'm going to just leave it broken until after 0.4.

Looking at the output, the only relocations we actually get that matter are always 32bit address values, so after 0.4 I'll try linking it twice at differing addresses and doing it by hand the old way. Same as I had to do because ns32k was (even more) broken.
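The link-twice-and-compare approach mentioned here can be sketched roughly as follows. This is only an illustration, not the code that went into Fuzix: the function name `find_relocs`, the assumption that address words are 4-byte aligned, and the in-memory buffer interface are all hypothetical. The idea is to link the same program at two different base addresses and treat every 32-bit little-endian word that differs by exactly the base difference as an address needing relocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Compare two images of the same program, linked at base_a and base_b.
 * Every aligned 32-bit word that differs by exactly (base_b - base_a) is
 * recorded in out[] as the offset of an address needing relocation.
 * Returns the number of relocations found, or -1 if the images differ in
 * any other way or out[] is too small.
 */
static long find_relocs(const uint8_t *img_a, const uint8_t *img_b,
                        size_t len, uint32_t base_a, uint32_t base_b,
                        uint32_t *out, size_t out_max)
{
    uint32_t delta = base_b - base_a;
    size_t n = 0;
    size_t off;

    /* The two links must use different base addresses. */
    assert(delta != 0);

    for (off = 0; off + 4 <= len; off += 4) {
        uint32_t a = read_le32(img_a + off);
        uint32_t b = read_le32(img_b + off);

        if (a == b)
            continue;            /* identical word: nothing to relocate */
        if (b - a != delta || n == out_max)
            return -1;           /* unexpected difference or table full */
        out[n++] = (uint32_t)off;
    }

    /* Any trailing bytes that do not fill a word must match exactly. */
    for (; off < len; off++)
        if (img_a[off] != img_b[off])
            return -1;

    return (long)n;
}
```

Returning an error on any other difference is a deliberate choice in this sketch: it flags cases where the toolchain encoded an address in some form other than a plain 32-bit word, which would need separate handling.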

pawosm-arm commented 1 year ago

Is there a timeline specified for the upcoming release(s)?

EtchedPixels commented 1 year ago

I was hoping next week. I've been building and checking all the ports over the past month and trying to get all the glitches knocked out and make diskimage wherever possible.

EtchedPixels commented 1 year ago

Done more digging and playing with options. What I have now seems to work except for a weird failure in setdate when it does localtime, and an equivalent one in ls -l (yet date works fine). The relocations look right so possibly something else is going on here that needs chasing down further.

Still other obvious problems - signals are not working properly, time at boot is wrong for some reason but it's a bit happier in general

EtchedPixels commented 1 year ago

Blows up in __aeabi_ldivmod for reasons I don't understand. Sometimes hangs, sometimes ends up part way further up the call stack bypassing the code it should return to

pawosm-arm commented 1 year ago

The signals never actually worked, e.g. ctrl+c makes it hang, and even in my YT video documenting the use of the network device, I end it with a ping command that I never interrupt (I just stopped recording the video at that point).

Everything else seems to work, though two places had to be fixed to make things compile:

diff --git a/Applications/dw/dwdate.c b/Applications/dw/dwdate.c
index 47c72ffe6..59a8ce56a 100644
--- a/Applications/dw/dwdate.c
+++ b/Applications/dw/dwdate.c
@@ -161,10 +161,10 @@ int main( int argc, char *argv[] ){
     /* convert to seconds */
     ret = mul( ret, 60 );
     ret += second;
-
+/*
     if( disflg || !setflg )
        printf( ctime( (time_t *)&ret ) );
-
+*/
     if( setflg ){
        /* This is a sleezy cast */
        x=stime( (time_t *)&ret );
diff --git a/Library/Makefile b/Library/Makefile
index 4dc865984..74cf92cc6 100644
--- a/Library/Makefile
+++ b/Library/Makefile
@@ -164,3 +164,6 @@ ifeq ($(USERCPU),wrx6)
        install -m 0644 ../Kernel/include/userstructs.h include/sys/
        install -m 0644 ../Kernel/include/drivewire.h include/sys/
 endif
+ifeq ($(USERCPU),armm4)
+       install -m 0644 ../Kernel/include/drivewire.h include/sys/
+endif

dwdate causes a linker issue on ctime() (actually it's a problem with getenv() called by ctime()), and the drivewire.h file was already flagged in other places.

This ticket was all about the regression which now seems to be corrected, so I suppose it can be closed now?

EtchedPixels commented 1 year ago

Added something similar. Just fixed the mul and time_t nonsense in dwdate