freemint / m68k-atari-mint-gcc

Fork of GNU's gcc with support for the m68k-atari-mint target
https://github.com/freemint/m68k-atari-mint-gcc/wiki
Other
27 stars 7 forks source link

Switch from SJLJ exceptions to DWARF-2 #23

Closed vinriviere closed 12 months ago

vinriviere commented 1 year ago

SJLJ exceptions slowly become obsolete, in favor to DWARF2 exceptions. m68k-elf and m68k-linux already use DWARF2 exceptions. So we should do the same in the new m68k-atari-mintelf toolchain.

It is stated there that DWARF2 .eh_frame section contains relocations to odd addresses: https://github.com/freemint/m68k-atari-mint-gcc/blob/f2ad8b025485fa3703447ed0d0c7e27d406df338/gcc/config/m68k/mint.h#L77-L79

Is this really true? Why? Is there a way to avoid that? I've just asked to the GCC mailing list. Let's see. https://gcc.gnu.org/pipermail/gcc/2023-September/242414.html

th-otto commented 1 year ago

To be more precise, problematic here are only relocations at odd addresses, not relocations to odd addresses. Relocations to odd addresses will happen also eg with char *array[] = ....

But something like

.byte 0xff
.long somevar

wont work.

mikrosk commented 1 year ago

Thanks for creating / noticing this issue. It would be very unfortunate if we ended up with yet another poorly supported, non-standard and soon-to-be-killed feature.

vinriviere commented 1 year ago

To be more precise, problematic here are only relocations at odd addresses, not relocations to odd addresses.

@th-otto Very true. Thanks for the precision.

And we are lucky, because we got an answer from Andreas Schwab on the GCC mailing list: https://gcc.gnu.org/pipermail/gcc/2023-September/242415.html

He proposes to configure GCC by with DW_EH_PE_aligned instead of DW_EH_PE_absptr in the ASM_PREFERRED_EH_DATA_FORMAT setting. I gave a quick try, and indeed, that solves the relocation alignment issue 😃 So DWARF-2 exceptions become not-impossible for the PRG/ELF format.

However, even configured that way, exceptions don't work out of the box. In my quick test, it failed as soon as the first exception was thrown. Full DWARF-2 exception support has some prerequisites. As I understand, it needs support for crtbegin.o/crtend.o. Maybe also .init/.fini sections. That would require coordinated support in binutils/gcc/MiNTLib.

In conclusion, adding DWARF-2 exception support is quite complex, and impact several components. Not the kind of thing you can easily turn on with a single option. We will have to do that to modernize the mint* targets. But as SJLJ exceptions are not yet completely obsolete, work well, and are and much simpler than DWARF-2 exceptions, I advise to keep SJLJ for now. So we can focus to the stabilization of binutils/gcc/gdb for the m68k-atari-mintelf target. Then we could start the DWARF-2 migration project in a second time.

Also, the switch from SJLJ to DWARF-2 exceptions will be an ABI change. So that will require a full rebuild of all the C++ libraries.

vinriviere commented 1 year ago

A good introduction to difference between SJLJ / DWARF-2 exceptions can be found here. There are pointers to more detailed information. https://gcc.gnu.org/wiki/WindowsGCCImprovements

th-otto commented 1 year ago

Hm, are you sure? I've read that message from Andreas already, and from what i understand, DW_EH_PE_aligned only has an influence about what values are written for relocations, and how they are interpreted when later read, but not where they are written.

We should maybe try to create a testcase first where the default setting of DW_EH_PE_absptr really produces relocations at odd addresses.

About crtbegin/crtend: i have to look, but i don't think that we need them. It's possible that unwind info has to be initialized somehow (in glibc that seems to be register_frame_info), but imho that can be done also at the same place as do_global_ctors_aux is called in libgcc.a. If more is needed for this, that would indeed be a bigger task.

vinriviere commented 1 year ago

Hm, are you sure?

Well, I can just say that I encountered the odd-alignment issue during my early tests with PRG/ELF. Then yesterday, I looked at the m68k-elf relocations with my test program, and there was indeed relocations at odd addresses. Then I used DW_EH_PE_aligned with m68k-atari-mintelf, and there wasn't any odd relocation anymore. Maybe this was just by chance, no idea. Indeed, this should be checked with a proper testcase

vinriviere commented 1 year ago

I've just added usage of crtbegin.o / crtend.o. That's a step forward. GCC: https://github.com/freemint/m68k-atari-mint-gcc/commit/4e3a38a10fe6c4766281d221ba4b9538bb90d8c4 binutils: https://github.com/freemint/m68k-atari-mint-binutils-gdb/commit/40b072271e7e297f65062050434be33ffbd412b9

th-otto commented 1 year ago

The patch to crtstuff.c looks wrong to me. CTOR_LIST/DTOR_LIST are already defined in libgcc2.c. A function to run the global constructors is also already defined there. Also all the the other stuff like __EH_FRAME_BEGIN__ would already be defined there.

That's why i meant that we currently don't not need crtbegin/crtend. They are only for shared libraries.

Changes to binutils are therefor also wrong. They must be usable by any compiler, and not rely on crtbegin/crtend being present.

vinriviere commented 1 year ago

The patch to crtstuff.c looks wrong to me.

No. It works well, as expected.

CTOR_LIST/DTOR_LIST are already defined in libgcc2.c. A function to run the global constructors is also already defined there. Also all the the other stuff like __EH_FRAME_BEGIN__ would already be defined there.

No. They are used by libgcc2.c (compiled as __main.o), but defined elsewhere. In my initial mintelf patch (as well as in the a.out linker) they were defined in the linker script. Now they are defined in crtbegin.o / crtend.o, like any other modern target.

I must admit that, due to the many defines to handle all the special cases, GCC's crtstuff is a big mess.

That's why i meant that we currently don't not need crtbegin/crtend. They are only for shared libraries.

No. See above.

Changes to binutils are therefor also wrong. They must be usable by any compiler, and not rely on crtbegin/crtend being present.

No. They are good. You can even link an empty assembler file with -nostartfiles, and see that the link succeeds even without crtbegin.o / crtend.o. This is possible because the reference to those files is done with a wildcard like *crtbegin.o to make the dependency optional. I haven't invented that, that's just a copy/paste from m68kelf.

Be sure that I carefully test my patches before pushing. Did you test them, or my binaries, before declaring this was wrong?

BTW, if your real intent (that I must guess, as you don't report real-world issues) is to use the mintelf linker script with the a.out linker, that might indeed not work. But for a completely different reason. The original m68kelf linker uses a CONSTRUCTORS statement in the linker script. As it's an a.out-only thing (ignored by ELF), I didn't use it in the mintelf script. But it you really want to use that linker script with the a.out linker, just add that CONSTRUCTORS line to the external linker script m68kmintelf.x then it should work well.

th-otto commented 1 year ago

No. They are used by libgcc2.c (compiled as __main.o), but defined elsewhere.

They are also defined there:

$ m68k-atari-mintelf-nm -A libgcc.a | grep CTOR
libgcc.a:__main.o:         U ___CTOR_LIST__
libgcc.a:_ctors.o:00000008 B ___CTOR_LIST__
$ m68k-atari-mintelf-nm -A libgcc.a | grep global_ctors
libgcc.a:__main.o:00000028 T ___do_global_ctors

Now they are defined in crtbegin.o / crtend.o, like any other modern target.

Your patches using __attribute__((used)) and disabling the error are also an indication that this was the wrong place to define them.

Be sure that I carefully test my patches before pushing. Did you test them, or my binaries, before declaring this was wrong?

No, i did not test them yet. But having the same definition in different files can't be right.

In any case, the function do_global_ctors_aux from crtstuff.c is not run at all. That would only be the case when we have support for an .init section.

BTW, if your real intent (that I must guess, as you don't report real-world issues) is to use the mintelf linker script with the a.out linker, that might indeed not work.

No, that would not work indeed. That toolchain uses different linker scripts for a.out and elf.

th-otto commented 1 year ago

No, i did not test them yet.

Now i did, and it does not work. __CTOR_LIST__ and __DTOR__LIST__ are linked from libgcc.a(_ctors.o), not crtbegin.o. And constructors are not run.

(for testing i use

#include <stdio.h>
#include <mintbind.h>

void __attribute__((__constructor__)) cons(void)
{
    Cconws("C constructor\r\n");
}

void __attribute__((__destructor__)) dest(void)
{
    Cconws("C destructor\r\n");
}

int main(void)
{
    return 0;
}

It doesn't make much sense anyway. If we activate DWARF2 unwind info, then __EH_FRAME_BEGIN__ will also be defined in libgcc(__main.o).

vinriviere commented 1 year ago

They are also defined there:


$ m68k-atari-mintelf-nm -A libgcc.a | grep CTOR
...
libgcc.a:_ctors.o:00000008 B ___CTOR_LIST__

This is a hack from the GCC team: https://github.com/freemint/m68k-atari-mint-gcc/blob/gcc-13-mintelf/libgcc/libgcc2.c#L2434-L2455 The ___CTOR_LIST__ is defined in the BSS segment, just to satisfy the linker, as a fallback. But in normal operation, that libgcc.a:_ctors.o should never be linked at all.

Proof is that I successfully compiled and run both a standard C++ program, and also an empty .S file with -nostartfiles.

Your patches using __attribute__((used)) and disabling the error are also an indication that this was the wrong place to define them.

No. I added that __attribute__((used)) to silence a warning. This fix was already present on other variables. I added it on a separate line to make the patch easier to rebase. In the same spirit, I also commented out the #error in that file.

I needed those workarounds because it's very uncommon to use crtbegin.o / crtend.o without support for the .init section. But that's a temporary situation. I will remove those workarounds as soon as the .init section is usable in mintelf. Then we won't use any uncommon configuration at all. Just progress step by step.

In any case, the function do_global_ctors_aux from crtstuff.c is not run at all. That would only be the case when we have support for an .init section.

Yes. That will be a further step.

Now i did, and it does not work. __CTOR_LIST and DTORLIST are linked from libgcc.a(_ctors.o), not crtbegin.o. And constructors are not run.

This is unexpected. Did you also update GCC? I have changed the specs to provide crtbegin.o and crtend.o on the command line. Particularly: m68k-atari-mintelf-gcc -dumpspecs, look for startfile and endfile.

Both gcc and binutils must be updated at the same time. I've explicitly added a constraint in the Ubuntu packages.

th-otto commented 1 year ago

This is unexpected. Did you also update GCC?

Yes, but i failed to realize that one of the patches did not apply cleanly, and the line that changed the STARTFILE_SPEC was rejected, so crtbegin.o was not used in the link. Apologies.

The #error only triggers because we have neither an .init nor an .init_array, but only need the definitions for the constructors. Apparently the source was not prepared for that, although exactly such a scenario is mentioned in libgcc2.c.

After i fixed that, constructors do work now. So in general, using crtbegin/crtend seems to be ok, but i still did a few more changes (in my own branch only for now), that makes sure that only one of the two __CTOR_LIST__ is compiled. In fact, using EH_FRAME_LIST and its end symbol only seem to work when using crtbegin, since no attempt is made in the linker script to link them in the right order (for ctors, that is done by linking the .ctors section from crtbegin first, then all others).

I've also protected all the differences by ifdefs now. Maybe not strictly neccessary for the elf toolchain, but makes it easier for me to sync the sources.

Now i have to do the same checks again, using the a.out toolchain. Sigh.

th-otto commented 1 year ago

I've just pushed a patch that makes it possible to enable dwarf2 exceptions at configure time (default is still sjlj). It also changes the ASM_PREFERRED_EH_DATA_FORMAT as suggested by Andreas. Then i did some tests, using this sample:

#include <stdio.h>

  class exception
  {
  public:
    exception() throw() { }
    virtual ~exception() throw();
    virtual const char* what() const throw();
  };

exception::~exception() throw() { }
const char* exception::what() const throw() { return "hello"; }

void func_exception(int i)
{
    printf("in %s\n", __func__);
    if (i == 5)
        throw exception();
}

int main(void)
{
    int i;

    printf("Hello, C++\n");
    try {
        for (i = 0; i < 10; i++)
            func_exception(i);
    } catch (const exception &e)
    {
        printf("got exception %d %s\n", i, e.what());
    }
    return 0;
}

With the old setting of ASM_PREFERRED_EH_DATA_FORMAT = DW_EH_PE_absptr, i get a link error because of an relocation at an odd address. With the changed setting, i don't get that error, but i get a message

libstdc++.a(eh_terminate.o)(.eh_frame); no .eh_frame_hdr table will be created

instead. The link still succeeds with error status 0. If i run the program, the exception is not catched, and the program exits with a exit value of 1536 instead.

There are also other strange differences. With the old setting, the generated assembler source contains .cfi_startproc, .cfi_endproc and .cfi_offset statements, even without -g. It also does not have any .eh_frame section at all.

th-otto commented 1 year ago

I found the cause for the link error. In my fork i don't include m68kelf.h, and for ASM_OUTPUT_ALIGNED the definition from m68k.h was used, which just emits .even. But for the dwarf2 and exception tables sometimes an alignment to a 4-byte address is needed. I think this branch is not affected by this, but it has be taken into account when inclusion of m68kelf.h is removed here, too (which is still not done, but needed to avoid the other ABI changes).

The exit-code 1536 is from an abort() call in unwind-pe.h in read_encoded_value_with_base(), so there still seems to be something wrong in the generated tables.

vinriviere commented 1 year ago

With the changed setting, i don't get that error, but i get a message

libstdc++.a(eh_terminate.o)(.eh_frame); no .eh_frame_hdr table will be created

I don't remember having seen any warning. And certainly not an error, as the link succeeded. But at runtime, exceptions didn't work as expected. I don't remember actual message, something like invalid system call, and maybe a bus error at address 0 in the ARAnyM's log. Not completely surprising, as the startup code was incomplete.

I found the cause for the link error. In my fork i don't include m68kelf.h, and for ASM_OUTPUT_ALIGNED

Ah. Fine.

FYI, I'm preparing support for .init_array / .fini_array in binutils / gcc / mintlib. It's really straightforward, and that's the modern way to go. We will see if that's enough to get DWARF-2 exceptions working, or if there is yet another issue behind.

th-otto commented 1 year ago

What for? Those arrays just contain the same pointer list as __CTOR_LIST__/__DTOR_LIST. Only difference is that mintlib will be responsible to call them, and that can only be achieved by changing the startup code, and making it much more complicated. As long as we don't have shared libraries, we don't need them.

vinriviere commented 1 year ago

.init_array / .fini_array is the modern way to go. That's just another way to add hooks at startup. Concretely, crtinit.o use it to initialize the .eh_frame section.

th-otto commented 1 year ago

??? That section is readonly, what do you want to initialize there? .init_array has nothing to do with exception handling.

And yes, it is modern, because needed for shared libs. Other than that, it is exactly the same as all the .ctors sections, just without the leading and trailing elements.

vinriviere commented 1 year ago

FYI, I'm preparing support for .init_array / .fini_array in binutils / gcc / mintlib.

Done, I've pushed my work for the above (NOT the DWARF-2 experiments). binutils: https://github.com/freemint/m68k-atari-mint-binutils-gdb/commit/13147f4ea76d450ee1c8b8b629a5f92b2d290f31 gcc: https://github.com/freemint/m68k-atari-mint-gcc/commit/44322d44e59ead3c0c9aa46a2b26d0daa283af70 mintlib: https://github.com/freemint/mintlib/commit/40afd6971539697ab00aefc590bb9697b63e5442

Ubuntu binaries are currently building.

As expected, now constructors use the .init_array section (instead of .ctors) and destructors use the .fini_array section (instead of .dtors). Also, the main() function in user programs doesn't implicitly call __main() anymore at the beginning. Any program expecting .init_array/.fini_array sections, starting by GCC itself, will work out of the box.

Key point is that modern ELF systems use .init_array/.fini_array for intitialization. For dynamic libraries, yes, but also for static libraries and executables. Combined with crtbegin.o/crtend.o support, the mintelf target is now closer to the Linux behaviour. So regarding to GCC's mess of #ifdef's, we now use a much more standard ELF configuration, keeping away from obscure and poorly tested code paths. Finally, that will require less patches. Some of them could already be removed. So that's a step forward.

Anyway, my initial motivation for this .init_array stuff was the ability to easily enable DWARF-2 exceptions and their .eh_frame stuff. Combined with @th-otto's --disable-sjlj-exceptions, it's very convenient. Everything compiles well and produce good-looking binaries.

Alas, it still fails like previously. As soon as the program throws an exception, it displays Bad System Call and exits immediately. I can see BAD SYSTEM CALL: User PC=0 in the ARAnyM log, so I guess that's something related to a NULL function pointer. I will have a closer look to what happens during initialization. Our brand new gdb will be of great help.

th-otto commented 1 year ago

I still don't see why this should put us forward. Especially it is a step backward regarding exception handling, since the initial call to __register_frame_info https://github.com/freemint/m68k-atari-mint-gcc/blob/44322d44e59ead3c0c9aa46a2b26d0daa283af70/libgcc/libgcc2.c#L2393-L2398 is now not done anymore.

It is also now impossible to omit that additional code when you know that you don't need it, by providing a dummy __main() function. This is done by several small tools, and often also in projects that use libcmini.

I also don't see that this saves any patches. Quite contrary, it needs additional patches in other projects.

MiNT is not linux, and we don't should not pretend to be like it. And btw, init_array is not used at all by linux, which uses the .init section instead.

Apart from that, your patch in gcc is misplaced, where it is only done for cross-compilers. If at all, it must go into config.gcc.

But really, i would ask to revert that. It is not going to help us in any way.

th-otto commented 1 year ago

Well my above comments where not completely right, there is an entry made into the .init_array that calls frame_dummy which in turn calls __register_frame_info. But there are still problems with this approach. First off, it also calls register_tm_clone, which in turn generates references to _ITM_registerTMCloneTable from libitm.a, something that is definitely not needed, and the library does not even exist.

And it will also pull in all the exception handling stuff, even for C-compiled code that does not need it.

th-otto commented 1 year ago

I've pushed fixes for this to gcc and mintlib now. But i'm still not happy with this. Amongst others: in your previous commit we got rid of the __CTOR_LIST__ etc. symbols in the linker script. But now we add back __init_array_start etc. again. Bad thing about this is, that those symbol names depend on the compiler generating an underscore or not. And those linker scripts are the only place where this is the case, all other tools are independent of this.

The binutils should be independent of the compiler. We might be using different compilers lke gcc-4.6.4, gcc-7, gcc-13 etc, but all use the same binutils. One solution to this would be to PROVIDE those symbols in both flavours.

vinriviere commented 1 year ago

Well my above comments where not completely right, there is an entry made into the .init_array that calls frame_dummy which in turn calls __register_frame_info.

This was my initial motivation.

But there are still problems with this approach. First off, it also calls register_tm_clone, which in turn generates references to _ITM_registerTMCloneTable from libitm.a, something that is definitely not needed, and the library does not even exist.

Indeed, I noticed that TM clone stuff in the sources. I saw on the web this was related to threading/atomic stuff, so most likely useless for our platform. It seems that most embedded platflorms use --disable-tm-clone-registry, so that should also be a solution for us.

And it will also pull in all the exception handling stuff, even for C-compiled code that does not need it.

Ah, this is very bad. BTW, I wonder why this is different from the __main() solution. I would have said that such difference could come when switching from SJLJ to DWARF-2.

Anyway, I haven't strong opinions. If, after in-depth tests, it appears that the old method is clearly better, then go for it. We have seen that it's easy to configure gcc for one or other method. However, I'm keen to keep the new method for some time, to give us the time familiarize with the new method, and see precisely what is better or worse. When time comes to make a final choice, we will know why, regarding to pros and cons.

th-otto commented 1 year ago

Ah, this is very bad. BTW, I wonder why this is different from the __main() solution.

In libgcc2.c that function is also called, but only when the target defines EH_FRAME_SECTION_NAME, which is not the case.

to give us the time familiarize with the new method

Yes, i'll keep that for now. Switching the methods is a pain, because you have to rebuild & install all of binutils, mintlib and gcc. But whatever method is choosen, we'll have to find a way to move the call to register_frame_info to a place where it is only used by the g++ driver. Maybe by adding an object with a static constructor to libcstd++.a, but we have to make sure it is the first one being called.

vinriviere commented 1 year ago

Amongst others: in your previous commit we got rid of the __CTOR_LIST__ etc. symbols in the linker script.

Yes, it's the benefit of using crtbegin.o.

But now we add back __init_array_start etc. again. Bad thing about this is, that those symbol names depend on the compiler generating an underscore or not. And those linker scripts are the only place where this is the case, all other tools are independent of this.

True. And if you want my 2 cents, the GNU people should also have used crtbegin.o/crtend.o for those __init_array_start labels.

The binutils should be independent of the compiler. We might be using different compilers lke gcc-4.6.4, gcc-7, gcc-13 etc, but all use the same binutils.

Using the same ld for different gcc versions, yes. But I wouldn't have imagined using the same ld binary for both the mint and mintelf gcc. Or you mean using the same ld for ELF-underscore and ELF-not-underscore variants? As a big fan of minimalistic patches, I wouldn't have imagined that.

One solution to this would be to PROVIDE those symbols in both flavours.

Yes, that's a solution. Or provide aliases in C files, like EmuTOS.

If you want other ideas: the linker script is generated by a shell script called genscripts.sh. The standard ELF template is completely unreadable, due to the countless variants. This is why I chose to write a new template for scratch for PRG/ELF. But it might be easy (or not) to instantiate that template twice, with underscore or not. Then using some kind of parameter or environment variable to to switch between a linker script or the other.

Maybe by adding an object with a static constructor to libcstd++.a, but we have to make sure it is the first one being called.

No problem. The .init_array scheme has support for priorities. And if that's not enough, there is also .preinit_array.

th-otto commented 1 year ago

But I wouldn't have imagined using the same ld binary for both the mint and mintelf gcc.

That actually might work, if you tell m68k-atari-mintelf to use the m68kmint (old a.out-mintprg) emulation. But that would be strange.

Or you mean using the same ld for ELF-underscore and ELF-not-underscore variants?

Yes, this is what i had in mind. Not that i'm a fan of supporting both (we still have to finally decide which one we use), we certainly don't want to confuse people. But it might take some time until we have backported the change to old compilers. And that should certainly be done for atleast 4.6.4 (for EmuTOS) and 7.5.0 (currently used for freemint). Certainly you'll need binutils >= 2.41 when you want to make use of the new format, but that should then also be usable by other compilers.

As a big fan of minimalistic patches, I wouldn't have imagined that.

Its only a few lines more in the linker script, which is completely new anyway. No other patches to existing code.

Yes, that's a solution. Or provide aliases in C files, like EmuTOS.

In Emutos that was only needed for functions from libgcc. But it would be strange to require such "hacks" in every project. In fact, in most cases only a few symbols are affected, the newly introduced __init_array etc that are now referenced by mintlib, and etext() that was already referenced before.

But it might be easy (or not) to instantiate that template twice, with underscore or not. Then using some kind of parameter or environment variable to to switch between a linker script or the other.

That would require different, non-standard compiler switches depending on which compiler is used. Not a real option imho, and difficult to understand by casual users why this is needed.

th-otto commented 1 year ago

Our brand new gdb will be of great help.

I've tracked it down to _Unwind_Find_FDE returning a NULL pointer, which should not happen. I also wonder whether code like https://github.com/freemint/m68k-atari-mint-gcc/blob/5557aef9f0c953df0a9b60245fc58ad51aca27f2/libgcc/unwind-dw2.c#L184-L185 will eventually crash with address-error on 68k.

Also dubios: https://github.com/freemint/m68k-atari-mint-gcc/blob/5557aef9f0c953df0a9b60245fc58ad51aca27f2/libgcc/unwind-dw2.c#L193-L194 that assumes that 8 bytes fit into an unsigned long.

th-otto commented 1 year ago

And it will also pull in all the exception handling stuff, even for C-compiled code that does not need it.

Looks that this is not entirely the case. I looked at the object, which has references to __register_frame_info. But in a linked c-program, the code looks like

frame_dummy:
[000001c0] 4e56 0000                 link       a6,#0
[000001c4] 203c 0000 0000            move.l     #$00000000,d0
[000001ca] 6714                      beq.s      $000001E0
[000001cc] 4879 0001 aa86            pea.l      object.0
[000001d2] 4879 0001 a466            pea.l      __FRAME_END__
[000001d8] 4eb9 0000 0000            jsr        $00000000
[000001de] 508f                      addq.l     #8,a7
[000001e0] 4e71                      nop
[000001e2] 4e5e                      unlk       a6
[000001e4] 4e75                      rts

Note the $00000000 where the reference to __register_frame_info has been. This is apparently caused by the weak declaration (i thought that such thing won't work, but apparently they do). So the code above will be linked in, and the static object.0 variable, but not the other exception handling stuff. Still not optimal, but i think for now we can live with it.

For the tm_clone support i've pushed a fix to avoid pulling that in.

th-otto commented 1 year ago

[000001c0] 4e56 0000 link a6,#0 [000001c4] 203c 0000 0000 move.l #$00000000,d0 [000001ca] 6714 beq.s $000001E0 [000001cc] 4879 0001 aa86 pea.l object.0

Aargs. I got fooled by the disassembly. Yes, the value at that instruction is zero. But it is initially an absolute reference to a function, and therefor relocated. That has the effect that jsr $00000000 will jump to the start of the text segment. Huks. I guess this has to be fixed in the linker.

th-otto commented 1 year ago

Fixed in https://github.com/freemint/m68k-atari-mint-binutils-gdb/commit/b46acf928ecbe2ac1c74d569e42bfb1cb6f3c254

It is possible, that a similar bug is also in the old a.out-mintprg routine.

Edit: hm, no apparently not. Maybe the reloc type is different there?

vinriviere commented 1 year ago

Wow, I've just compiled gcc with --disable-sjlj-exceptions, and my C++ test program just works! @th-otto, is there anything to add?

th-otto commented 1 year ago

Hu? You mean one that throws exceptions? Mine still aborts.

vinriviere commented 1 year ago

Yes, it was my simple program to test SJLJ exceptions (and also .init_array). It didn't work with DWARF-2 exceptions. Previously, there was that strange "Bad System Call" issue with User PC=0. Now it just works : the exception is thrown and caught as expected. There is a .eh_frame section as expected. I will trace the program with gdb to see how it looks.

th-otto commented 1 year ago

Hm, i recompiled your kernel too, and get the same result as with mine: when the exception is thrown, the program is aborted, because the unwinder does not find the frame. .eh_frame is present and registered.

vinriviere commented 1 year ago

Do you mean you compiled GCC with --disable-sjlj-exceptions and also full support for .init_array? It is required to initialize the frame_dummy. I will look closely to my working binary.

vinriviere commented 1 year ago

Apologies @th-otto, I messed the file names during my quick test yesterday. Actually, it doesn't work better.

vinriviere commented 1 year ago

From what I understand, it fails inside uw_init_context_1(). More precisely, here: https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/libgcc/unwind-dw2.c#L1347

This immediately branches to a trap #7. I find that instruction here, no idea if this is related or not: https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/gcc/config/m68k/m68k.md#L6626-L6631

Anyway, init_dwarf_reg_size_table() ends up with a call to: https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/libgcc/unwind-dw2.c#L1315-L1319

Then it might come to: https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/gcc/builtins.cc#L7910-L7912

And finally go there: https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/gcc/dwarf2cfi.cc#L313

I can just say that init_dwarf_reg_size_table() produces a trap #7, other quotes above are just speculation. But that can be a good starting point for further investigation.

th-otto commented 1 year ago

Apologies @th-otto, I messed the file names during my quick test yesterday. Actually, it doesn't work better.

No problem. Can happen during testing ;) Too bad, i already thought that i could have done similar mistakes and i was looking for a bug that does not exist...

The trap #7 is emitted by a call to __builtin_trap IIRC.

But your error seems to be different than the one i get. In my case, using the test attached below, it runs into

https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/libgcc/unwind-dw2.c#L1005-L1006

That call does not find the frame and returns NULL, and uw_frame_state_for will return _URC_END_OF_STACK This will trigger an assertion in https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/libgcc/unwind-dw2.c#L1335-L1336

PS: can you attach the test that you are using?

throw.cc.txt

th-otto commented 1 year ago

__builtin_init_dwarf_reg_size_table works for me, and produces:

        link.w %fp,#0
        move.b #4,dwarf_reg_size_table
        move.b #4,dwarf_reg_size_table+1
        move.b #4,dwarf_reg_size_table+2
        move.b #4,dwarf_reg_size_table+3
        move.b #4,dwarf_reg_size_table+4
        move.b #4,dwarf_reg_size_table+5
        move.b #4,dwarf_reg_size_table+6
        move.b #4,dwarf_reg_size_table+7
        move.b #4,dwarf_reg_size_table+8
        move.b #4,dwarf_reg_size_table+9
        move.b #4,dwarf_reg_size_table+10
        move.b #4,dwarf_reg_size_table+11
        move.b #4,dwarf_reg_size_table+12
        move.b #4,dwarf_reg_size_table+13
        move.b #4,dwarf_reg_size_table+14
        move.b #4,dwarf_reg_size_table+15
        move.b #12,dwarf_reg_size_table+16
        move.b #12,dwarf_reg_size_table+17
        move.b #12,dwarf_reg_size_table+18
        move.b #12,dwarf_reg_size_table+19
        move.b #12,dwarf_reg_size_table+20
        move.b #12,dwarf_reg_size_table+21
        move.b #12,dwarf_reg_size_table+22
        move.b #12,dwarf_reg_size_table+23
        move.b #4,dwarf_reg_size_table+24
        move.b #4,dwarf_reg_size_table+25
        moveq #0,%d0
        unlk %fp
        rts

Btw. there might be another difference caused by include m68kelf.h. In m68k.h, DEBUGGER_REGNO is defined to https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/gcc/config/m68k/m68k.h#L713

but then later redefined to https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/gcc/config/m68k/m68kelf.h#L104

vinriviere commented 1 year ago

PS: can you attach the test that you are using?

I will go get it tonight. Just a single throw, and a few other unrelated things. I guess it can be simplified.

__builtin_init_dwarf_reg_size_table works for me, and produces:

I loaded the program into Steem/MonST2 yesterday, and I indeed saw similar code: a series of moves to absolute addresses. Then somewhere at the end there was a trap #7. Maybe another routine, just after. I will look again carefully and report. But for sure, by looking at the disassembly and the sources, the trap #7 position matched the call to init_dwarf_reg_size_table(). Also, I can't completely exclude a branch to a wrong offset. But that trap #7 certainly arrived there intentionally. That's really obscure.

vinriviere commented 1 year ago

Here is the simple testcase. Nothing fancy. "Bad System Call" on throw. ex.cc.txt

vinriviere commented 1 year ago

__builtin_init_dwarf_reg_size_table works for me, and produces:

Well, for me it's like this. The end is different, and there's a trap #7 after that function. For memory, its the assembler output of libgcc/unwind-dw2.c.

       move.b #12,dwarf_reg_size_table+23
        move.b #4,dwarf_reg_size_table+24
        move.b #4,dwarf_reg_size_table+25
        .loc 1 1319 1 is_stmt 0 view .LVU3216
        jra .L669
.L670:
.LBE1181:
.LBE1180:
        .loc 1 1336 3 discriminator 1 view .LVU3217
        trap #7
.LFE65:
        .size   uw_init_context_1, .-uw_init_context_1

Maybe I misconfigured something, I don't know. There are many trap #7 in that file. I see on the web that's a way to exit the process with some debuggers. Maybe it's related to gcc_assert(). No idea.

th-otto commented 1 year ago

Maybe I misconfigured something, I don't know.

I have a jsr abort there instead of a trap, but that code is apparently only for the gcc_assert in that function (after the moves, there is an unconditional jbra)

Other than that, your test behaves like mine, and aborts because the exception frame is not found.

th-otto commented 1 year ago

Hm, i've configured your compiler now also to use dwarf2 exceptions. I hope i didn't mess up anything, but currently i get:

    d692:       13fc 0004 0002  moveb #4,2cd7a <object.0+0x4a>
    d698:       cd7a 
    d69a:       13fc 0004 0002  moveb #4,2cd7b <object.0+0x4b>
    d6a0:       cd7b 
    d6a2:       6000 fede       braw d582 <__divsi3+0x21e6>
    d6a6:       4eb9 0000 012a  jsr 12a <abort>

so it also emits a call to abort, not a trap.

vinriviere commented 1 year ago

Another enlightenment:

uw.c:

unsigned char dwarf_reg_size_table[100];

void f(void)
{
        __builtin_init_dwarf_reg_size_table (dwarf_reg_size_table);
}

m68k-atari-mintelf-gcc -Os -S uw.c -o -

#NO_APP
    .file   "uw.c"
    .text
    .align  2
    .globl  f
    .type   f, @function
f:
    move.b #4,dwarf_reg_size_table
    move.b #4,dwarf_reg_size_table+1
    move.b #4,dwarf_reg_size_table+2
    move.b #4,dwarf_reg_size_table+3
    move.b #4,dwarf_reg_size_table+4
    move.b #4,dwarf_reg_size_table+5
    move.b #4,dwarf_reg_size_table+6
    move.b #4,dwarf_reg_size_table+7
    move.b #4,dwarf_reg_size_table+8
    move.b #4,dwarf_reg_size_table+9
    move.b #4,dwarf_reg_size_table+10
    move.b #4,dwarf_reg_size_table+11
    move.b #4,dwarf_reg_size_table+12
    move.b #4,dwarf_reg_size_table+13
    move.b #4,dwarf_reg_size_table+14
    move.b #4,dwarf_reg_size_table+15
    move.b #12,dwarf_reg_size_table+16
    move.b #12,dwarf_reg_size_table+17
    move.b #12,dwarf_reg_size_table+18
    move.b #12,dwarf_reg_size_table+19
    move.b #12,dwarf_reg_size_table+20
    move.b #12,dwarf_reg_size_table+21
    move.b #12,dwarf_reg_size_table+22
    move.b #12,dwarf_reg_size_table+23
    move.b #4,dwarf_reg_size_table+24
    move.b #4,dwarf_reg_size_table+25
    rts
    .size   f, .-f
    .globl  dwarf_reg_size_table
    .section    .bss
    .type   dwarf_reg_size_table, @object
    .size   dwarf_reg_size_table, 100
dwarf_reg_size_table:
    .zero   100
    .ident  "GCC: (GCC MiNT ELF) 13.2.0"

No trap #7 here. ??? The problem is elsewhere (but not very far).

vinriviere commented 1 year ago

That trap #7 is produced by gcc_abort(). I can see it from the preprocessed output when compiling libgcc's unwind-dw2.c.

More precisely, I can see actual macros by compiling that file with E -dM:

#define gcc_assert(EXPR) ((void)(!(EXPR) ? abort (), 0 : 0))
#define abort() __builtin_trap ()

Other useful option is -E -fdirectives-only to see where macros come from.

https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/gcc/tsystem.h#L121-L123 https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/gcc/tsystem.h#L61-L63

Next step will be to understand why uw_frame_state_for() doesn't return _URC_NO_REASON.

vinriviere commented 1 year ago

I added traces. The return value is _URC_END_OF_STACK and comes from here (at the end): https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/libgcc/unwind-dw2.c#L1005-L1017

Next question: Why frame unwind info is not found?

th-otto commented 1 year ago

https://github.com/freemint/m68k-atari-mint-gcc/blob/cf877a4b39b640421d7a3e03f73dbaad5307313d/gcc/tsystem.h#L61-L63

Ah ok. That block is inside an #ifdef inhibit_libc. So that difference is because you configure gcc without an installed mintlib. If you use the headers from mintlib instead, then the macro will not be defined, and abort() be called instead, like for me. But we need mintlib anyway, because atexit() is called to run the destructors.

I added traces. The return value is _URC_END_OF_STACK and comes from here (at the end):

Yes, i found the same location above.

Next step will be to understand why uw_frame_state_for() doesn't return _URC_NO_REASON.

That is where i'm currently stuck, because i don't understand yet what _Unwind_Find_FDE does.

mikrosk commented 1 year ago

(watching with interest)

Stupid question: why do we have to solve / investigate all of this stuff? I mean, how come that DWARF2 exceptions work on mint-elf and other m68k ELF targets? Is it matter of support in libc? If so, can't we just look how it is solved in newlib or uClibC or some other m68k libc implementation?