freemint / m68k-atari-mint-binutils-gdb

Fork of sourceware's binutils-gdb with support for the m68k-atari-mint target.
https://github.com/freemint/m68k-atari-mint-binutils-gdb/wiki
GNU General Public License v2.0
10 stars 4 forks source link

Assertions with new linker #7

Open th-otto opened 1 year ago

th-otto commented 1 year ago

When trying to link dxx-rebirth, i get some assertions currently:

ld: assertion fail bfd/elf32-atariprg.c:763
ld: assertion fail bfd/elf32-atariprg.c:763
ld: assertion fail bfd/elf32-atariprg.c:763
ld: assertion fail bfd/elf32-atariprg.c:763
ld: assertion fail bfd/elf32-atariprg.c:763
ld: assertion fail bfd/elf32-atariprg.c:763

The assertion is from https://github.com/freemint/m68k-atari-mint-binutils-gdb/blob/47b6a8d55e4bfbf8592ae2694aa8ac0274c4abb4/bfd/elf32-atariprg.c#L763

The reason is that there seem to be two or more relocations for the same address (so diff == 0 and not > 0). Any idea how this can happen?

BTW @mikrosk : how did you manage to compile dxx-rebirth with our gcc? I had several issues with it, due to missing physfs which i had to port first, end because the Sconstruct configuration was not able to locate libpng etc. for a cross-compiler. Also at some places, it tried to include <GL/gl.h> even if i disable use of opengl.

Edit: also strange: although the BFD_ASSERT triggers, the linker still exits with code 0?

th-otto commented 1 year ago

A fix for the exit code has been pushed. But the actual problem still remains.

th-otto commented 1 year ago

Apropos relocations: what about the relocations in debug info? From GEMDOS point of view, they are part of the symbols, and therefor not loaded. Wouldn't that mean we have to apply them when gdb loads the debug info?

th-otto commented 1 year ago

Added some debug printfs, but only found the following (first value is r_offset from the entry, 2nd is calculated relocation address):

reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000014 003f2130
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000028 003f2144
reloc: build/similar/3d/.d1x-rebirth.interp.o: 000001bc 003f22d8
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000318 003f2434
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000324 003f2440
reloc: build/similar/3d/.d1x-rebirth.interp.o: 000003a4 003f24c0
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000450 003f256c
reloc: build/similar/3d/.d1x-rebirth.interp.o: 000006b4 003f27d0
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000900 003f2a1c
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000978 003f2a94
reloc: build/similar/3d/.d1x-rebirth.interp.o: 000009b8 003f2ad4
reloc: build/similar/3d/.d1x-rebirth.interp.o: 000009dc 003f2af8
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000a2c 003f2b48
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000a7c 003f2b98
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000b4c 003f2c68
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000c54 003f2d70
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000c98 003f2db4
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000ce8 003f2e04
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000d24 003f2e40

reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 00000014 003f2e20
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 00000028 003f2e34
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 00000034 003f2e40
**build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: duplicate relocation at 0x003f2e40**
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 000000d0 003f2edc
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 00000160 003f2f6c
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 0000016c 003f2f78
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 000001c0 003f2fcc
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 000001f4 003f3000

Comparing that to output of objdump -r, it looks like both values are from .eh_frame sections:

build/similar/3d/.d1x-rebirth.interp.o:     file format elf32-m68k

RELOCATION RECORDS FOR [.eh_frame]:
OFFSET   TYPE              VALUE
00000014 R_68K_32          __gxx_personality_v0
00000028 R_68K_32          .text
000001bc R_68K_32          .text+0x000001e6
00000318 R_68K_32          .text.unlikely._ZN16interpreter_base10op_defaultEjPKh
00000324 R_68K_32          .gcc_except_table._ZN16interpreter_base10op_defaultEjPKh
000003a4 R_68K_32          .text+0x0000037c
00000450 R_68K_32          .text+0x00000520
000006b4 R_68K_32          .text+0x00000922
00000900 R_68K_32          .text+0x00000c96
00000978 R_68K_32          .text+0x00001144
000009b8 R_68K_32          .text+0x0000116e
000009dc R_68K_32          .text+0x0000118c
00000a2c R_68K_32          .text+0x000011e2
00000a7c R_68K_32          .text.unlikely._ZN19partial_range_error6reportILm256EEEvPKcjS2_S2_mjm
00000a88 R_68K_32          .gcc_except_table._ZN19partial_range_error6reportILm256EEEvPKcjS2_S2_mjm
00000b4c R_68K_32          .text+0x0000121e
00000c54 R_68K_32          .text+0x000012f8
00000c98 R_68K_32          .text+0x00001720
00000ce8 R_68K_32          .text+0x0000178c
00000d24 R_68K_32          .text+0x00001bce

build/similar/arch/sdl/.d1x-rebirth.digi_audio.o:     file format elf32-m68k

RELOCATION RECORDS FOR [.eh_frame]:
OFFSET   TYPE              VALUE
00000014 R_68K_32          __gxx_personality_v0
00000028 R_68K_32          .text
00000034 R_68K_32          .gcc_except_table
000000d0 R_68K_32          .text+0x0000013e
00000160 R_68K_32          .text+0x00000286
0000016c R_68K_32          .gcc_except_table+0x00000013
000001c0 R_68K_32          .text+0x00000416
000001f4 R_68K_32          .text+0x0000047c

Any idea what's going wrong?

Edit: forgot to mention: compiler was configured to use dwarf exception unwind info.

Edit2: however the assertion also triggers with Vincents toolchain.

vinriviere commented 1 year ago

The reason is that there seem to be two or more relocations for the same address (so diff == 0 and not > 0). Any idea how this can happen?

Oh, strange. I used those assertions much, for situations which shouldn't happen in the real world. It's generally rather obscure in the linker code to determine if a situation could happen or not. But if it does happen, then of course the right fix is to add a runtime check. As you did.

But I still wonder how it can happen. Normally, each address in loaded sections should only be loaded once.

Apropos relocations: what about the relocations in debug info? From GEMDOS point of view, they are part of the symbols, and therefor not loaded. Wouldn't that mean we have to apply them when gdb loads the debug info?

The sections are already relocated by gdb when the PRG is attached to the process. I added specific code for that, like in gdb 5.x. I guess that also applies to DWARF-2 sections, but this must be checked.

th-otto commented 1 year ago

I added specific code for that, like in gdb 5.x. I guess that also applies to DWARF-2 sections, but this must be checked.

Hm, i will have to check that. But i doubt that can work; the value that has to be added for absolute relocations is that of the real text start in physical memory, something that gdb does not know about. I guess GDB only adds the VMA for that address. Especially interesting when the program is run by the gdbserver, but the debug information is loaded by the gdb running on the host.

But currently the issue with duplicate relocations is more important, because it causes link errors. I get the impression that it might also have to do with weak symbols, something that seems to be generated quite a lot from c++ templates.

vinriviere commented 1 year ago

Hm, i will have to check that. But i doubt that can work; the value that has to be added for absolute relocations is that of the real text start in physical memory, something that gdb does not know about.

The relocation code is in mintelf_nat_target::child_initialize(): https://github.com/freemint/m68k-atari-mint-binutils-gdb/blob/40817f3d95f50069bee650643504652601dc55a0/gdb/mintelf-nat.c#L113

It calls ptrace (PT_BASEPAGE) to get the process the basepage address, then calls objfile_relocate() to do the actual relocation job. From my tests, that worked well.

th-otto commented 1 year ago

BTW, i get similar duplicate relocation errors when trying to compile a native gcc for m68k (with --target=m68k-atari-mintelf and --host=m68k-atari-mintelf)

th-otto commented 1 year ago

I was able to reproduce it with a much simpler testcase:

#include <stdio.h>

void func_exception(int i)
{
        printf("in %s\n", __func__);
        if (i == 5)
                throw(i);
}

int main(void)
{
        int i;

        setvbuf(stderr, NULL, _IONBF, 0);
        printf("Hello, C++\n");
        try {
                for (i = 0; i < 10; i++)
                        func_exception(i);
        } catch (...)
        {
                printf("got exception\n");
        }
        return 0;
}

This will give one duplicate, and by adding some printfs, i sse that they are from pmem_type_info.o and pbase_type_info.o (both from libsupc++.a). My new theory: The problem is due to https://github.com/freemint/m68k-atari-mint-binutils-gdb/blob/3ebda8ccd3e49f567ea71538ece8bde17d61a7b1/bfd/elf-eh-frame.c#L1319-L1324. This will merge 2 CIE records in the output, but the TPA relocations for both inputs have already been added. Maybe adding the tpa relocations in m68k_elf32_atariprg_relocate_section is done too early, and must be postponed until the final link, when it is known that the input section is really used?

th-otto commented 1 year ago

YES. This exactly was the reason. Fix for this: use _bfd_elf_section_offset to compute the offset of the relocation in the section, which takes that into account. Fix is pushed.

And the best thing about this: it also seems to fix https://github.com/freemint/m68k-atari-mint-gcc/issues/23 . This makes sense, if the tpa relocation is applied to the wrong offset.

vinriviere commented 1 year ago

Aha @th-otto, excellent 😃 It had been a pain to get correct TPA relocations even for the DATA segment. The mixture of input_section / output_section / vma / output_offset is really obscure. After struggling, my code seemed to be correct for any section. But you discovered the last trap: those special values -1 and -2 for section offset in the .eh_frame section!!

Really, it's incredible to see how it's complicated to add custom features such as TPA relocations to the linker. There are so much special cases. But this time, it's OK for .eh_frame sections. This should be enough for most usages. Until next issue.

th-otto commented 1 year ago

Yes, that was really a pain. Hopefully this time it will cover all cases, as that function seems to be supposed to handle all special cases. Until GNU people decide to rework that ugly hack...

Actually i found that when adding debug output to m68k_elf32_atariprg_relocate_section, printing only the section-relative offset, and comparing that to objdump -r (after linking with --emit-relocs). Those relative offsets should be identical, and actually this was the case for all sections but .eh_frame.