freemint / m68k-atari-mint-gcc

Fork of GNU's gcc with support for the m68k-atari-mint target
https://github.com/freemint/m68k-atari-mint-gcc/wiki
Other
26 stars 7 forks source link

ASM_RETURN_CASE_JUMP / ASM_OUTPUT_CASE_LABEL / ASM_OUTPUT_BEFORE_CASE_LABEL for jump tables #34

Closed vinriviere closed 10 months ago

vinriviere commented 11 months ago

Question is how to configure ASM_OUTPUT_BEFORE_CASE_LABEL in GCC. We already discussed that, I open this issue for reference and documentation.

A jump table is a data table used for gcc to efficiently compile big switch statements. This particularly occurs when there are many contiguous case statements. Offsets of different case labels are stored inside an array indexed by the switch value.

Testcase: swi.c

int g;

void f(int i)
{
    switch (i)
    {
        case 0: g = 6082; break;
        case 1: g = 9332; break;
        case 2: g = 5642; break;
        case 3: g = 1423; break;
        case 4: g = 2152; break;
        case 5: g = 6779; break;
        case 6: g = 7074; break;
        case 7: g = 8280; break;
    }
}

There are 2 issues:

1) Where to put the jump table. As it is read-only data, it should go into the .rodata section. On the other hand, if the jump table is located in the .text section just after the index computation instructions, it can benefit of compact pc-relative addressing. Drawback of .text location is that it wastes some space in the instruction cache (only available on advanced processors).

2) The SVR4 ABI requires a special directive .swbeg before the actual jump table, when put inside the .text section. It serves 2 purposes:

.swbeg insert 2 words:

So when .swbeg is used, the jump instruction just before it must be adjusted to take in account that extra offset.

Reference SVR4 documentation: http://www.bitsavers.org/pdf/motorola/sysv/System_V_68_Assembler_Users_Guide_Nov1984.pdf

gas/gdb instruction definition for .swbeg: https://github.com/freemint/m68k-atari-mint-binutils-gdb/blob/90d470c102e6f79a8625df90706109849404d86c/opcodes/m68k-opc.c#L2179-L2186

Next I'm going to check what gcc does in various configurations.

vinriviere commented 11 months ago

Official GCC documentation for ASM_OUTPUT_CASE_LABEL: https://gcc.gnu.org/onlinedocs/gccint/Dispatch-Tables.html#index-ASM_005fOUTPUT_005fCASE_005fLABEL

First, a thing that I didn't notice initially: https://github.com/freemint/m68k-atari-mint-gcc/blob/cb42ec3d119f6b2f02f5abb282397977dbae07f0/gcc/config/m68k/m68k.md#L5809-L5856

Here is where ASM_OUTPUT_CASE_LABEL is used: https://github.com/freemint/m68k-atari-mint-gcc/blob/cb42ec3d119f6b2f02f5abb282397977dbae07f0/gcc/final.cc#L2444-L2450

So if ASM_OUTPUT_CASE_LABEL is used, then that macro is used to output the special case label. Otherwise, it is output as normal label.

There is no default in m68k.h. So by default, a normal label is used.

But there are overrides. elfos.h introduces ASM_OUTPUT_BEFORE_CASE_LABEL. Important: ASM_OUTPUT_ALIGN parameter is a power of 2! So 2 means 4-byte. https://github.com/freemint/m68k-atari-mint-gcc/blob/cb42ec3d119f6b2f02f5abb282397977dbae07f0/gcc/config/elfos.h#L133-L156

https://github.com/freemint/m68k-atari-mint-gcc/blob/cb42ec3d119f6b2f02f5abb282397977dbae07f0/gcc/config/m68k/linux.h#L91-L113

https://github.com/freemint/m68k-atari-mint-gcc/blob/cb42ec3d119f6b2f02f5abb282397977dbae07f0/gcc/config/m68k/netbsd-elf.h#L124-L148

https://github.com/freemint/m68k-atari-mint-gcc/blob/cb42ec3d119f6b2f02f5abb282397977dbae07f0/gcc/config/m68k/m68kelf.h#L52-L70

So it seems that m68k-linux, m68k-elf and m68k-netbsd use the same settings. No one uses the SVR4 ABI anymore?

vinriviere commented 11 months ago

Finally, the testcase from the initial post with various compilers:

$ m68k-linux-gcc -Os -S swi.c -o -
#NO_APP
    .file   "swi.c"
    .text
    .align  2
    .globl  f
    .type   f, @function
f:
    move.l 4(%sp),%d0
    moveq #7,%d1
    cmp.l %d0,%d1
    jcs .L1
    add.l %d0,%d0
    move.w .L4(%pc,%d0.l),%d0
    jmp %pc@(2,%d0:w)
    .balignw 2,0x284c
.L4:
    .word .L11-.L4
    .word .L10-.L4
    .word .L9-.L4
    .word .L8-.L4
    .word .L7-.L4
    .word .L6-.L4
    .word .L5-.L4
    .word .L3-.L4
.L11:
    move.l #6082,g
.L1:
    rts
.L10:
    move.l #9332,g
    jra .L1
.L9:
    move.l #5642,g
    jra .L1
.L8:
    move.l #1423,g
    jra .L1
.L7:
    move.l #2152,g
    jra .L1
.L6:
    move.l #6779,g
    jra .L1
.L5:
    move.l #7074,g
    jra .L1
.L3:
    move.l #8280,g
    jra .L1
    .size   f, .-f
    .globl  g
    .section    .bss
    .align  2
    .type   g, @object
    .size   g, 4
g:
    .zero   4
    .ident  "GCC: (GCC MiNT ELF 20230819) 13.2.0"
    .section    .note.GNU-stack,"",@progbits

The jump table is in the .text section, just after the jump instruction. There is no .swbeg instruction. The jump table is 2-byte aligned with a move.l a4,a4 filler, which is never inserted as the opcodes are always aligned. So that's optimal (except filling the instruction cache).

m68k-elf is almost identical, but does use the .swbeg directive, for the sake of the SVR4 ABI. So that's a waste of 4 bytes.

$ m68k-elf-gcc -Os -S swi.c -o -
#NO_APP
    .file   "swi.c"
    .text
    .align  2
    .globl  f
    .type   f, @function
f:
    move.l 4(%sp),%d0
    moveq #7,%d1
    cmp.l %d0,%d1
    jcs .L1
    add.l %d0,%d0
    move.w .L4(%pc,%d0.l),%d0
    jmp %pc@(2,%d0:w)
    .balignw 2,0x284c
    .swbeg  &8
.L4:
    .word .L11-.L4
    .word .L10-.L4
    .word .L9-.L4
    .word .L8-.L4
    .word .L7-.L4
    .word .L6-.L4
    .word .L5-.L4
    .word .L3-.L4
.L11:
    move.l #6082,g
.L1:
    rts
.L10:
    move.l #9332,g
    jra .L1
.L9:
    move.l #5642,g
    jra .L1
.L8:
    move.l #1423,g
    jra .L1
.L7:
    move.l #2152,g
    jra .L1
.L6:
    move.l #6779,g
    jra .L1
.L5:
    move.l #7074,g
    jra .L1
.L3:
    move.l #8280,g
    jra .L1
    .size   f, .-f
    .globl  g
    .section    .bss
    .align  2
    .type   g, @object
    .size   g, 4
g:
    .zero   4
    .ident  "GCC: (GCC MiNT ELF 20230819) 13.2.0"
vinriviere commented 11 months ago

Current m68k-atari-mintelf fixed by @th-otto. This does the job. No .swbeg, just like m68k-linux. As we agreed that .swbeg is completely useless for us.

$ m68k-atari-mintelf-gcc -Os -S swi.c -o -
#NO_APP
    .file   "swi.c"
    .text
    .align  2
    .globl  f
    .type   f, @function
f:
    move.l 4(%sp),%d0
    moveq #7,%d1
    cmp.l %d0,%d1
    jcs .L1
    add.l %d0,%d0
    move.w .L4(%pc,%d0.l),%d0
    jmp %pc@(2,%d0:w)
    .balignw 2,0x284c
.L4:
    .word .L11-.L4
    .word .L10-.L4
    .word .L9-.L4
    .word .L8-.L4
    .word .L7-.L4
    .word .L6-.L4
    .word .L5-.L4
    .word .L3-.L4
.L11:
    move.l #6082,g
.L1:
    rts
.L10:
    move.l #9332,g
    jra .L1
.L9:
    move.l #5642,g
    jra .L1
.L8:
    move.l #1423,g
    jra .L1
.L7:
    move.l #2152,g
    jra .L1
.L6:
    move.l #6779,g
    jra .L1
.L5:
    move.l #7074,g
    jra .L1
.L3:
    move.l #8280,g
    jra .L1
    .size   f, .-f
    .globl  g
    .section    .bss
    .align  2
    .type   g, @object
    .size   g, 4
g:
    .zero   4
    .ident  "GCC: (GCC MiNT ELF 20231027) 13.2.0"
vinriviere commented 11 months ago

@th-otto: I have rewoked the ASM_RETURN_CASE_JUMP patch to make mint.h look more like linux.h: https://github.com/freemint/m68k-atari-mint-gcc/commit/f6f7f11429b09de9510fc73fbc8159530f2ac6fc. This doesn't change the ABI (still no .swbeg) and should not cause trouble to your targets. It can easily be verified using the provided testcase. By making everything explicit, this should help to ultimately get rid of m68kelf.h.

th-otto commented 11 months ago

Why did you remove #undef ASM_OUTPUT_BEFORE_CASE_LABEL? It is not the swbeg directive that causes trouble, but if this macro is not defined, it will be defined in elfos.h to align to 4 bytes. That might emit a zero value before the table, making it misaligned with the jump in https://github.com/freemint/m68k-atari-mint-gcc/blob/cb42ec3d119f6b2f02f5abb282397977dbae07f0/gcc/config/m68k/m68kelf.h#L69.

Where to put the jump table. As it is read-only data, it should go into the .rodata section.

No definitely not. It must immediately follow the jmp statement above.

vinriviere commented 11 months ago

Where to put the jump table. As it is read-only data, it should go into the .rodata section.

No definitely not. It must immediately follow the jmp statement above.

.rodata section is for read-only data. Any offset table is read-only data. So jump tables should theoretically go to .rodata, like any data. But that's a non-topic, as we all know that it would produce less efficient code, table being potentially far from the jump. And as you notice, this wouldn't be compatible with actual jump code. Be sure we completely agree.

Why did you remove undef ASM_OUTPUT_BEFORE_CASE_LABEL?

I did it to have the same configuration as linux.h, as we have the same goal: word jump tables in text section, without swbeg, just after the jump. Similar configurations will ease upgrading to new gcc versions. Of course I carefully checked that the result was correct (not changed).

[still about ASM_OUTPUT_BEFORE_CASE_LABEL] if this macro is not defined, it will be defined in elfos.h to align to 4 bytes.

Yes, but it isn't used. ASM_OUTPUT_BEFORE_CASE_LABEL is used only inside ASM_OUTPUT_CASE_LABEL definition inside elfos.h: https://github.com/freemint/m68k-atari-mint-gcc/blob/cb42ec3d119f6b2f02f5abb282397977dbae07f0/gcc/config/elfos.h#L149-L156

And now mint.h, just like linux.h, uses #undef ASM_OUTPUT_CASE_LABEL. So ASM_OUTPUT_BEFORE_CASE_LABEL isn't used anymore.

That might emit a zero value before the table, making it misaligned with the jump

I remember that you had trouble with that, and that I also had similar trouble with old gcc 4.

With the testcase from the top of this thread:

$ m68k-atari-mintelf-gcc -Os -c swi.c
$ m68k-atari-mintelf-objdump -d swi.o

swi.o:     file format elf32-m68k

Disassembly of section .text:

00000000 <f>:
   0:   202f 0004       movel %sp@(4),%d0
   4:   7207            moveq #7,%d1
   6:   b280            cmpl %d0,%d1
   8:   6524            bcss 2e <f+0x2e>
   a:   d080            addl %d0,%d0
   c:   303b 0806       movew %pc@(14 <f+0x14>,%d0:l),%d0
  10:   4efb 0002       jmp %pc@(14 <f+0x14>,%d0:w)
  14:   0010 001c       orib #28,%a0@
  18:   0028 0034 0040  orib #52,%a0@(64)
  1e:   004c            .short 0x004c
  20:   0058 0064       oriw #100,%a0@+
  24:   23fc 0000 17c2  movel #6082,0 <f>
  2a:   0000 0000 
  2e:   4e75            rts
  30:   23fc 0000 2474  movel #9332,0 <f>
  36:   0000 0000 
  3a:   60f2            bras 2e <f+0x2e>
...

Sorry for the ugly disassembly. We can see the jump at offset 0x10. It is immediately followed by 8 short offsets, as expected. Then by a move.l, as first case. Note that the table is at offset $14, not a multiple of 4. This code works well, I traced it with MonST2.

Maybe there was another special case in the code that caused you trouble with wrong table alignment? Any idea for another testcase?

th-otto commented 11 months ago

Maybe there was another special case in the code that caused you trouble with wrong table alignment?

Yes, because in the example above, the first word of the jumptable happens to be at a 4byte aligned address (0x14). You would need to produce an example where that is not the case.

Any idea for another testcase?

When i encountered that issue, it was some table in the unwinding code. But i'm quite sure there are easier cases to reproduce it. But i'm quite sure that that it is still needed, because i had to backport it also for older gcc versions.

vinriviere commented 11 months ago

Note that the table is at offset $14, not a multiple of 4.

D'oh! My bad. I've been fooled by decimal. $14 is of course a multiple of 4.

Yes, because in the example above, the first word of the jumptable happens to be at a 4byte aligned address (0x14). You would need to produce an example where that is not the case.

I will do.

vinriviere commented 11 months ago

This was easy. I just had to add a nop to change the alignment.

swi.c

int g;

void f(int i)
{
    asm("nop");
    switch (i)
    {
        case 0: g = 6082; break;
        case 1: g = 9332; break;
        case 2: g = 5642; break;
        case 3: g = 1423; break;
        case 4: g = 2152; break;
        case 5: g = 6779; break;
        case 6: g = 7074; break;
        case 7: g = 8280; break;
    }
}
$ m68k-atari-mintelf-gcc -Os -S swi.c -o -
#NO_APP
    .file   "swi.c"
    .text
    .align  2
    .globl  f
    .type   f, @function
f:
    move.l 4(%sp),%d0
#APP
| 5 "swi.c" 1
    nop
| 0 "" 2
#NO_APP
    moveq #7,%d1
    cmp.l %d0,%d1
    jcs .L1
    add.l %d0,%d0
    move.w .L4(%pc,%d0.l),%d0
    jmp %pc@(2,%d0:w)
    .balignw 2,0x284c
.L4:
    .word .L11-.L4
    .word .L10-.L4
...
$ m68k-atari-mintelf-gcc -Os -c swi.c
$ m68k-atari-mintelf-objdump -d swi.o

swi.o:     file format elf32-m68k

Disassembly of section .text:

00000000 <f>:
   0:   202f 0004       movel %sp@(4),%d0
   4:   4e71            nop
   6:   7207            moveq #7,%d1
   8:   b280            cmpl %d0,%d1
   a:   6524            bcss 30 <f+0x30>
   c:   d080            addl %d0,%d0
   e:   303b 0806       movew %pc@(16 <f+0x16>,%d0:l),%d0
  12:   4efb 0002       jmp %pc@(16 <f+0x16>,%d0:w)
  16:   0010 001c       orib #28,%a0@
  1a:   0028 0034 0040  orib #52,%a0@(64)
  20:   004c            .short 0x004c
  22:   0058 0064       oriw #100,%a0@+
  26:   23fc 0000 17c2  movel #6082,0 <f>
  2c:   0000 0000 
  30:   4e75            rts
  32:   23fc 0000 2474  movel #9332,0 <f>
  38:   0000 0000 
  3c:   60f2            bras 30 <f+0x30>
...

This time the jump table is really at an address non-multiple of 4: 0x16. But it's still correct. No additional filler in the table.

vinriviere commented 11 months ago

Phew! I finally managed to reproduce @th-otto's issue. But I needed to add both -mlong-jump-table-offsets and -malign-int.

Key point is the code in the ASM_RETURN_CASE_JUMP macro. It uses instructions like jmp %%pc@(2,%0:w). Note that the offset "2" is hardcoded. That's a nonsense, because it should normally be the label of the first jump table entry. I guess this has been done that way because that macro has no way to know the name of the jump table label. As a consequence, the jump table must immediately follow that jmp, otherwise the offset will be wrong.

By default, the generated code is like this:

       move.w .L4(%pc,%d0.l),%d0
        jmp %pc@(2,%d0:w)
        .balignw 2,0x284c
.L4:
        .word .L11-.L4
        .word .L10-.L4

That's good. Even if the .balignw 2 is useless, as 68k instructions are always aligned on 2.

But:

then a filler would be inserted, causing the jmp offset being wrong.

I managed to produce such situation that way. Due to alignment randomness, I needed to use the non-nop testcase.

$ m68k-atari-mintelf-gcc -Os -S -o - swi.c -mlong-jump-table-offsets -malign-int
...
        move.l .L4(%pc,%d0.l),%d0
        jmp %pc@(2,%d0:l)
        .balignw 4,0x284c
.L4:
        .long .L11-.L4
        .long .L10-.L4
...

When misaligned, it produces the following disassembly (wrong, as expected):

  16:   203b 0808       movel %pc@(20 <f+0x20>,%d0:l),%d0
  1a:   4efb 0802       jmp %pc@(1e <f+0x1e>,%d0:l)
  1e:   284c            moveal %a4,%a4
  20:   0000 0020       orib #32,%d0
  24:   0000 002c       orib #44,%d0

Of course the movea.l a4,a4 is a nonsense: never run, but moving the jump table too far.

This only happens when 2 conditions are met:

This looks a bit far-fetched. I guess that @th-otto's real-world testcase used different options, but with the same result.

Then question is where that .balignw comes from. Long story short: it's about the ADDR_VEC_ALIGN macro. When not defined, it defaults to BIGGEST_ALIGNMENT

https://github.com/freemint/m68k-atari-mint-gcc/blob/cb42ec3d119f6b2f02f5abb282397977dbae07f0/gcc/final.cc#L465-L478

And that BIGGEST_ALIGNMENT depends if -malign-int is used or not.

Solution: Force ADDR_VEC_ALIGN to 0: https://github.com/freemint/m68k-atari-mint-gcc/commit/67b1e5ec6dc4eaf70cf95807f03aa252ad499262

I pushed that: no more .balignw, no more potential bug.

    move.l .L4(%pc,%d0.l),%d0
    jmp %pc@(2,%d0:l)
.L4:
    .long .L11-.L4
    .long .L10-.L4

Crazy thing is that m68k-elf and m68k-linux suffers of the same bug! So this is now fixed for the mintelf target, but remains a problem for other m68k targets. I will open an issue in GCC Bugzilla for that.

EDIT: Done. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112413

vinriviere commented 10 months ago

Summary:

Maybe the GCC team will officially fix that bug, or keep it opened for decades. Anyway, the ADDR_VEC_ALIGN trick works well and safely, and keeps us out of trouble. So I propose to close this issue.

mikrosk commented 10 months ago

Nice job, one more issue down. Btw thanks to your work on the mintelf target I have found a bug in gcc, too: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112712. :)

vinriviere commented 9 months ago

The GCC team has fixed the bug on the master branch for the upcoming GCC 14. https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=eea25179d8d1406685b8b0995ba841605f895283;hp=f5fc001a84a7dbb942a6252b3162dd38b4aae311

mikrosk commented 9 months ago

"No plans to backport." ... pff. So what, do we do that by ourselves or just upgrade to 14.0 when released?

th-otto commented 9 months ago

We can also backport it ourselves. But i would still stick to the settings that make sure that no nop is inserted there, since that is a total waste.

What would be more interesting, is to have a function attribute for long-jump-table-offsets instead of using a global switch, since that is really rarely needed (only when the function is >32K, i've only encountered that so far in the z88dk parser)