frno7 / iopmod

PlayStation 2 input/output processor (IOP) modules and tools.
13 stars 2 forks source link

Modules break with compiler optimisation greater than -O0 #1

Open frno7 opened 5 years ago

frno7 commented 5 years ago

https://github.com/frno7/iopmod/blob/13732bffa01dd29f3289a5a340a2ff08401557d3/Makefile#L16

It is presently unclear why IOP modules don’t quite work with the -O2 compiler optimisation option.

uyjulian commented 3 years ago

There are some additional workarounds for GCC >=5.3.0 in ps2sdk: https://github.com/ps2dev/ps2sdk/blob/master/iop/Rules.make#L33

Additionally, you can use -Q --help=optimizers for ex. -O2 to see which flags are enabled by -O2. You can then do a binary search to see which optimization breaks it, and disable that optimization (e.g. -fno-whatever)

frno7 commented 3 years ago

Oh, that’s great, @uyjulian! I fear that 100+ (or so) reboots are needed to find the error so anything that can cut that number down is very much welcome. A lot of stuff could have happened between GCC 5.3 and the presently used 10.2 too, but with a bit of guidance and luck, one might find it quickly.

rickgaiser commented 3 years ago

Offtopic, but at ps2dev / ps2sdk we've recently migrated the EE compiler to gcc 10.2. There's some interesting patches that might be usefull for you as well (one of them is even yours ;): https://github.com/ps2dev/gcc/tree/ee-v10.2.0 https://github.com/ps2dev/binutils-gdb/tree/ee-v2.35.1

Note that we found a bug in sqrt.s in binutils. It's claimed to be a standard MIPSII instruction, but it's not. This was also something that worked with -O0, becouse then the function sqrtf() is called. But any optimization level higher would cause gcc to replace it with a faulty sqrt.s instruction.

frno7 commented 3 years ago

Nice, the short loop bug fix by frno7 has a bit of background in the post MIPS: Fix GCC noreorder for undefined R5900 short loops to the GCC patches mailing list. Some more work is needed to merge it. I’ve made a special r5900check tool that can find all occurrences of the short loop bug in general ELF files, including hand-coded assembly (of which the Linux kernel has some).

I’ve also made a write-up of the R5900 short loop erratum on the wiki, including porting the comment from the depracted GCC branch to GAS mainline (where it belongs), for handy reference in case anybody needs it explained. Likewise with the peculiarities of the R5900 floating point unit (FPU), etc. When I started out I found it difficult get information on this kind of stuff.

GCC and Binutils have a couple of more patches that are somewhat more relevant for Linux, like the new -mfix-r5900 option.

Have you submitted the SQRT fix to Binutils?

Incidentally, I fixed another instruction bug in m68k/Bintuils a week ago, in commit 3b288c8e2ed35 m68k: Require m68020up rather than m68000up for CHK.L instruction. :)

By the way, how’s it going with the R5900 MMI auto-vectorisation?

uyjulian commented 3 years ago

I fear that 100+ (or so) reboots are needed to find the error so anything that can cut that number down is very much welcome.

If the issue occurs with -O1, you can binary search using the enabled flags from -O1. Otherwise, you can remove -O1 flags from -O2.

Binary search is faster than going through each of them one by one. Best case scenario is that you only have to try 7 times (for a total of 94 flags to search through).

frno7 commented 3 years ago

Yes, I agree that’s a good plan, @uyjulian. :)

rickgaiser commented 3 years ago

Have you submitted the SQRT fix to Binutils?

Not yet, but I plan to.

By the way, how’s it going with the R5900 MMI auto-vectorisation?

Not much further than last time we spoke. I think I'll finish the MMI support by disabling MIPS-MSA. Rewriting both MIPS-MSA and R5900-MMI into a more generic form is too much for me to do alone, so I will not be submitting this to gcc.

Back on topic: I've tried a simple hello world irx module using gcc9 (from @uyjulian ) and the ps2dev environment (code here). It also breaks when using anything other than -O0. The issue is in the macro's used for the import tables. It's a mixed bag of C structs with inline assembly: https://github.com/ps2dev/ps2sdk/blob/4a2d422a1824cd2374a41726884d9b470015aa2c/iop/kernel/include/irx.h#L55-L72 gcc will reorder these, causing the import tables to become invalid.

But iopmod does not seem to be using this. I've taken a closer look at how you're creating the import tables, and wow; it's an amazing piece of art. I can learn a thing or 2 from there. But as for the import tables, you seem to be using assembly only: https://github.com/frno7/iopmod/blob/master/include/iopmod/module-symbol.S So I don't think gcc is reordering those.

uyjulian commented 3 years ago

I've tried a simple hello world irx module using gcc9 (from @uyjulian ) and the ps2dev environment (code here). It also breaks when using anything other than -O0.

Which specific -f optimization breaks it? If it's -ftoplevel-reorder, I think that was already disabled when compiled imports.o and exports.o, so it may be something else.

frno7 commented 3 years ago

@rickgaiser, I’m glad you found it inspirational! I made an effort to put as much logic as possible into the tooling, so that writing the actual IOP modules becomes effortless. This means that a module function declaration such as

https://github.com/frno7/iopmod/blob/a5d2360a4f8a09f6fcf4dc58f6c87bcd1f926bc8/include/iopmod/module/printk.h#L17

serves quadruple-duty: the id_ macro is used to produce both import and export directives, automatically, and the rest of the declaration is used by the C compiler as usual. The standard IOP modules are declared in the same way, for example sifcmd. The fourth duty is to display exports and imports, symbolically, in the iopmod-info tool, used to inspect IRX module files.

I wrote a small C language lexer in tool/lexc.c, which is actually complete, and a simple ELF parser in tool/elf.c for the tooling.

The GNU linker option --gc-sections ensures that only the relevant pieces of code are linked into the final module, which saves space without any manual effort. There is also a special automatic _module_init

https://github.com/frno7/iopmod/blob/a5d2360a4f8a09f6fcf4dc58f6c87bcd1f926bc8/include/iopmod/mod.h#L30-L42

which sets up the exports automatically, so modules don’t do that themselves. It then calls _module_start, which is declared with module_init in the module, as in for example

https://github.com/frno7/iopmod/blob/a5d2360a4f8a09f6fcf4dc58f6c87bcd1f926bc8/module/irqrelay.c#L305

Modules that don’t initialise anything don’t need to declare a module_init at all, so that simplification works too. The module/printk.c module for example only exports a function that prints in the Linux kernel log, similar to the kernel’s own printk.

The module/irqrelay.c module on the other hand sets up an RPC service, for relaying interrupts, so it needs the initialisation, but since it doesn’t export any functions its header file

https://github.com/frno7/iopmod/blob/a5d2360a4f8a09f6fcf4dc58f6c87bcd1f926bc8/include/iopmod/module/irqrelay.h#L1-L3

is correspondingly simple, having only to declare the identification and the version of the module.

frno7 commented 3 years ago

@rickgaiser, one more thing: The Linux kernel itself also understands IRX file module export and import directives, which means that the Linux kernel automatically prelinks any IOP modules from ROM and from files, as necessary, without manual effort, when an IOP module is requested with iop_module_request:

/**
 * iop_module_request - link requested IOP module unless it is already linked
 * @name: name of requested module
 * @version: requested version in BCD, where major must match with a least
 *  the same minor
 * @arg: module arguments or %NULL
 *
 * Module library dependencies are resolved and prelinked as necessary. Module
 * files are handled as firmware by the IOP module linker.
 *
 * IOP module link requests are only permitted if the major versions match
 * and the version is at least of the same minor as the requested version.
 *
 * Context: mutex
 * Return: 0 on success, otherwise a negative error number
 */
int iop_module_request(const char *name, int version, const char *arg)

https://github.com/frno7/linux/blob/00a55c7e3a39f56e0e703a02dda5f52075b8f46a/drivers/ps2/iop-module-request.c#L856-L872

frno7 commented 3 years ago

@rickgaiser, for the fun of it I ported your Hello world application in commit 149740e86b3e. It should work like your own, and with PS2DEV. Here’s what the iopmod-info tool says about it:

% tool/iopmod-info module/hello.irx
iopmod id addr      0xb0
iopmod id name      hello
iopmod id version   0x0100
iopmod entry addr   0x0
iopmod unknown addr 0xd0
iopmod text size    172
iopmod data size    0
iopmod bss size     0
iopmod version      0x0100
iopmod name     hello
import library  0x0102  stdio
import  0    4  printf  stdio_printf

Notice that printf is actually an (undeclared but linkable) alias for stdio_printf, which in IOPMOD terms is considered to be its proper name by the declaration:

https://github.com/frno7/iopmod/blob/a5d2360a4f8a09f6fcf4dc58f6c87bcd1f926bc8/include/iopmod/module/stdio.h#L6-L7

The disassembly with mipsr5900el-unknown-linux-gnu-objdump -dr module/hello.irx reveals _module_init that would do the exports, had there been anything to export, and its invocation of _module_start:

00000000 <_module_init>:
   0:   27bdffe8    addiu   sp,sp,-24
   4:   afbf0014    sw  ra,20(sp)
   8:   afa40018    sw  a0,24(sp)
   c:   afa5001c    sw  a1,28(sp)
  10:   8fa5001c    lw  a1,28(sp)
  14:   8fa40018    lw  a0,24(sp)
  18:   3c020000    lui v0,0x0
            18: R_MIPS_HI16 _module_start
  1c:   24420038    addiu   v0,v0,56
            1c: R_MIPS_LO16 _module_start
  20:   0040f809    jalr    v0
  24:   00000000    nop
  28:   8fbf0014    lw  ra,20(sp)
  2c:   27bd0018    addiu   sp,sp,24
  30:   03e00008    jr  ra
  34:   00000000    nop

00000038 <_module_start>:
  38:   27bdffe8    addiu   sp,sp,-24
  3c:   afbf0014    sw  ra,20(sp)
  40:   afa40018    sw  a0,24(sp)
  44:   afa5001c    sw  a1,28(sp)
  48:   3c020000    lui v0,0x0
            48: R_MIPS_HI16 .rodata
  4c:   244400c0    addiu   a0,v0,192
            4c: R_MIPS_LO16 .rodata
  50:   3c020000    lui v0,0x0
            50: R_MIPS_HI16 printf
  54:   2442009c    addiu   v0,v0,156
            54: R_MIPS_LO16 printf
  58:   0040f809    jalr    v0
  5c:   00000000    nop
  60:   24020001    li  v0,1
  64:   8fbf0014    lw  ra,20(sp)
  68:   27bd0018    addiu   sp,sp,24
  6c:   03e00008    jr  ra
  70:   00000000    nop
  74:   afa40000    sw  a0,0(sp)
  78:   afa50004    sw  a1,4(sp)
  7c:   00001025    move    v0,zero
  80:   03e00008    jr  ra
  84:   00000000    nop
  88:   41e00000    0x41e00000
  8c:   00000000    nop
  90:   00000102    srl zero,zero,0x4
  94:   69647473    0x69647473
  98:   0000006f    0x6f

0000009c <printf>:
  9c:   03e00008    jr  ra
  a0:   24000004    li  zero,4
    ...