Kingcom / armips

An assembler for various ARM and MIPS platforms. Builds available at http://buildbot.orphis.net/armips/
MIT License

Fix encoding of MIPS break, disallow negative arguments for syscall #124

Closed sp1187 closed 6 years ago

sp1187 commented 7 years ago

Fixes #122.

Kingcom commented 7 years ago

Is this documented anywhere? I've seen retail code in a PS2 game passing an immediate to break in a way that is consistent with the current implementation, i.e. in the lower bits. IDA also disassembles it that way with a single argument. So the bug may actually be inside gas, or it's just fully up to the implementation.

sp1187 commented 7 years ago

It has been quite hard to find actual documentation (as in not code) for this, but section 8.6 in See MIPS Run has a table claiming that the code field in break is indeed 10 bits and in the upper half of the instruction. And even with this change, you can still control the lower bits with the second argument, so break 0, 7 does the same as the old break 7.

(image: seemipsrun_break)

Kingcom commented 7 years ago

On the other hand, this reference only mentions a single field: http://www.cs.cmu.edu/afs/cs/academic/class/15740-f97/public/doc/mips-isa.pdf

For that reason, I'm not convinced that changing the single-parameter version is the best thing to do, especially as existing code may depend on it.

queueRAM commented 7 years ago

Since I am the originator of #122, I figured I'd weigh in with what I've been able to ascertain.

As indicated in the MIPS IV ISA Revision 3.2, the break instruction has bits 6-25 set aside for the code field. The code field is not handled by hardware at all. In fact, if the exception handler wants to read what the value of code is, it will first need to fetch the address of the exception, then read the instruction word from memory at this address and mask and shift out the code field. This means the interpretation of the code field is entirely up to the software.
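The mask-and-shift step described above can be sketched in Python. This is only an illustration: the function name `break_code_field` is hypothetical, and the instruction word is passed in directly rather than read from memory at the exception address as a real handler would do.

```python
BREAK_FUNCT = 0x0D  # funct value of break within the SPECIAL opcode group


def break_code_field(word: int) -> int:
    """Return the 20-bit code field (bits 6-25) of a 32-bit break instruction.

    Hypothetical helper: a real exception handler would first load `word`
    from the address of the faulting instruction.
    """
    # The opcode (bits 26-31) must be 0 (SPECIAL) and the funct field
    # (bits 0-5) must be the break encoding.
    if (word >> 26) != 0 or (word & 0x3F) != BREAK_FUNCT:
        raise ValueError("not a break instruction")
    # Shift out the funct field, then keep the 20 bits of the code field.
    return (word >> 6) & 0xFFFFF


print(hex(break_code_field(0x000001CD)))  # 0x7
```

How the returned value is split up or interpreted is then a software convention, which is the point of the discussion here.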

See MIPS Run defines it as one 10-bit field in the most significant bits of the code field. For one parameter, GNU binutils assembles it this way as well. Even the Nintendo SDK assembler (GNU as 2.6) assembles break 0xc into 000c000d. If an optional second parameter is provided, GNU as inserts it into the least significant 10 bits of the code field:

$ echo "break 0xc; break 0xa, 0x5" | mips64-elf-as -o break.o - && mips64-elf-objdump -d break.o
<snip>
00000000 <.text>:
   0:   000c000d        break   0xc
   4:   000a014d        break   0xa,0x5
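
The bit layout implied by that output can be reproduced with a short Python sketch. The function name `encode_break_gnu` is hypothetical, and the 10-bit range check is an assumption of this sketch, not a statement about how GNU as validates its operands.

```python
def encode_break_gnu(code: int, code2: int = 0) -> int:
    """Encode break the way the objdump output above suggests.

    `code` goes into bits 16-25 and the optional `code2` into bits 6-15.
    """
    # Assumption for this sketch: each argument must fit in 10 bits.
    if not (0 <= code < 1024 and 0 <= code2 < 1024):
        raise ValueError("each code must fit in 10 bits")
    return (code << 16) | (code2 << 6) | 0x0D


print(f"{encode_break_gnu(0xC):08x}")       # 000c000d
print(f"{encode_break_gnu(0xA, 0x5):08x}")  # 000a014d
```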

This is also how the capstone engine (based on LLVM) disassembles the data:

$ echo 000c000d000a014d | xxd -r -p > break.bin && rasm2 -a mips -e -D -Bf break.bin
0x00000000   4                 000c000d  break 0xc
0x00000004   4                 000a014d  break 0xa, 5

So which is correct? As far as I can tell, it is entirely up to the authors of the assembler and the system software. Beyond See MIPS Run, I haven't seen a formal definition in writing. If it were me, I'd side with the GNU binutils and LLVM implementations.

unknownbrackets commented 7 years ago

Not sure if it matters, but PSP games often end up with:

000001cd  break 7 or proposed break 0, 7
0000000d  break 0

I don't think I've seen it in the upper bits, which may imply that CodeWarrior (PS2, PS1, PSP, right?) didn't separate it into two 10-bit chunks.
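
To make the two conventions concrete, here is a small Python comparison. Both function names are hypothetical, and `encode_break_low` only models the single-argument behaviour as it is described in this thread (the immediate placed in the lower bits of the code field). It is not taken from the armips source.

```python
def encode_break_low(code: int) -> int:
    """Single-argument form as described in this thread: the immediate
    fills the 20-bit code field starting at bit 6."""
    return ((code & 0xFFFFF) << 6) | 0x0D


def encode_break_split(code: int, code2: int = 0) -> int:
    """Two 10-bit chunks: `code` in bits 16-25, `code2` in bits 6-15."""
    return ((code & 0x3FF) << 16) | ((code2 & 0x3FF) << 6) | 0x0D


# The word seen in PSP games matches the low-bits form of break 7,
# which is the same word as the proposed break 0, 7.
print(f"{encode_break_low(7):08x}")       # 000001cd
print(f"{encode_break_split(0, 7):08x}")  # 000001cd
# Under the split convention, a single-argument break 7 lands in the
# upper bits instead.
print(f"{encode_break_split(7):08x}")     # 0007000d
```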

What's most commonly seen in N64 games? (Sorry, I originally said DS, which is ARM of course.)

-[Unknown]

sp1187 commented 6 years ago

It appears that the consensus here is that the behaviour of break and syscall will stay as it is for compatibility reasons, so I will close this issue.