CensoredUsername / dynasm-rs

A dynasm-like tool for rust.
https://censoredusername.github.io/dynasm-rs/language/index.html
Mozilla Public License 2.0
723 stars 54 forks

movabs incorrect? #108

Open mlgiraud opened 1 week ago

mlgiraud commented 1 week ago

Hi, I'm currently writing a JIT compiler and need to load a 64-bit immediate value into a register. The only way to do this in one instruction, AFAIK, is the mov encoding movabs rax, imm64. In dynasm-rs, however, this currently emits code that corresponds to something like movabs rax, [imm64], i.e. it tries to load from the immediate address. These encodings exist, but IMHO they should be generated when writing movabs rax, [imm64].

The language spec in the documentation says:

movabs al, imm64
movabs ax, imm64
movabs eax, imm64
movabs imm64, al
movabs imm64, ax
movabs imm64, eax
movabs imm64, rax
movabs rax, imm64

Which should probably be

movabs al, [imm64]
movabs ax, [imm64]
movabs eax, [imm64]
movabs [imm64], al
movabs [imm64], ax
movabs [imm64], eax
movabs [imm64], rax
movabs rax, [imm64]

movabs reg64, imm64 <---- This is missing

The spec in the documentation says that mov reg64, imm64 can be used, but this results in an error (or in truncation of imm64 to a 32-bit value if you leave the type up to the compiler with as _). I think this should be moved to movabs reg64, imm64.

EDIT: So I misinterpreted the spec, and it actually is possible to encode movabs reg64, imm64 by writing mov rax, QWORD immediate as _, but I think the syntax should be corrected to align with other assembly tools.

CensoredUsername commented 1 week ago

The correct way to do that is indeed to do mov reg, QWORD imm.

movabs isn't actually an opcode in x64. For some reason AT&T style uses it, but Intel/NASM don't. The issue is that Intel-style assembly has no explicit way to denote a 64-bit displacement in a memory reference, so different assemblers have ended up denoting it in different ways. Dynamic assemblers like dynasm-rs or LuaJIT cannot just look at the value, since it is only known at runtime, so they use alternative assembler mnemonics. LuaJIT uses mov64, I picked movabs (because the operation is a move to/from an absolute address). This is explicitly stated in the documentation.

Yes, this does cause some confusion with AT&T-style movabs, but considering that's a whole other dialect from the Intel/NASM style we use here, I don't find that the biggest problem.
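
To make the distinction concrete, here is a minimal hand-rolled sketch of the two encodings being discussed, for rax only. It does not use dynasm-rs at all; the function names are hypothetical and the byte values are taken from the x86-64 instruction encoding (REX.W B8 for mov rax, imm64 and REX.W A1 for mov rax, moffs64).

```rust
/// Encode `mov rax, imm64`: REX.W (0x48), opcode B8+r with r = 0 for rax,
/// followed by the 8-byte little-endian immediate.
/// This is what AT&T-style assemblers call `movabs $imm64, %rax`.
fn mov_rax_imm64(imm: u64) -> Vec<u8> {
    let mut code = vec![0x48, 0xB8];
    code.extend_from_slice(&imm.to_le_bytes());
    code
}

/// Encode `mov rax, [moffs64]`: REX.W (0x48), opcode A1,
/// followed by the 8-byte little-endian absolute address.
/// This is the load-from-absolute-address form.
fn mov_rax_from_abs64(addr: u64) -> Vec<u8> {
    let mut code = vec![0x48, 0xA1];
    code.extend_from_slice(&addr.to_le_bytes());
    code
}
```

Both instructions are 10 bytes long and differ only in the opcode byte, which is why an assembler that cannot inspect the operand value needs the mnemonic or a size annotation to choose between them.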

mlgiraud commented 1 week ago

Yeah, I see the reasoning and agree that it is probably nicer to use the mov reg, QWORD imm syntax. However, I would still suggest changing the movabs syntax to take the address via [imm64]. This was confusing for me since most (all?) instructions use the [] syntax when an address is involved, right? Also, other assemblers/disassemblers use the same syntax with the []. This will inevitably lead to confusion IMHO.

CensoredUsername commented 1 week ago

I understand that this would be clearer. Unfortunately with how the x64 backend works right now this would entail a pretty big rewrite of the parser/compiler. It is built towards memory references only being of the regular kind, which have a specific instruction structure, while 64-bit displacement mov is actually encoded as a single register + 64-bit immediate operand.
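
As a rough illustration of that structural difference (not the dynasm-rs backend itself), the sketch below hand-encodes a load into rax in both forms. The function names are hypothetical; the regular form assumes an absolute 32-bit displacement with no base or index register.

```rust
/// Regular memory reference: `mov rax, [disp32]`.
/// REX.W (0x48), opcode 8B /r, ModRM 0x04 (mod = 00, reg = rax, rm = 100 -> SIB follows),
/// SIB 0x25 (no index, no base -> disp32 only), then a 4-byte displacement.
fn mov_rax_from_disp32(disp: i32) -> Vec<u8> {
    let mut code = vec![0x48, 0x8B, 0x04, 0x25];
    code.extend_from_slice(&disp.to_le_bytes());
    code
}

/// 64-bit absolute form: `mov rax, [moffs64]`.
/// REX.W (0x48), opcode A1, then an 8-byte address. There is no ModRM or SIB byte,
/// so structurally it looks like "fixed register + 64-bit immediate".
fn mov_rax_from_moffs64(addr: u64) -> Vec<u8> {
    let mut code = vec![0x48, 0xA1];
    code.extend_from_slice(&addr.to_le_bytes());
    code
}
```

The first form goes through the ModRM/SIB machinery shared by nearly every instruction that takes a memory operand, while the second is a special opcode whose only operand bytes are the raw address, which matches the description of it being handled like an immediate.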

mlgiraud commented 1 week ago

Yeah, I suspected as much. Maybe we can add a small disclaimer to the movabs part of the x86 documentation?