Assembler incorrectly honouring the DI subfield

alessandropellegrini commented 1 year ago

Describe the bug The assembler is assuming that full addressing is being used as soon as a memory operand is used. This causes inconsistent binary format of instructions using a short immediate and no displacement, as the immediate is always promoted to a quadword.

To Reproduce An instruction such as movb $1, (%rax) is assembled into a long instruction with a 64-bit immediate. The displacement field is set to zero.

Expected behaviour The instruction should be 64-bit long, with D=0, I=1 and the constant should be saved into the short immediate field.

lucagasperini commented 1 year ago

.org 0x800
.text
main:
    movb $1, (%rax)
    hlt

Seems that parser output of movb $1, (%rax) is correcly giving Line 4: token: movb op1 size: 1 op2.size: 1 size: 1 as I added a println to debug at INSN_B_E token

System.out.println("Line " + t.beginLine + ": token: " + t.image + " op1 size: " + op1.getSize() +  " op2.size: " + op2.getSize() + " size: " + size);

So it's output is a class 1 instruction:

if(mnemonic.equals("mov"))
    insn = new InstructionClass1(mnemonic, op1, op2, -1);

Anyway adding at addInstruction function

System.out.println("size: " + insn.getSize() + " mnemonic: " + insn.getMnemonic() + " " + insn.toString());

It's output is incorrect: size: 16 mnemonic: mov movb $1, (%rax)

lucagasperini commented 1 year ago

Proposed solution #14

alessandropellegrini commented 1 year ago

Hi, thanks for investigating this. It is a known bug I was aware of when I released the first version, but I ignored it because it does not hamper the simulator's functionality. I opened the issue to be sure that I won't forget.

Unfortunately, the proposed patch only partially solves this, which is a bit more tricky. Indeed, the patch can discriminate between:

movb $1, (%rax)

and

movq $1, (%rax)

But for example, it cannot identify constant promotion to 64-bit, as in

movb $1, 0xabcd(%rax)

Here, the short immediate/displacement field is already taken by the displacement, so the immediate is automatically promoted to a 64-bit constant and stored in the long immediate field.

The issue is a bit more tricky because one could write:

movb $1, label(%rax)
label:
...

Which is referencing a label defined later. In this case, at the time of parsing, the grammar has no clue whether label will be defined later or not. If it is defined, the program is correct. If not, there is an error. So the assembler (as all off-the-shelf compiles do) creates a relocation entry, storing that a tentative label should be considered later. At this point, the assembler stores 0 as the displacement because it knows nothing about the remainder of the program.

At the end of the assembly process, relocation entries are handled. If label was later defined, the displacement is updated accordingly. If not, an exception is thrown.

Yet, the instruction size is decided upon when the instruction is assembled. Here we don't know whether it will be a 64- or 128-bit instruction, so the simulator is conservatively assuming it will be a 128-bit instruction.

It should be possible to fix instruction size when relocations are handled, but this might require fixing the unfortunate choice to use -1 as d.c.c. rather than nulls for memory operands.

alessandropellegrini / z64sim

Assembler incorrectly honouring the DI subfield #13