memory operands decode error

gdabah / distorm

Powerful Disassembler Library For x86/AMD64

Other

1.26k stars 238 forks source link

memory operands decode error #125

Closed wonderzdh closed 6 years ago

wonderzdh commented 6 years ago

64bit binary: 48 8b 05 7f 10 1b 00

distorm operands 'index' is rip. but the correct one should be 'base' not 'index'.

gdabah commented 6 years ago

Hi @wonderzdh

I was taking a look at the code, and I think the decision was originally to do it as a simple memory dereference and not using a base in this case (to keep things more straight forward on purpose). Can you share your opinion on why it should be changed? I'm afraid to break compatibility with existing users. However, a possibility for a fix, in case it's an important one, is to add RIP as a base too in addition to existing code, but that might create some confusion for users... so I'm not sure. WDYT?

wonderzdh commented 6 years ago

Hi @gdabah

I think all access to memory should be this format: [base + index*scale + dislacement]

The existing code is probably not in accordance with this specification.

gdabah commented 6 years ago

@wonderzdh Personally, I think it's easier to scan the operands array in order to know which registers a specific operand holds. And if you don't want to scan them all to find the RIP, there's a flag that says that the RIP is used (FLAG_RIP_RELATIVE). The reason base is in the parent struct and not the operands is because we always have at most 1 base register, so I tried to saved memory this way. But then you will still need to scan the operands for MEM type to know which of them uses the base register.

Having said all this, that's why there's SMEM and MEM and DISP, to make things easier for developers. That's why I would like to really understand your use case at hand and see if we can make it better.

wonderzdh commented 6 years ago

It would be a lot easier to uniquely define the format of a memory operand. So you don't have to think about using index or base. When I used distorm, I was surprised to find that sometimes memory access uses index to save registers, sometimes using base. This makes me confused.

I think the following format can accurately describe all memory access formats.

[base + index*scale + dislacement]

gdabah commented 6 years ago

You're right. But what I had in mind is to spare memory space in the structure of how operand is defined, and therefore there's only one BASE register, which is also true to the ISA documentation. Currently I don't see a compelling reason to change this after all the years without breaking functionality. I'm sorry and thanks for your input/feedback!