Closed ibabushkin closed 8 years ago
@emoon, do you have any comments?
This is how the code in the C writer works:
    case M68K_AM_IMMIDIATE:
        if (inst->op_size.type == M68K_SIZE_TYPE_FPU) {
            if (inst->op_size.fpu_size == M68K_FPU_SIZE_SINGLE)
                SStream_concat(O, "#%f", op->simm);
            else if (inst->op_size.fpu_size == M68K_FPU_SIZE_DOUBLE)
                SStream_concat(O, "#%f", op->dimm);
            else
                SStream_concat(O, "#<unsupported>");
            break;
        }
        SStream_concat(O, "#$%x", op->imm);
        break;
The problem is that on 68K you can have immediate values of the same size that hold different kinds of data.
For example:
    fmove.s #1.0,fp0
The immediate here is a 32-bit float.
    fmove.l #1,fp0
The immediate here is a 32-bit integer.
This makes dealing with immediate values quite a bit more complicated, and thus I store
    // Type of size that is being used for the current instruction
    typedef enum m68k_size_type {
        M68K_SIZE_TYPE_INVALID = 0,
        M68K_SIZE_TYPE_CPU,
        M68K_SIZE_TYPE_FPU,
    } m68k_size_type;

    // Operation size of the current instruction (NOT the actual size of the instruction)
    typedef struct m68k_op_size {
        m68k_size_type type;
        union {
            m68k_cpu_size cpu_size;
            m68k_fpu_size fpu_size;
        };
    } m68k_op_size;

    // The M68K instruction and its operands
    typedef struct cs_m68k {
        // Number of operands of this instruction or 0 when instruction has no operand.
        cs_m68k_op operands[M68K_OPERAND_COUNT]; // operands for this instruction.
        m68k_op_size op_size; // size of data operand works on in bytes (.b, .w, .l, etc)
        uint8_t op_count; // number of operands for the instruction
    } cs_m68k;
So in order to deal with this I store an op_size, which can be either an FPU size or a CPU size (because, as written above, LONG and FLOAT are the same size).
Relevant discussion: https://github.com/aquynh/capstone/pull/487#discussion_r41082933
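To make the fmove.s / fmove.l point concrete, here is a minimal sketch of why the size alone isn't enough. The types and function names (size_type, imm_operand, decode_imm32, format_imm32) are hypothetical stand-ins, not Capstone's actual definitions, and the sketch assumes the host float is a 32-bit IEEE-754 value:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for m68k_size_type. */
typedef enum { SIZE_TYPE_INVALID = 0, SIZE_TYPE_CPU, SIZE_TYPE_FPU } size_type;

/* Hypothetical operand payload: both members occupy the same 32 bits. */
typedef struct {
    union {
        uint32_t imm;   /* integer immediate, e.g. fmove.l #1,fp0 */
        float simm;     /* single-precision immediate, e.g. fmove.s #1.0,fp0 */
    };
} imm_operand;

/* Store the same 32 raw bits either as an integer or as a float,
 * depending on the size type of the instruction. */
static void decode_imm32(uint32_t raw, size_type type, imm_operand *op)
{
    /* Assumption: float is 32 bits wide on the host. */
    assert(sizeof(float) == sizeof(uint32_t));
    if (type == SIZE_TYPE_FPU)
        memcpy(&op->simm, &raw, sizeof raw);
    else
        op->imm = raw;
}

/* Format the immediate the way the printer excerpt above does. */
static const char *format_imm32(char *buf, size_t len, size_type type,
                                const imm_operand *op)
{
    if (type == SIZE_TYPE_FPU)
        snprintf(buf, len, "#%f", op->simm);
    else
        snprintf(buf, len, "#$%x", (unsigned)op->imm);
    return buf;
}
```

With the raw bits 0x3f800000, the FPU path prints #1.000000 while the CPU path prints #$3f800000, so the reader of the operand needs the size type from the enclosing instruction to pick the right member.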
But if we extend the m68k_op_type union to reflect this, we'd get a few benefits:
As for the orthogonality thing: In Haskell, I provide marshalling code for each structure in the public headers, which works independently for each instance I store or read from a location. I could just provide incomplete code to do this and make things context-sensitive, but this adds a few rough edges, as people expect the marshalling routines to work differently.
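As a rough illustration of what "works independently for each instance" means, here is a sketch of an operand whose tag alone says which union member is valid. All names (op_type, tagged_op, op_numeric_value) are hypothetical and only approximate the shape of the proposal; they are not the actual Capstone header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operand tags, with floating point kinds spelled out. */
typedef enum {
    OP_INVALID = 0,
    OP_REG,
    OP_IMM,
    OP_FP_SINGLE,
    OP_FP_DOUBLE
} op_type;

/* Hypothetical operand: the tag fully determines the active member. */
typedef struct {
    op_type type;
    union {
        uint32_t reg;
        uint64_t imm;
        float simm;
        double dimm;
    };
} tagged_op;

/* Read the numeric payload using only the operand itself.
 * Returns 1 and writes *out on success, 0 if the operand kind
 * carries no numeric payload. */
static int op_numeric_value(const tagged_op *op, double *out)
{
    assert(op != NULL && out != NULL);
    switch (op->type) {
    case OP_IMM:
        *out = (double)op->imm;
        return 1;
    case OP_FP_SINGLE:
        *out = op->simm;
        return 1;
    case OP_FP_DOUBLE:
        *out = op->dimm;
        return 1;
    default:
        return 0;
    }
}
```

Because the routine never looks at the enclosing instruction, a binding can marshal each operand in isolation, which is the property the Haskell code relies on.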
Based on that, I suggest I'll try to implement this proposal (without deleting actual code), if you guys are interested in this. I also think we should do this before the release to avoid incompatibilities with third-party software later on.
Followup question: I think I understand the problem with immediate values better now, but why are operands with a size of 12 bytes (M68K_FPU_SIZE_EXTENDED) nowhere to be found? Also, why do we store two registers in one byte for register pairs? We have enough space in the union to put a small inline struct there without increasing overall struct size.
> Based on that, I suggest I'll try to implement this proposal (without deleting actual code), if you guys are interested in this. I also think we should do this before the release to avoid incompatibilities with third-party software later on.
Sure. I don't mind at all as long as the code has the same behaviour.
> Followup question: I think I understand the problem with immediate values better now, but why are operands with a size of 12 bytes (M68K_FPU_SIZE_EXTENDED) nowhere to be found?
I never implemented support for it. I have never seen any code actually use it in the wild (for immediate values), and it would require a conversion from 96 bits to 64 to actually print the value. While this might not be that hard to do, it's currently unimplemented.
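For reference, the 96-to-64-bit conversion mentioned here could look roughly like the sketch below. It assumes the Motorola extended-precision layout (1 sign bit, 15-bit exponent biased by 16383, 16 unused bits, 64-bit mantissa with an explicit integer bit) stored big-endian, as in the instruction stream. The function name ext96_to_double is hypothetical and not part of Capstone:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a 12-byte big-endian extended-precision value to a double.
 * Precision beyond 53 mantissa bits is lost, and values outside the
 * double range become 0 or infinity. */
double ext96_to_double(const uint8_t b[12])
{
    int negative, exponent, i;
    uint64_t mantissa = 0;
    double result;

    assert(b != NULL);
    negative = b[0] >> 7;
    exponent = ((b[0] & 0x7f) << 8) | b[1];

    /* Bytes 2 and 3 are unused padding; bytes 4..11 hold the mantissa. */
    for (i = 4; i < 12; i++)
        mantissa = (mantissa << 8) | b[i];

    if (exponent == 0x7fff) {
        /* All-ones exponent: infinity if the fraction is zero, else NaN. */
        result = (mantissa << 1) ? NAN : INFINITY;
    } else {
        /* The mantissa is an integer scaled by 2^63, hence the extra -63. */
        result = ldexp((double)mantissa, exponent - 16383 - 63);
    }
    return negative ? -result : result;
}
```

For example, the bytes 3f ff 00 00 80 00 00 00 00 00 00 00 decode to 1.0. The result could then go through the same "#%f" formatting path as the double case.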
> Also, why do we store two registers in one byte for register pairs? We have enough space in the union to put a small inline struct there without increasing overall struct size.
Sure, that can be changed if wanted.
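A small sketch of the register-pair point, under the assumption that the two registers are currently packed as nibbles into one byte (the thread does not spell out the exact packing). The names reg_pair, payload_packed and payload_struct are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed current scheme: two 4-bit register numbers in one byte. */
static uint8_t pack_pair(uint8_t r0, uint8_t r1)
{
    assert(r0 < 16 && r1 < 16);
    return (uint8_t)(r0 | (r1 << 4));
}

/* Proposed alternative: a small inline struct with one field per register. */
typedef struct {
    uint8_t reg_0;
    uint8_t reg_1;
} reg_pair;

static reg_pair unpack_pair(uint8_t packed)
{
    reg_pair p;
    p.reg_0 = packed & 0x0f;
    p.reg_1 = packed >> 4;
    return p;
}

/* Hypothetical operand payloads, before and after the change. */
typedef union {
    uint64_t imm;
    double dimm;
    float simm;
    uint8_t packed_pair;
} payload_packed;

typedef union {
    uint64_t imm;
    double dimm;
    float simm;
    reg_pair pair;
} payload_struct;

/* The union is already as large as its biggest member, so the
 * two-byte struct does not increase the overall size. */
_Static_assert(sizeof(payload_packed) == sizeof(payload_struct),
               "inline struct must not grow the union");
```

With the struct variant, a binding can read reg_0 and reg_1 directly instead of having to know the packing convention.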
Alright, I'll send a pull request in the next few days.
👍
Hm, this is weird: grepping for M68K_OP_FP didn't reveal anything. When you said "I never implemented support for it", did you mean that floating point isn't used at all so far? That's not what I understood from your original comment, which is why I'm asking.
No, FPU instructions are implemented (https://github.com/aquynh/capstone/blob/next/arch/M68K/M68KDisassembler.c#L1991), but I guess M68K_OP_FP isn't being set.
I was only referring to 96-bit floating point immediate values when saying I never added support for it.
Alright, that's what I assumed. I will investigate further on why the operand type constant isn't used.
> Alright, that's what I assumed. I will investigate further on why the operand type constant isn't used.
I would guess that everything works because the instruction printer doesn't actually care what kind of opcode it is.
The code at https://github.com/aquynh/capstone/blob/next/arch/M68K/M68KDisassembler.c#L1928 deals with the FPU stuff: it sets the appropriate FPn register, and the instruction printer just refers to a table where all the registers are present.
Hm, it seems as if the function you linked does set the type to M68K_OP_IMM unconditionally.
I assume the situation would be fixed if I inserted an update to the field here:
https://github.com/aquynh/capstone/blob/next/arch/M68K/M68KDisassembler.c#L2076
and in the adjacent lines. I will do this now, so I think the fix will be complete that way.
I am currently working on updated Haskell bindings for the upcoming release, and binding to the new header include/capstone/m68k.h proved more difficult than assumed. The issue isn't just the fact that the architecture in question is CISC and thus more complicated than the RISC instruction sets we support, but also the fact that the new header is different in coding style, poorly documented and in some cases highly non-orthogonal.
Examples: For most architectures, we have a tagged union in the cs_<arch>_op struct that holds information on an instruction's operands. The tags are elements of the <arch>_op_type enum. If there is no one-to-one correspondence between enum values and union members, it's noted in the comments; otherwise the connection is trivially clear. For instance, on MIPS:
Obviously, providing robust marshalling code for such a structure is easy. In this case, however, such a clear path can't be taken:
My assumption is that a lot of the missing data is context-sensitive and essentially stored in a different structure. This approach has two problematic edges for me:
Thus, I suggest that this issue gets addressed in the following way:
cs_m68k_op struct (otherwise I can't provide Haskell code to deal with this, sadly) - I'd love to help here, but I'd need some pointers regarding the architecture, as the reference didn't provide quick insight.
Also, in some cases, I'd use bool instead of uint8_t, but that's trivial to fix and not really an issue either.
That's it for now, everything else hapstone-related works fine :)