Closed dirkwhoffmann closed 2 years ago
For one-off stuff I find Asm-pro to be very good, but that won't integrate with any kind of automatic testing. Might be an idea to ask on EAB (mentioning the use case) since I regularly see the authors/maintainers of prominent disassemblers posting there.
Automatic verification of Line-F instructions has turned out to be a dead end, so I've decided to crawl through this area manually. I've started with the FPU range and here is the first inconsistency:
5: F200 1185: Musashi: [4] FPU (?).x FP4, FP3
vda68k: [8] fbf.l 0x11851002
vda68k treats F200 1185 as an FBcc instruction, which seems wrong to me. According to what I see in the docs, bit 7 in the first word has to be set for FBcc.
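The bit-7 argument can be checked with a small sketch. The helper names below are hypothetical, and the bit layout is my reading of the FBcc encoding in M68000PRM (1111, coprocessor ID, 0 1, size, predicate), so treat it as an assumption rather than a reference decoder:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from Moira, Musashi, or vda68k. */

/* Line-F opcodes have bits 15-12 set to 1111. */
static bool is_line_f(uint16_t op) { return (op & 0xF000) == 0xF000; }

/* Assumed FBcc layout: 1111 | cpid (3 bits) | 0 1 | size | predicate (6 bits).
   Bit 8 must be clear and bit 7 must be set; the cpid is ignored here. */
static bool is_fbcc(uint16_t op) { return (op & 0xF180) == 0xF080; }

/* Assumed general coprocessor layout: bits 8-6 are 000. */
static bool is_fpu_general(uint16_t op) { return (op & 0xF1C0) == 0xF000; }
```

With these masks, is_fbcc(0xF200) is false and is_fpu_general(0xF200) is true, which matches the view that F200 1185 is not an FBcc.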
EDIT: Here is an example of such a difference:
DISASSEMBLER MISMATCH FOUND
Instruction: [10] 0trapeq #$10101; (extension = $1) (2-3) (Musashi)
[10] 0trapeq #$10101; (extension = $1) (2-3) (Moira)
[8] ptrapbc.l #0x10001 (Vda68k, Motorola)
[8] ptrapbc.l #0x10001 (Moira)
[8] ptrapbc.l #0x10001 (Vda68k, MIT)
[8] ptrapbc.l #0x10001 (Moira)
Setup: PC: 1000 Opcode: f07b Ext1: 0001 Ext2: 0001 Ext3: 0001 (SUPERVISOR MODE)
CCR: f5 VBR: 01 SFC: 01 DFC: 01 CACR: 01 CAAR: 01
Musashi and Vda68k differ in both the operand ($10101 vs. $10001) and the instruction size (10 bytes vs. 8 bytes).
m68k-amiga-elf-objdump.exe says: f07b 0001 0001 ptrapbc.l #65537
5: F200 1185: Musashi: [4] FPU (?).x FP4, FP3
vda68k: [8] fbf.l 0x11851002
m68k-amiga-elf-objdump.exe says: f200 .short 0xf200
Asm-pro also doesn't recognize $f200, $1185 (decodes it as LINE_F). Attempting to execute it on my 060 triggers an unhandled Line-F exception (the "expected" unhandled FPU instructions like "fsincos" are handled in software).
Originally, I thought the disassembler would be the easy part. That's because I came from the 6502, for which writing a disassembler is a no-brainer.
This one is also interesting:
F200 5c1a: Moira: [4] fneg.? D0, FP0
Musashi: [4] fmovecr #$1a, fp0
vda68k: [4] fmovecr.x #0x1a,fp0
It's a FNEG instruction with an invalid source specifier (valid values are between 0 and 6).
On the M6888x, it's mapped to the FMOVECR instruction which does not exist on the 68040.
Now, the question is if F200 5c1a causes an illegal instruction exception on the 68040+. In this case, we should disassemble it to ILLEGAL.
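To see why the same extension word can be read both ways, here is a minimal field-extraction sketch. The function names are hypothetical, and the field positions (R/M in bit 14, source specifier in bits 12-10, destination register in bits 9-7, opmode in bits 6-0) are my reading of the general FPU instruction format, so they are an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessors for the second word of a general FPU instruction. */
static unsigned ext_rm(uint16_t ext)     { return (ext >> 14) & 0x1; } /* R/M bit          */
static unsigned ext_src(uint16_t ext)    { return (ext >> 10) & 0x7; } /* source specifier */
static unsigned ext_dst(uint16_t ext)    { return (ext >> 7) & 0x7; }  /* destination FPn  */
static unsigned ext_opmode(uint16_t ext) { return ext & 0x7F; }        /* opmode or ROM offset */

/* Assumed FMOVECR pattern: R/M = 1 together with source specifier 7,
   i.e. the top six bits of the extension word are 010111. */
static bool is_fmovecr(uint16_t ext) { return (ext & 0xFC00) == 0x5C00; }
```

For $5c1a this yields source specifier 7, destination FP0 and the value $1a in the low seven bits. Read as an opmode, $1a is FNEG; read as a ROM offset, it gives fmovecr #$1a,fp0.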
Originally, I thought the disassembler would be the easy part. That's because I came from the 6502, for which writing a disassembler is a no-brainer.
Still child's play compared to the horrors of doing a 16/32/64-bit x86 disassembler. Might be something for your next project ;)
From the EAB thread it looks like Bartman's suggestion of objdump or disassembler.library (might work with vamos) are the best suggestions, but dunno if that helps.
It's a FNEG instruction with an invalid source specifier (valid values are between 0 and 6). Now, the question is if F200 5c1a causes an illegal instruction exception on the 68040+. In this case, we should disassemble it to ILLEGAL.
On my 060 after
fmove.l #....,fp0
dc.w $f200, $5c1a
fmove.s fp0,d0
fp0 (d0) is set to zero (other fp registers unaffected)
Edit: Asm-pro also decodes it as FMOVECR.X #$1A,FP0 and it's probably handled like that by 040/060.library.
Might be something for your next project ;)
Never ever.
From the EAB thread it looks like Bartman's suggestion of objdump or disassembler.library (might work with vamos) are the best suggestions
Oops, I missed most of today's replies (I didn't get notified; I need to adjust my EAB settings). Yes, this sounds promising and I'll definitely look into it.
and it's probably handled like that by 040/060.library
OK, I start to understand. My plan was to skip all 6888x-only instructions because I want to restrict myself to the 68040. Now, I understand that this doesn't make sense. The 68040 achieves compatibility with the M6888x by implementing all missing instructions in software.
So I need to change my plans. The disassembler needs to support the full range of (M6888x) FPU instructions. My exec handlers can be kept simpler though. They can just trigger an exception when a 6888x-only instruction is discovered.
So I need to change my plans. The disassembler needs to support the full range of (M6888x) FPU instructions. My exec handlers can be kept simpler though. They can just trigger an exception when a 6888x-only instruction is discovered.
Yes, notice that in MC68000PRM FMOVECR is listed as (MC6888X, M68040FPSP), "SP" being the software package that Motorola intended vendors to include. There's something similar for the 060, which also handles e.g. the 32x32 -> 64 multiplication that the 060 dropped HW support for. These don't seem to be so easy to come by these days, but I managed to track down the 060SP (I imagine the 040 one is very similar) and have included it here for your reference: MC68060SP.zip
@mithrendal has kindly wrapped @BartmanAbyss's TypeScript parser with a command line interface. After fixing minor issues with argument parsing, it works as expected:
hoff@MacBook-Pro Bartman % node dasm.js 0xF200 0x5c1a 0x333
match: {
name: 'fmovecrx',
size: 4,
opcode: 4026555392,
match: 4060085248,
args: 'Ii#CF7',
arch: 112,
type: 1
}
f2 00 5c 1a 03 33 00 00 00 00 00 00 00 00 00 00
$f200 $5c1a : fmovecrx #26,fp0
At first glance, the accuracy of the binutils disassembler seems to be pretty good. Thus, I think I should add a DASM_GNU compatibility mode (similar to the existing DASM_MUSASHI mode) that can be used for automated unit testing (the existing DASM_VDA68K_xxx modes can then be trashed). A drawback of this approach is that I need to call a shell tool from within my code for each instruction to disassemble.
Actually extracting the code (like Bartman did) and just keeping it in C wasn't too difficult:
Only very very lightly tested, but seems to work. Note I added a replacement function for floatformat_to_double that may not be completely correct for extended precision numbers (only tested with one value).
Actually extracting the code (like Bartman did) and just keeping it in C wasn't too difficult
Wow, that's awesome news! I'll integrate it asap.
I've integrated the new disassembler in the TestRunner app and added a new style called DASM_GNU (which is producing garbage results at the moment). The next step will be to match the Binutils output one by one with this style. Once this is done, I'll remove the deprecated styles DASM_VDA68K_MOT and DASM_VDA68K_MIT.
DISASSEMBLER MISMATCH FOUND
Instruction: [4] ori.b #$1, D0 (Musashi)
[4] ori.b #$1, D0 (Moira)
[4] ori.b #1,d0 (Binutils)
[4] ori.b #$1, (Moira)
[4] ori.b #0x1,d0 (Vda68k, Motorola)
[4] ori.b #0x1,d0 (Moira)
[4] ori.b #0x1,d0 (Vda68k, MIT)
[4] ori.b #0x1,d0 (Moira)
Noticed I accidentally left in a "debug printf" in the floatformat_to_double function (here)
BTW if you don't want to bother with the memstream stuff, it's easy to change the fprintf_func to something else: just cast the stream pointer, which BTW was a void* in the original source, to something else and/or the fprintf_ftype, and use sprintf and append to a string buffer. But what you did works as well of course.
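A minimal sketch of that string-buffer idea, assuming the callback has the shape int (*)(void *, const char *, ...) as in the classic fprintf_ftype. The StrBuf type and the function names are hypothetical and not part of binutils:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size buffer that collects the disassembler output. */
typedef struct {
    char   text[128];
    size_t len;
} StrBuf;

/* Empty the buffer before disassembling the next instruction. */
static void strbuf_clear(StrBuf *sb) { memset(sb, 0, sizeof *sb); }

/* printf-style callback: the opaque stream pointer is cast back to StrBuf
   and the formatted text is appended, truncating if the buffer is full. */
static int strbuf_printf(void *stream, const char *fmt, ...)
{
    StrBuf *sb = (StrBuf *)stream;
    size_t room = sizeof sb->text - sb->len;   /* always >= 1 */
    va_list ap;

    va_start(ap, fmt);
    int n = vsnprintf(sb->text + sb->len, room, fmt, ap);
    va_end(ap);

    if (n < 0) return n;
    sb->len += ((size_t)n < room) ? (size_t)n : room - 1;
    return n;
}
```

The intended use would be to point info->stream at a StrBuf and info->fprintf_func at strbuf_printf before calling the disassembler, then read sb.text afterwards. Newer binutils versions may expect additional callbacks, so check the disassemble_info definition in your tree.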
Interesting tidbit: In contrast to the other disassemblers, binutils mixes hexadecimal and decimal numbers.
Instruction: [6] ori.b #$8d, $2687.w (Musashi)
[6] ori.b #$8d, $2687.w (Moira)
[6] ori.b #-115,$2687 (Binutils)
[6] ori.b #-115,9863.w (Moira)
[6] ori.b #-0x73,0x2687.w (Vda68k, Motorola)
[6] ori.b #-0x73,0x2687.w (Moira)
[6] ori.b #-0x73,0x2687 (Vda68k, MIT)
[6] ori.b #-0x73,0x2687 (Moira)
That might be because of the print_address_func that I hastily implemented. The real binutils probably uses decimal for addresses as well (checking on Bebbo's version of compiler explorer, that seems to be the case).
I played around a little with compiler explorer. It seems like everything is printed in decimal.
That might be because of the print_address_func that I hastily implemented
The TypeScript version mixes decimal and hexadecimal, too:
hoff@MacBook-Pro Bartman % node dasm.js 0x0038 0x068d 0x342A
$0038 $068d $342a : ori.b #-115,$342a
@BartmanAbyss: Do you remember whether there was a specific reason for that?
Hmm.. I think I just ported the code as is, I can't find any more hints in my commits unfortunately.
I think I've found a bug in the Binutils disassembler (running in 68010 mode):
Instruction: [6] cmpi.b #-$73, ($2687,PC); (1+); ($368b) (Musashi)
[6] cmpi.b #-$73, ($2687,PC); (1+); ($368b) (Moira)
[2] .short 0x0c3a (Binutils)
[6] cmpi.b #-115,13963(pc); (1+) (Moira)
[6] cmpi.b #-0x73,0x368b(pc) (Vda68k, Motorola)
[6] cmpi.b #-0x73,0x368b(pc) (Moira)
[6] cmpi.b #-0x73,pc@(0x368b) (Vda68k, MIT)
[6] cmpi.b #-0x73,pc@(0x368b) (Moira)
It doesn't recognize the two PC addressing modes (the 68000 does not offer them, but the 68010 does).
I think these two lines
{"cmpib", 4, one(0006000), one(0177700), "#b$s", m68000 | m68010, dis_nonbranch },
{"cmpib", 4, one(0006000), one(0177700), "#b@s", m68020up | cpu32 | fido_a, dis_nonbranch },
need to be replaced by those:
{"cmpib", 4, one(0006000), one(0177700), "#b$s", m68000, dis_nonbranch },
{"cmpib", 4, one(0006000), one(0177700), "#b@s", m68010up | cpu32 | fido_a, dis_nonbranch },
This one is even stranger (running in 68020 mode):
Instruction: [4] cmp2.b (A0), D0; (2+) (Musashi)
[4] cmp2.b (A0), D0; (2+) (Moira)
[2] .short 0x00d0 (Binutils)
[4] cmp2.b (a0),d0 (Moira)
[4] cmp2.b (a0),d0 (Vda68k, Motorola)
[4] cmp2.b (a0),d0 (Moira)
[4] cmp2.b a0@,d0 (Vda68k, MIT)
[4] cmp2.b a0@,d0 (Moira)
It doesn't recognize the cmp2 instruction although the mode identifier (! in !sR1) seems correct:
{"cmp2b", 4, two(0000300,0), two(0177700,07777), "!sR1", m68020up | cpu32 | fido_a, dis_nonbranch },
{"cmp2w", 4, two(0001300,0), two(0177700,07777), "!sR1", m68020up | cpu32 | fido_a, dis_nonbranch },
{"cmp2l", 4, two(0002300,0), two(0177700,07777), "!sR1", m68020up | cpu32 | fido_a, dis_nonbranch },
cmpi.b: looks like a bug all right.
cmp2.b: I'm guessing the second word is illegal (contains data in 0x0fff [07777]).
Yes, the second word is illegal, so it is doing the right thing. I just came to understand how to interpret two(0000300,0), two(0177700,07777).
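For anyone else puzzling over those table columns, here is a sketch of how I read them: the first value is the expected bit pattern and the second is the mask of bits that must match, with the first opcode word in the upper 16 bits. The macros mirror the one()/two() helpers from the opcode table; entry_matches is a hypothetical name and a simplification of what the real disassembler does:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same idea as the helpers used in the opcode table: pack one or two
   16-bit words (written in octal there) into a 32-bit value. */
#define one(x)    ((uint32_t)(x) << 16)
#define two(x, y) (((uint32_t)(x) << 16) + (uint32_t)(y))

/* Hypothetical matcher: an entry applies if all bits selected by the mask
   are equal to the entry's opcode pattern. */
static bool entry_matches(uint32_t opcode, uint32_t mask, uint16_t w1, uint16_t w2)
{
    uint32_t insn = ((uint32_t)w1 << 16) | w2;
    return (insn & mask) == opcode;
}
```

With the cmp2b entry, 00d0 0000 matches, whereas 00d0 0001 does not because the mask 07777 requires the low twelve bits of the second word to be zero. The same rule explains the TypeScript output above: $f200 $5c1a masked with match 4060085248 ($F1FFFC00) gives opcode 4026555392 ($F0005C00).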
Found one more:
{"chkw", 2, one(0040600), one(0170700), ";wDd", m68020up, dis_nonbranch }
must be
{"chkw", 2, one(0040600), one(0170700), ";wDd", m68000up, dis_nonbranch }
Had a chance to check on my a1200 and dc.w $00d0, $0001 is recognized by asmpro as cmp2.b (a0),d0 and it also behaves as such (also emulated by 060.library if I have my accelerator enabled).
Playing around with it a bit, it seems like the 11 least significant bits of the second word are ignored (though asmpro doesn't get this right). The same goes for chk2.
There is some strangeness in the binutils code which is exposed by the TAS command. It disassembles as follows:
[2] ta.s d0 (Binutils)
[2] tas d0 (Moira)
If line
{"tas", 2, one(0045300), one(0177700), "$s", m68000up | mcfisa_b | mcfisa_c, dis_nonbranch },
is changed to
{"tass", 2, one(0045300), one(0177700), "$s", m68000up | mcfisa_b | mcfisa_c, dis_nonbranch },
it looks like this:
[2] tas.s d0 (Binutils)
[2] tas d0 (Moira)
The last character of the instruction name is likely interpreted as a size identifier. As a workaround, I can append a b to match the correct size attribute of TAS:
[2] tas.b d0 (Binutils)
[2] tas d0 (Moira)
In addition, there is an alias table m68k_opcode_aliases containing this entry:
{ "tasb", "tas", },
I guess this table is supposed to do a back translation from tas.b to tas. But it seems like the table is defined, but never used.
I think @bebbo modified the binutils to use motorola syntax. quite a lot of that is done by massaging the resulting strings. In my TypeScript version I have modified the opcode names in that table directly to use the motorola syntax.
I think these two lines
{"cmpib", 4, one(0006000), one(0177700), "#b$s", m68000 | m68010, dis_nonbranch },
{"cmpib", 4, one(0006000), one(0177700), "#b@s", m68020up | cpu32 | fido_a, dis_nonbranch },
need to be replaced by those:
{"cmpib", 4, one(0006000), one(0177700), "#b$s", m68000, dis_nonbranch },
{"cmpib", 4, one(0006000), one(0177700), "#b@s", m68010up | cpu32 | fido_a, dis_nonbranch },
but wouldn't that also enable the 68020+ only modes for 68010? https://www.nxp.com/docs/en/reference-manual/M68000PRM.pdf pg. 4-80
MC68020, MC68030, and MC68040 only
(bd,An,Xn)** 110 reg. number:An (bd,PC,Xn)† 111 011
([bd,An,Xn],od) 110 reg. number:An ([bd,PC,Xn],od) 111 011
([bd,An],Xn,od) 110 reg. number:An ([bd,PC],Xn,od) 111 011
I think @bebbo modified the binutils to use motorola syntax. quite a lot of that is done by massaging the resulting strings
OK, thanks, that makes sense. I think the modification takes place here:
#ifdef MOTOROLA
/* add a . into movel and simila names. */
int bnl = strlen(best->name);
char c = best->name[bnl - 1];
if (strcmp("rts", best->name)
&& strcmp("bfexts", best->name)
&& strcmp("bfins", best->name)
&& strcmp("cas", best->name)
&& (c == 's' || c == 'w' || c == 'b' || c == 'l'))
{
static char b[32];
strcpy(b, best->name);
b[bnl - 1] = '.';
b[bnl] = c;
b[bnl + 1] = 0;
info->fprintf_func (info->stream, "%s", b);
} else
#endif
For now, I simply take this code out. As a result, I should get MIT style instruction names for each instruction (e.g., movel instead of move.l). Of course, the output will be inconsistent with the operand syntax (which is Motorola), but that's OK for my purpose. I can use the resulting code to verify the instruction names for my personal MIT mode. Later, Moira will offer DASM_MOIRA_MOT and DASM_MOIRA_MIT as the two standard disassembler syntaxes. The compatibility modes are mainly intended for unit testing.
I think @bebbo modified the binutils to use motorola syntax. quite a lot of that is done by massaging the resulting strings. In my TypeScript version I have modified the opcode names in that table directly to use the motorola syntax.
The binutils have been using the Motorola syntax since forever:
This syntax for the Motorola 680x0 was developed at MIT.
The 680x0 version of 'as' uses instructions names and syntax
compatible with the Sun assembler. Intervening periods are ignored; for
example, 'movl' is equivalent to 'mov.l'.
And omitting the % is configurable.
Then: for 'as', ta.s is the same as tas, but using tasb might be a noninvasive change and thus OK.
Referring to cmpi and tst:
There is no predefined pattern that matches 0,2-6,7.0-7.2, and (d8,PC,Xn) is even 7.3.
I can live with the GNU assembler not allowing that for the 68010.
And for disassembling I switched the default to 68040 since I want to read the insn and not .short mnemonics.
Last: unknown opcodes: I'm fine if these pop up as .short as long as valid insns are assembled/disassembled correctly.
00d0 0000 cmp2.b (a0),d0
is working
just my 2c
add
&& strcmp("tas", best->name)
and tas is no longer touched
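To make the effect of that extra exclusion visible, here is a standalone sketch of the suffix rule from the MOTOROLA block quoted earlier, with tas added to the exception list. The function name and the output-buffer interface are hypothetical. The real code prints through info->fprintf_func instead, and the len > 1 guard is an addition of mine:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical re-implementation of the "insert a dot before the size
   letter" rule, with "tas" added to the names that are left untouched. */
static void motorola_name(const char *name, char *out, size_t out_size)
{
    static const char *const keep[] = { "rts", "bfexts", "bfins", "cas", "tas" };
    size_t len = strlen(name);
    char last = len ? name[len - 1] : '\0';

    for (size_t i = 0; i < sizeof keep / sizeof keep[0]; i++) {
        if (strcmp(name, keep[i]) == 0) {
            snprintf(out, out_size, "%s", name);   /* exception: copy as is */
            return;
        }
    }
    if (len > 1 && (last == 's' || last == 'w' || last == 'b' || last == 'l'))
        snprintf(out, out_size, "%.*s.%c", (int)(len - 1), name, last);
    else
        snprintf(out, out_size, "%s", name);
}
```

With this rule, movel becomes move.l and tasb becomes tas.b, while tas and rts stay as they are. Without the tas entry, the same logic produces the ta.s seen above.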
First of all, thanks for all the good advice. This helped me to move forward at a good pace. In 68000 and 68010 mode, no more disassembler mismatches are reported (at least not in the first 600 rounds).
Now, I am working on getting the 68020 instruction set right. To me, this one appears to be a bug in binutils:
Instruction: [4] divu.l D0, D0; (2+) (Musashi)
[4] divu.l D0, D0; (2+) (Moira)
[4] divull d0,d0,d0 (Binutils)
[4] divul d0,d0 (Moira)
[4] divu.l d0,d0 (Vda68k, Motorola)
[4] divu.l d0,d0 (Moira)
[4] divu.l d0,d0 (Vda68k, MIT)
[4] divu.l d0,d0 (Moira)
Setup: PC: 1000 Opcode: 4c40 Ext1: 0000 Ext2: 582f Ext3: ffff
Binutils recognizes the instruction as a div with a 64-bit dividend, but the size bit is not set in Ext1.
Interestingly, Bartman's port does it right:
hoff@MacBook-Pro Bartman % node dasm.js 0x4c40 0x0000
match: {
name: 'divu.l',
size: 4,
opcode: 1279262720,
match: 4290809848,
args: ';lDD',
arch: 60,
type: 1
}
$4c40 $0000 : divu.l d0,d0
haha, that's funny, cause I still have an open issue // TODO: args for divul, divsl, but I guess that comment refers to divul.l and divsl.l
I think binutils is right here: divu.l is the 64/32-bit version while divul.l (selected when size=0) is 32/32 returning both quotient and remainder (except in this case since Dr=Dq). Asmpro disassembles it as divul.l d0,d0:d0
thanks for reminding me, I just fixed my typescript version to match the original binutils. https://github.com/BartmanAbyss/vscode-amiga-debug/commit/a60d3b525d17ed43121cb8113b8230531ba25e98
I think binutils is right here
Agreed. For the special case where Dr and Dq match, Musashi and Vda68k screw the syntax up.
Also, if you want one more disassembler to add to the confusion, WinUAE has one as well: https://github.com/tonioni/WinUAE/blob/master/disasm.cpp
I've reached the area where the fun begins: Line-F space, 68030 instruction set.
Here is one I don't understand:
Instruction: [4] pflushr 0, 9, (A0) (Musashi)
[4] pflush sfc, #$1, (A0) (Moira)
[4] pflush sfc,#9,(a0) (Binutils)
[4] pflush sfc,#1,(a0) (Moira)
[4] pflush sfc,#0x9,(a0) (Vda68k, Motorola)
[4] pflush sfc,#0x1,(a0) (Moira)
[4] pflush sfc,#0x9,a0@ (Vda68k, MIT)
[4] pflush sfc,#0x1,a0@ (Moira)
Setup: PC: 1000 Opcode: f010 Ext1: 3920 Ext2: cb51 Ext3: 970a
If mask is a 3 bit value, how can the output be '9'?
The 68851 variant has 4 bits
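A quick sketch of how the two widths produce the two values. The function names are hypothetical, and the field positions (mask starting at bit 5, function code in bits 4-0) are my reading of the PFLUSH extension word, so treat them as an assumption:

```c
#include <stdint.h>

/* Hypothetical helper: extract the PFLUSH mask field, which is assumed to
   start at bit 5 and to be 3 bits wide on the 68030 and 4 bits wide on
   the 68851. */
static unsigned pflush_mask(uint16_t ext, unsigned mask_bits)
{
    return (ext >> 5) & ((1u << mask_bits) - 1u);
}

/* Assumed function code field in bits 4-0 (00000 selects SFC). */
static unsigned pflush_fc(uint16_t ext) { return ext & 0x1F; }
```

For Ext1 = $3920, a 4-bit mask gives 9 (Musashi, Binutils, Vda68k) and a 3-bit mask gives 1 (Moira), so the difference comes down to which MMU the disassembler assumes.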
The new DASM_GNU mode seems to work. All disassembler tests pass:
Moira CPU tester. (C) Dirk W. Hoffmann, 2019 - 2022
The test program runs Moira agains Musashi with randomly generated data.
Test rounds: 1
Random seed: 367
Exec range: (opcode >= 0x0000 && opcode <= 0xEFFF)
Dasm range: (opcode >= 0x0000 && opcode <= 0xFFFF)
Round 1:
68000 CPU ................................ PASSED (Moira: 0.60s Musashi: 0.77s)
68010 CPU ................................ PASSED (Moira: 1.21s Musashi: 1.55s)
EC020 CPU ................................ PASSED (Moira: 1.83s Musashi: 2.34s)
68020 CPU ................................ PASSED (Moira: 2.46s Musashi: 3.14s)
EC030 CPU ................................ PASSED (Moira: 3.08s Musashi: 3.94s)
68030 CPU ................................ PASSED (Moira: 3.70s Musashi: 4.73s)
68030 MMU ................................ PASSED (Moira: 3.70s Musashi: 4.73s)
EC040 CPU ................................ PASSED (Moira: 4.32s Musashi: 5.52s)
LC040 CPU ................................ PASSED (Moira: 4.94s Musashi: 6.31s)
LC040 MMU ................................ PASSED (Moira: 4.94s Musashi: 6.31s)
68040 CPU ................................ PASSED (Moira: 5.56s Musashi: 7.11s)
68040 MMU ................................ PASSED (Moira: 5.56s Musashi: 7.11s)
68040 FPU ................................ PASSED (Moira: 5.56s Musashi: 7.11s)
All tests completed
In the end, it was much more difficult than I had originally anticipated. In addition, code quality has degraded a lot, which means that I need to spend some time on refactoring before working on any new functionality.
Done. The new disassembler is now part of vAmiga:
At the moment, all five syntax styles are available (DASM_MOIRA_MOT, DASM_MOIRA_MIT, DASM_GNU, DASM_GNU_MIT, DASM_MUSASHI).
The first two are vAmiga's native styles where everything is displayed as I personally like it most. The other three styles are compatibility styles that are mostly intended for unit testing and debugging. There are some limitations though: The Musashi style displays all instructions in the $Fxxx area as illegal instructions as it does not handle all instructions in this range correctly. Moreover, floating point constants are not shown yet. They are displayed as <fixme> at the moment.
I've started to add disassembler support for MMU instructions.
A little background: Moira's disassembler is verified by the testrunner app. For each opcode, testrunner disassembles the instruction with Moira, Musashi, and vda68k. After that, it matches the outputs. For some opcodes, I've observed that the outputs of Musashi and vda68k are not only syntactically different (which is OK), but also semantically different.
What I'd like to have (but may not exist) is a really reliable disassembler that can be treated as a golden reference to compare against. Originally, I thought vda68k could serve this purpose, but it can't (at least not in the LINE-F area).
If somebody has suggestions for such a golden reference disassembler, please let me know.
P.S.: I've also looked at the Online Disassembler (which is really nice), but it doesn't seem to be perfect either. E.g., it doesn't know of any MMU command at all.
EDIT: Here is an example of such a difference:
Musashi and Vda68k differ in both the operand ($10101 vs. $10001) and the instruction size (10 bytes vs. 8 bytes).