Closed TomHarte closed 11 months ago
No, you're right.
The cause is an embarrassing typo in my disassembler:
AddressingMode::BpSiDisp16(disp) => format!("{}{}:[bp+si+{}]", ptr_prefix, segment2, disp),
AddressingMode::BpDiDisp16(disp) => format!("{}{}:[bp+si+{}]", ptr_prefix, segment2, disp),
You can see the second case should be 'bp+di'. Everything functions as it should; it's just a display issue.
I even considered using iced-x86 to normalize the disassembly, but it didn't seem to like extraneous segment prefixes and other things that are present in these tests, so I went with MartyPC's internal disassembler.
This is trivially fixed, since we don't have to regenerate the tests, we just use 'bytes' and overwrite 'name'; but unfortunately it probably means every test that uses a modrm will need to be replaced.
Maybe a good opportunity to add hashes to the test set now?
Agreed, it seems like a good opportunity, as probably a very large proportion of tests will be touched — and with the implicit part, i.e. that it seems like a good idea.
Minor addendum to this, taking the topic slightly more broadly as "where the disassemblies look a little odd":
All instances of repe idiv
seem to be named as idiv ...
i.e. they start with two spaces and omit any mention of the repe
. As a specific example, see idiv word ss:[bx+di]
which has the given bytes
sequence of 54, 243, 247, 57
with 243
being the relevant of the two prefixes.
Conversely repne idiv
s are named as repne idiv ...
with no exceptions that I found.
(and, no, I'm unclear on whether one would conventionally use repe
or rep
for idiv
given that I think the effect of those two prefixes on idiv
is just an 8086/8088 oddity?)
I'm not sure either. I am somewhat inclined to leave them off since their meaning is completely different when prefixing idiv, and it's all undocumented anyway.
I hope to get a PR to you soon, I've just been a bit busy. I've been laying the groundwork for something that might enable production of test suite V2, again from hardware, but not necessarily using an Arduino.
I've been writing a sigrok decoder for a bus sniffer interface I have put together, and it has reached an advanced state:
I purchased some adapters for EEPROMS for my 5150, and my thought is that I can burn specific "generator ROMS" that fill memory and then start executing random (or targeted) instructions, using timer interrupts as a 'test director' to keep jumping to random segments so we never get stuck for too long, and using the timer ISR itself to set up register state for each block of tests.
Flip the logic analyzer on, capture a bunch, then use the decoder to dump out each instruction as a test, co-executing each block on MartyPC to get the per-instruction register state. I don't see any roadblocks to this plan, other than the ROM, ISR and IVT area of course no longer being writable by the tests. I think we can get away without DRAM refresh since we will be executing instructions all over the place.
I've also changed my mind a bit about including i8288 outputs in the tests, working with the analyzer has shown me that they are really unnecessary, you can calculate ALE and read/write signals yourself. Dropping that stuff would make the test sets much smaller.
The advantage of this is that it gives us "instructions-in-flight" with live queue activity, and it's potentially much, much faster, but the biggest thing perhaps is that we might even be able to do the 8087, although I will need a new PCB with a pass-through socket.
The alternative to doing this is either moving my Arduino8088 to an ArguinoGIGA and clocking it much, much faster, which has the advantage of just pretty much dropping in to an existing solution, which is nice, or, just trusting that MartyPC is accurate enough to just generate tests directly (which is by far the most performant and convenient option... but that 8087 test suite is tempting)
Since we're going to be replacing all the tests, I've made several changes to MartyPC's disassembler and validated it against iced-x86 in NASM mode with certain filters to ignore iced's lack of support for the 8088's undocumented instructions and aliases. I let both MartyPC and iced disassemble the entire test suite and compared their output, adjusting MartyPC to match as much as possible, but I did retain some personal design choices.
Changes are as follows:
Deviations from iced:
Remaining issues/questions:
Quite a few instructions seem to give a human-readable index of
si
despite actually usingdi
.Given the definition of
name
as a user-readable disassembly of the instruction, I think this is a valid criticism (if I've diagnosed it correctly) but to be explicit: it's an issue with the contents ofname
only, which are just human-friendly after-the-fact fields, so should be a mere clerical fix (again: if I'm not adrift here).For example, the test
add byte ss:[bp+si+0C67h], al
from00.json.gz
:si
in thename
as per above;ss = 4978
,bp = 2861
,di = 47332
,si = 57572
, ;ss:[bp+si+0C67h]
it'd be143256
; if it were insteadss:[bp+di+0C67h]
it'd instead be133016
;133016
is accessed, and143256
isn't.Furthermore, the second entry in
bytes
is 131, i.e. 0x83, which gives anrm
field of0x3
, which isbp+di
.The full (hexadecimal)
bytes
sequence for that instruction is00 83 67 0c
which this online disassembler also believes isbp + di
rather than+ si
(though as an aside, I've found that disassembler not to be entirely reliable — e.g. it seems always to assume a selector ofds
regardless of the base).@dbalsom can you confirm or deny that I've got an actual issue here, no matter how negligible?