8088: some `name`s refer to `si` but should refer to `di`

SingleStepTests / ProcessorTests

A language-agnostic JSON-encoded instruction-by-instruction test suite for the 8088, 68000, 65816, 65[c]02 and SPC700 that includes bus activity.

8088: some `name`s refer to `si` but should refer to `di` #64

Closed TomHarte closed 11 months ago

TomHarte commented 1 year ago

Quite a few instructions seem to give a human-readable index of si despite actually using di.

Given the definition of name as a user-readable disassembly of the instruction, I think this is a valid criticism (if I've diagnosed it correctly) but to be explicit: it's an issue with the contents of name only, which are just human-friendly after-the-fact fields, so should be a mere clerical fix (again: if I'm not adrift here).

For example, the test add byte ss:[bp+si+0C67h], al from 00.json.gz:

has si in the name as per above;
begins with ss = 4978, bp = 2861, di = 47332, si = 57572;
therefore if the address in use were ss:[bp+si+0C67h] it'd be 143256; if it were instead ss:[bp+di+0C67h] it'd instead be 133016;
per both the bus activity and the listed RAM contents, 133016 is accessed, and 143256 isn't.
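The arithmetic in that list can be checked with a few lines. This is only a sketch of the standard 8088 segment:offset calculation, using the register values quoted above; the function name and signature are made up for illustration.

```rust
/// Physical address on the 8088: (segment << 4) + offset, limited to 20 bits.
/// The offset itself is summed with 16-bit wraparound.
/// Hypothetical helper, not taken from the test suite or MartyPC.
fn physical_address(segment: u16, base: u16, index: u16, disp: u16) -> u32 {
    let offset = base.wrapping_add(index).wrapping_add(disp);
    (((segment as u32) << 4) + offset as u32) & 0xF_FFFF
}
```

Calling it with ss = 4978, bp = 2861, displacement 0x0C67 and either si = 57572 or di = 47332 as the index reproduces the two candidate addresses.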

Furthermore, the second entry in bytes is 131, i.e. 0x83, which gives an rm field of 0x3, which is bp+di.

The full (hexadecimal) bytes sequence for that instruction is 00 83 67 0c which this online disassembler also believes is bp + di rather than + si (though as an aside, I've found that disassembler not to be entirely reliable — e.g. it seems always to assume a selector of ds regardless of the base).
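For reference, the ModRM reading can be sketched as below. The table is the standard 16-bit addressing-mode table; the function names are illustrative only and segment defaults are ignored.

```rust
/// Split a ModRM byte into its (mod, reg, rm) fields.
fn split_modrm(modrm: u8) -> (u8, u8, u8) {
    (modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111)
}

/// Base/index combinations selected by the rm field in 16-bit addressing.
const RM_NAMES: [&str; 8] = [
    "bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx",
];

/// Name the memory operand's base/index part (hypothetical helper).
fn rm_name(modrm: u8) -> &'static str {
    let (md, _reg, rm) = split_modrm(modrm);
    if md == 3 {
        "register" // mod = 11 selects a register operand, not memory
    } else if md == 0 && rm == 6 {
        "disp16" // special case: direct 16-bit address instead of [bp]
    } else {
        RM_NAMES[rm as usize]
    }
}
```

For 0x83 the fields come out as mod = 2 (16-bit displacement follows), reg = 0 and rm = 3, which is bp+di.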

@dbalsom can you confirm or deny that I've got an actual issue here, no matter how negligible?

dbalsom commented 1 year ago

No, you're right.

The cause is an embarrassing typo in my disassembler:

                AddressingMode::BpSiDisp16(disp) => format!("{}{}:[bp+si+{}]", ptr_prefix, segment2, disp),
                AddressingMode::BpDiDisp16(disp) => format!("{}{}:[bp+si+{}]", ptr_prefix, segment2, disp),

You can see the second case should be 'bp+di'. Everything functions as it should; it's just a display issue.
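A self-contained sketch of the corrected pair of arms follows. It is not MartyPC's actual code: the two-variant enum, the `format_operand` name, the `u16` displacement type and the `0C67h`-style hex formatting are all assumptions made so the example compiles on its own.

```rust
/// Cut-down stand-in for the disassembler's addressing-mode enum (assumed shape).
enum AddressingMode {
    BpSiDisp16(u16),
    BpDiDisp16(u16),
}

/// Format a memory operand; the second arm now says bp+di.
fn format_operand(mode: &AddressingMode, ptr_prefix: &str, segment: &str) -> String {
    match mode {
        AddressingMode::BpSiDisp16(disp) => {
            format!("{}{}:[bp+si+{:04X}h]", ptr_prefix, segment, disp)
        }
        AddressingMode::BpDiDisp16(disp) => {
            format!("{}{}:[bp+di+{:04X}h]", ptr_prefix, segment, disp)
        }
    }
}
```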

I even considered using iced-x86 to normalize the disassembly, but it didn't seem to like extraneous segment prefixes and other things that are present in these tests, so I went with MartyPC's internal disassembler.

This is trivially fixed, since we don't have to regenerate the tests, we just use 'bytes' and overwrite 'name'; but unfortunately it probably means every test that uses a modrm will need to be replaced.

dbalsom commented 1 year ago

Maybe a good opportunity to add hashes to the test set now?

TomHarte commented 1 year ago

Agreed on both counts: it's a good opportunity, since probably a very large proportion of tests will be touched, and adding hashes seems like a good idea in itself.

TomHarte commented 12 months ago

Minor addendum to this, taking the topic slightly more broadly as "where the disassemblies look a little odd":

All instances of repe idiv seem to be named as idiv ..., i.e. they start with two spaces and omit any mention of the repe. As a specific example, see idiv word ss:[bx+di], which has the given bytes sequence of 54, 243, 247, 57, with 243 being the relevant one of the two prefixes.

Conversely repne idivs are named as repne idiv ... with no exceptions that I found.
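To make the byte sequence 54, 243, 247, 57 quoted above easier to read, here is a rough sketch that peels the prefix bytes off the front of an instruction. It covers only the documented 8088 prefixes, and the names and return shape are invented for this example.

```rust
/// Map a documented 8088 prefix byte to a display name, if it is one.
fn prefix_name(byte: u8) -> Option<&'static str> {
    match byte {
        0x26 => Some("es"),
        0x2E => Some("cs"),
        0x36 => Some("ss"),
        0x3E => Some("ds"),
        0xF0 => Some("lock"),
        0xF2 => Some("repne"),
        0xF3 => Some("rep"), // also written repe/repz
        _ => None,
    }
}

/// Return the leading prefix names and the remaining bytes (opcode onwards).
fn split_prefixes(bytes: &[u8]) -> (Vec<&'static str>, &[u8]) {
    let count = bytes
        .iter()
        .take_while(|b| prefix_name(**b).is_some())
        .count();
    let names = bytes[..count]
        .iter()
        .filter_map(|b| prefix_name(*b))
        .collect();
    (names, &bytes[count..])
}
```

Applied to that sequence, this yields the prefixes ss and rep, leaving 0xF7 0x39: opcode F7 with a ModRM whose reg field is 7 (idiv) and whose rm field is 1 (bx+di).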

(and, no, I'm unclear on whether one would conventionally use repe or rep for idiv given that I think the effect of those two prefixes on idiv is just an 8086/8088 oddity?)

dbalsom commented 12 months ago

I'm not sure either. I am somewhat inclined to leave them off since their meaning is completely different when prefixing idiv, and it's all undocumented anyway.

I hope to get a PR to you soon, I've just been a bit busy. I've been laying the groundwork for something that might enable production of test suite V2, again from hardware, but not necessarily using an Arduino.

I've been writing a sigrok decoder for a bus sniffer interface I have put together, and it has reached an advanced state:

I purchased some adapters for EEPROMs for my 5150, and my thought is that I can burn specific "generator ROMs" that fill memory and then start executing random (or targeted) instructions, using timer interrupts as a 'test director' to keep jumping to random segments so we never get stuck for too long, and using the timer ISR itself to set up register state for each block of tests.

Flip the logic analyzer on, capture a bunch, then use the decoder to dump out each instruction as a test, co-executing each block on MartyPC to get the per-instruction register state. I don't see any roadblocks to this plan, other than the ROM, ISR and IVT area of course no longer being writable by the tests. I think we can get away without DRAM refresh since we will be executing instructions all over the place.

I've also changed my mind a bit about including i8288 outputs in the tests. Working with the analyzer has shown me that they are really unnecessary; you can calculate ALE and read/write signals yourself. Dropping that stuff would make the test sets much smaller.
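As an illustration of the read/write half of that, the sketch below decodes the CPU's three maximum-mode status lines into a bus cycle type, which is the information the 8288 uses to generate its command outputs. It assumes the status is packed as a 3-bit value with S2 as the high bit; how the test files would actually encode it is not specified here, ALE timing is left out, and all names are hypothetical.

```rust
/// Bus cycle types encoded on S2..S0 in maximum mode.
#[derive(Debug, PartialEq, Clone, Copy)]
enum BusCycle {
    InterruptAck,
    IoRead,
    IoWrite,
    Halt,
    CodeFetch,
    MemRead,
    MemWrite,
    Passive,
}

/// Decode a status value laid out as (S2 << 2) | (S1 << 1) | S0 (assumed packing).
fn decode_status(status: u8) -> BusCycle {
    match status & 0b111 {
        0b000 => BusCycle::InterruptAck,
        0b001 => BusCycle::IoRead,
        0b010 => BusCycle::IoWrite,
        0b011 => BusCycle::Halt,
        0b100 => BusCycle::CodeFetch,
        0b101 => BusCycle::MemRead,
        0b110 => BusCycle::MemWrite,
        _ => BusCycle::Passive,
    }
}

/// True for cycles in which the CPU reads from memory or an I/O port.
fn is_read(cycle: BusCycle) -> bool {
    matches!(cycle, BusCycle::IoRead | BusCycle::CodeFetch | BusCycle::MemRead)
}

/// True for cycles in which the CPU writes to memory or an I/O port.
fn is_write(cycle: BusCycle) -> bool {
    matches!(cycle, BusCycle::IoWrite | BusCycle::MemWrite)
}
```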

The advantage of this is that it gives us "instructions-in-flight" with live queue activity, and it's potentially much, much faster, but the biggest thing perhaps is that we might even be able to do the 8087, although I will need a new PCB with a pass-through socket.

The alternatives are either moving my Arduino8088 to an Arduino GIGA and clocking it much, much faster, which has the advantage of pretty much dropping in to an existing solution, or just trusting that MartyPC is accurate enough to generate tests directly (which is by far the most performant and convenient option... but that 8087 test suite is tempting).

dbalsom commented 11 months ago

Since we're going to be replacing all the tests, I've made several changes to MartyPC's disassembler and validated it against iced-x86 in NASM mode with certain filters to ignore iced's lack of support for the 8088's undocumented instructions and aliases. I let both MartyPC and iced disassemble the entire test suite and compared their output, adjusting MartyPC to match as much as possible, but I did retain some personal design choices.

Changes are as follows:

BP+DI typo corrected
rep* prefixes removed from display on non-string operations
Spacing issues with 'rep' prefix corrected
Segment has been moved within the brackets to match NASM-style
Negative displacements are now displayed as a negative value as appropriate
Relative jumps are now displayed as a negative value as appropriate
Relative jumps now adjust displayed offset by the size of the instruction
A 'short' keyword is now emitted for near jumps
Immediate operands are now sign and zero extended, where appropriate
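Two of those items, signed displacements and relative jump targets adjusted by instruction size, come down to sign extension. The sketch below shows the idea for 8-bit values; it is not MartyPC's code, and the function names and the `02h`-style output format are assumptions.

```rust
/// Render an 8-bit displacement as a signed hex value, e.g. 0xFE becomes "-02h".
fn format_disp8(disp: u8) -> String {
    let value = disp as i8;
    if value < 0 {
        // Widen before negating so that -128 does not overflow.
        format!("-{:02X}h", -(value as i16))
    } else {
        format!("+{:02X}h", value)
    }
}

/// Target of a short jump: the IP of the next instruction plus the
/// sign-extended 8-bit offset, with 16-bit wraparound.
fn short_jump_target(ip: u16, instr_len: u16, rel8: u8) -> u16 {
    ip.wrapping_add(instr_len)
        .wrapping_add(rel8 as i8 as i16 as u16)
}
```

For example, a two-byte jump at 0100h with an offset byte of FEh targets 0100h again, i.e. it jumps to itself.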

Deviations from iced:

Many relative jump mnemonics have two or three aliases. I have made personal style choices about which to use.
iced will emit 'repne' mnemonics to match the F2 prefix, even if 'repne' does not apply to an opcode and would be interpreted the same as 'rep'. I will continue to emit the latter as I believe it is more correct.
iced prefers 'jmp' and 'call' to 'jmpf' and 'callf'; I prefer the latter as they are easier for me to see in listings
iced does not decode setmo/setmoc/salc
iced decodes additional opcodes that have been inserted into some 'illegal' 8088 opcode forms (xabort, xbegin, etc)
iced decodes FPU instructions from D8-DF. This is something I'd like to add to MartyPC; but it is not really relevant for this test suite.

Remaining issues/questions:

Marty does not show any indication in disassembly for segment overrides of string operations. I am not a fan of how iced does it either, and even if I were it would be difficult to copy since my instruction decoder does not consider string operations to have operands. I will need to decide something; I'm leaning toward something like 'rep ss movsb'
Marty (and iced, via a flag) always show the segment, even when not overridden. I find this useful, personally, as I can't always remember when an addressing mode is referencing ss or not; although I have noticed including a segment indicator when not required can cause extraneous segment override prefixes to be emitted by assemblers, which is less than ideal.

dbalsom commented 11 months ago

https://github.com/TomHarte/ProcessorTests/pull/69