CPU instruction timing - GitHub Issues

dirkwhoffmann commented 4 years ago

Could you throw the adf onto the A500 MMSE and see what the correct timings are?

Here we go:

[Image: bchg2_A500+]

[Image: bchg1_A500+]

And the winner is: portable68000 / UAE

BCHG consumes 6 cycles for shift value $00 and 8 cycles for shift value $10.

Interestingly, UAE has slight timing issues w.r.t. interrupt triggering (blue section):

Seems like I need to implement a Musashi compatibility mode in Moira. Otherwise, I cannot continue to run my test cases 😕. The bug is kind of unfixable in Musashi, because Musashi reads the cycle counts from a static table and there is one table entry for each instruction and addressing mode.
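To illustrate why a one-entry-per-opcode table cannot express this behaviour, here is a minimal Python sketch. The table layout and function names are hypothetical, and the assumption that every bit number above 15 costs the extra two cycles is mine; the thread itself only reports 6 cycles for $00 and 8 cycles for $10.

```python
# Hypothetical sketch: static cycle table vs. operand-dependent timing.
# Only the two measured points (bit $00 -> 6 cycles, bit $10 -> 8 cycles)
# come from the thread; the "> 15" threshold is an assumption.

# A static table holds exactly one number per (instruction, addressing mode).
STATIC_CYCLES = {("bchg", "Dn,Dn"): 8}


def cycles_static(instr, mode, bit_number):
    """Table lookup: the operand value cannot influence the result."""
    return STATIC_CYCLES[(instr, mode)]


def cycles_operand_dependent(instr, mode, bit_number):
    """Timing computed at run time from the actual bit number."""
    if (instr, mode) == ("bchg", "Dn,Dn"):
        bit = bit_number % 32        # register destinations use the bit number modulo 32
        return 8 if bit > 15 else 6  # assumed threshold
    raise KeyError((instr, mode))


for bit in (0x00, 0x10):
    print(f"bit ${bit:02X}: "
          f"static={cycles_static('bchg', 'Dn,Dn', bit)} "
          f"dynamic={cycles_operand_dependent('bchg', 'Dn,Dn', bit)}")
```

The static lookup returns the same value for both bit numbers, which is exactly the situation the test case exposes.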

@mithrendal: Which tool-chain did you use to write your test-case? 🤔

mithrendal commented 4 years ago

as toolchain I used the Amiga-Assembly extension by @prb28 in Visual Studio Code for Mac...

it has commands for creating an ADF of the workspace and it is also possible to debug the code step by step

dirkwhoffmann commented 4 years ago

as toolchain I used the Amiga-Assembly extension

I expected this, because you had told me about this great project a couple of months ago. I definitely need to look into it again! When I first looked at it, I didn't really understand how to use it. This is by no means the author's fault; I just didn't spend enough time on it. Overall, the VSC extension might be a more efficient way to create test cases than my current tool chain, which is based on the (also great) work by alpine9000.

mithrendal commented 4 years ago

I am not sure which is better suited for building small test programs... But in your case you could even try to pick the best of both. That is, use alpine9000's solid build process and use Visual Studio Code with the Amiga-Assembly extension as a source code editor only? That way you have the existing "rock solid proven" alpine9000 toolchain and a nice editor which gives you hints about m68k instructions, custom registers and Amiga OS library calls while editing your assembler file. Which editor do you currently use to edit the assembler files?

The latest findings andi, bchg and nbcd are simply amazing 😃... This shows that there is still plenty of room for compatibility improvements... Even without letting Moira practise the "intermediate bus access emulation" dance of high martial arts, it could already improve vAmiga by simply picking some low-hanging fruit, e.g. correct andi instruction timings, etc...

dirkwhoffmann commented 4 years ago

The latest findings andi, bchg and nbcd are simply amazing

I think running Moira (which is supposed to be functionally equivalent to portable68000) against Musashi for each and every opcode is the right thing to do.

Oh wait, my sandbox has triggered another code red... 😶

MISMATCH FOUND (opcode $5048 out of $FFFF):

Instruction: addq.w  #8, A0

    Musashi: PC: 1002 Elapsed cycles:  4
      Moira: PC: 1002 Elapsed cycles:  8

Definitely another smoking gun...

dirkwhoffmann commented 4 years ago

🤭 Another hit...

MISMATCH FOUND (opcode $803c out of $FFFF):

Instruction: or.b    #$0, D0

    Musashi: PC: 1004 Elapsed cycles: 10
      Moira: PC: 1004 Elapsed cycles:  8

I need to see what the real machine does... stay tuned...

dirkwhoffmann commented 4 years ago

I need to see what the real machine does...

Goooaaalllll! Moira scores another one. (Of course, the credit belongs to portable68000 🙄)

dirkwhoffmann commented 4 years ago

I feel a little sorry for Musashi, but I've got to turn him in another time... 😬

MISMATCH FOUND (opcode $8080 out of $FFFF):

Instruction: or.l    D0, D0

    Musashi: PC: 1002 Elapsed cycles:  6
      Moira: PC: 1002 Elapsed cycles:  8

mithrendal commented 4 years ago

That is actually very good news for vAmiga 😃. For Musashi, reports of these timing issues are very valuable too. Don’t hesitate to keep acting like a pedantic bug reporter 😁😁😁 for Musashi. Any emulation project that uses Musashi will be grateful for the bug reports.

dirkwhoffmann commented 4 years ago

🥵 Enough for today...

The next one will require some decent testing. Musashi is definitely wrong here, because the consumed number of cycles depends on the operands. Though, it makes sense to set up some decent test cases to verify portable68000...

MISMATCH FOUND (opcode $80d0 out of $FFFF):

Instruction: divu.w  (A0), D0

    Musashi: PC:    0 Elapsed cycles: 38
      Moira: PC:    0 Elapsed cycles: 42

dirkwhoffmann commented 4 years ago

Any emulation project that uses Musashi will be grateful for the bug reports.

Maybe I should have waited until Christmas is over 🙄.

dirkwhoffmann commented 4 years ago

Any emulation project that uses Musashi will be grateful for the bug reports.

BTW, it would be cool to get an overview of the emulators using Musashi.

I feel that Moira has some potential and that it might make sense to maintain it as a stand-alone project in future (not just as the future vAmiga CPU). However, this would require the implementation of some CPU features that are not used by the Amiga. Obviously, those features cannot be tested within vAmiga. E.g., the 68000 supports vectored and auto-vectored interrupts, but only one of those is used by the Amiga. To get around this, Moira could be tested, e.g., inside a Sega Genesis emulator. If such an emulator uses Musashi, it wouldn't be too difficult to plug Moira in, because Musashi's API can be mapped quite easily to Moira's. Did anybody ever try such a Sega emulator?

mithrendal commented 4 years ago

The testcase setup is superb. Do you think it is possible to visualize the position and cycle length of the intermediate bus accesses of the instructions too? Maybe it is possible to see them by letting the blitter block the CPU and comparing the drawings of your test programs when running with the bltprio bit set against the bltprio bit cleared, or when your test programs are executed in Fast RAM.

I am only thinking out loud... Maybe it is nonsense, but when the CPU instruction has started, the copper can stop the CPU one cycle later and release it one DMA cycle after that via the bltprio flag... If the graph is then no longer than without that action, that means the bus access happens some cycles later...

I fear for the bus access times that method is likely not fine granular enough or we could encounter the Heisenberg's uncertainty principle... 😱

I mean, somehow it must be possible to visualize the position of bus cycles, e.g. when the CPU misses a bus cycle, because otherwise it would not be relevant at all.

dirkwhoffmann commented 4 years ago

I fear for the bus access times that method is likely not fine granular enough

I think it will be difficult. We could certainly cause some delays by blocking the bus via the Blitter or bitplane DMA, but it's hardly doable in a fine-controlled way.

the Heisenberg's uncertainty principle

[Image]

dirkwhoffmann commented 4 years ago

Just wrote some DIVU, DIVS tests...

This is test divu1 on an A500+ 🥰:

[Image: divu1_A500+]

As expected, all stripes have the same size in vAmiga, because Musashi is using the same cycle count for all operands:

The good news is that UAE does it right, so we shamelessly steal the code from there. The only problem will be to find the right place in the source code (as usual 🥴).
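For readers wondering what an operand-dependent DIVU timing can look like, here is a Python sketch. It is modelled on the bit-by-bit DIVU timing algorithm that circulates among 68000 emulators (commonly attributed to Jorge Cwik). It is written from memory as an illustration, not copied from UAE or Moira, and it has not been checked against the A500+ measurements in this thread. The function name is hypothetical, and effective-address time is not included.

```python
def divu_cycles(dividend, divisor):
    """Estimate 68000 DIVU cycles for a 32-bit dividend and 16-bit divisor.

    Sketch only: assumed algorithm, effective-address time excluded.
    """
    dividend &= 0xFFFFFFFF
    divisor &= 0xFFFF

    if divisor == 0:
        return 0            # division by zero traps; not modelled here

    if (dividend >> 16) >= divisor:
        return 10           # overflow is detected early

    mcycles = 38
    hdivisor = divisor << 16

    # One pass per quotient bit; the cost of a pass depends on the data.
    for _ in range(15):
        carry = dividend & 0x80000000
        dividend = (dividend << 1) & 0xFFFFFFFF
        if carry:
            dividend = (dividend - hdivisor) & 0xFFFFFFFF
        else:
            mcycles += 2
            if dividend >= hdivisor:
                dividend -= hdivisor
                mcycles -= 1

    return mcycles * 2


for dividend in (0x00000000, 0x00001234, 0x7FFF0000):
    print(f"${dividend:08X} / $8000 -> {divu_cycles(dividend, 0x8000)} cycles")
```

Under this model every non-overflowing division costs between 76 and 136 cycles, which is why a single table entry produces stripes of equal size.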

dirkwhoffmann commented 4 years ago

Moira's got a logo now 😎. Because the Moirai are a group of three cool girls with superpowers (as far as I understand it), I've chosen something with three different colours:

moiraLogo

In the meantime, I've also managed to rule out all cycle discrepancies with Musashi 🥳. OK, not completely. In my tests, I have to skip ABCD, SBCD, and NBCD, because they are broken in Musashi. Furthermore, I have to run Moira in "Musashi compatibility mode" that uses wrong cycle counts for some instructions (such as MUL or DIV).

For the next step, I plan to refine my testing strategy as follows:

Forever...
- For each opcode ...
  - Set up memory and registers with random numbers
  - Run Musashi against Moira and compare the result
  - Repeat with the next opcode

With "comparing the result", I mean to

- compare the program counter
- compare the clocks (cycle counts)
- compare the status register
- compare the data and address registers
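The loop and the comparison list above can be sketched in Python as follows. Both "cores" are toy stand-ins written for this example (they are not the Musashi or Moira APIs), and the deliberate 4 vs. 8 cycle difference for opcode $5048 simply mirrors the addq.w mismatch reported earlier in the thread.

```python
import random
from dataclasses import dataclass, field, fields


@dataclass
class CpuState:
    pc: int = 0x1000
    cycles: int = 0
    sr: int = 0x2700
    d: list = field(default_factory=lambda: [0] * 8)
    a: list = field(default_factory=lambda: [0] * 8)


def randomize(rng):
    """Create a start state with random data and (even) address registers."""
    return CpuState(d=[rng.getrandbits(32) for _ in range(8)],
                    a=[rng.getrandbits(32) & ~1 for _ in range(8)])


def reference_core(opcode, s):
    """Toy stand-in for the reference CPU."""
    return CpuState(pc=s.pc + 2, cycles=4, sr=s.sr, d=list(s.d), a=list(s.a))


def core_under_test(opcode, s):
    """Toy stand-in for the CPU under test; differs for one opcode."""
    out = reference_core(opcode, s)
    if opcode == 0x5048:      # addq.w #8,A0
        out.cycles = 8
    return out


def compare(ref, dut):
    """Return the names of all state fields that differ."""
    return [f.name for f in fields(CpuState)
            if getattr(ref, f.name) != getattr(dut, f.name)]


def run_round(rng, opcodes=range(0x10000)):
    """Execute every opcode once on both cores; stop at the first mismatch."""
    for opcode in opcodes:
        start = randomize(rng)
        diff = compare(reference_core(opcode, start),
                       core_under_test(opcode, start))
        if diff:
            return opcode, diff
    return None


result = run_round(random.Random(1))
if result:
    opcode, diff = result
    print(f"MISMATCH FOUND (opcode ${opcode:04X}): {', '.join(diff)}")
```

Wrapping `run_round` in an endless loop with a fresh seed per round gives the "forever" behaviour; a fixed seed per round keeps any mismatch reproducible.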

Is there anything else I could compare? 🤔

mithrendal commented 4 years ago

🙋🏻‍♀️🙋🏻‍♀️🙋🏻‍♀️ Compare the value at the memory destination address, in case the operand's addressing mode has to write back into memory.

Example: Add.l #2020,(a1)

To reduce the complexity of the forever test loop code, it is probably much easier to give the CPU only a very small amount of memory, e.g. 4 KB, and compare the complete memory after each opcode regardless of the addressing mode.
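A whole-memory comparison of this kind is only a few lines. The sketch below is a hypothetical helper, not code from vAmiga or Moira; the address $0100 for a1 and the wrong byte written by the second core are made up for the demonstration.

```python
MEM_SIZE = 4 * 1024  # small memory so a full comparison stays cheap


def first_memory_difference(mem_ref, mem_dut):
    """Return (address, ref_byte, dut_byte) of the first differing byte, or None."""
    assert len(mem_ref) == len(mem_dut)
    for addr, (x, y) in enumerate(zip(mem_ref, mem_dut)):
        if x != y:
            return addr, x, y
    return None


mem_ref = bytearray(MEM_SIZE)
mem_dut = bytearray(MEM_SIZE)

# Pretend both cores executed add.l #2020,(a1) with a1 = $0100 on zeroed
# memory, but the core under test stored a wrong low byte.
mem_ref[0x100:0x104] = (2020).to_bytes(4, "big")
mem_dut[0x100:0x104] = (2021).to_bytes(4, "big")

print(first_memory_difference(mem_ref, mem_dut))
```

The trade-off is that a snapshot comparison only sees the final memory contents; it cannot tell in which order, or how often, a location was accessed.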

mithrendal commented 4 years ago

I've chosen something with three different colours.

Nice, the two elements wind (light blue) and water (dark blue), and on top of that the fire 🔥 element. 🧚🏻‍♀️🧚🏿‍♀️🧚🏽‍♀️

dirkwhoffmann commented 4 years ago

Nice, the two elements wind (light blue) and water (dark blue), and on top of that the fire

Yes, but I think the ladies haven't been responsible for elements. On this picture, it looks like they were more in the cotton business 🤔:

Besides, the hot lady in the middle looks a little mean to me. No?

OMG, all of a sudden, the disassembler is broken... 😖

DISASSEMBLER MISMATCH FOUND:

    Musashi: move    #$0, CCR
      Moira: move    #$8000, CCR
      Bytes: 4 / 4

dirkwhoffmann commented 4 years ago

Compare the value at the memory destination address, in case the operand's addressing mode has to write back into memory.

This is the purpose of my memory sandbox. It intercepts all memory accesses and records them in a list. The list is created when Musashi runs, and Moira's memory accesses are checked against it when Moira runs.
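A record-and-compare sandbox along these lines could look like the following Python sketch. The class, its method names and the two toy "runs" are hypothetical and not taken from the actual test framework. As a simplification, writes are only logged and never stored, so that both runs see identical memory.

```python
class Sandbox:
    """Records bus accesses in a first run and checks them in a second run."""

    def __init__(self, size=4096):
        self.mem = bytes(size)  # read-only image, identical for both runs
        self.log = []           # accesses recorded during the first run
        self.cursor = None      # None = recording, int = replay position
        self.errors = []

    def start_replay(self):
        self.cursor = 0

    def _access(self, entry):
        if self.cursor is None:
            self.log.append(entry)
            return
        expected = self.log[self.cursor] if self.cursor < len(self.log) else None
        if entry != expected:
            self.errors.append((self.cursor, expected, entry))
        self.cursor += 1

    def read8(self, addr):
        value = self.mem[addr]
        self._access(("R", addr, value))
        return value

    def write8(self, addr, value):
        self._access(("W", addr, value & 0xFF))

    def finish(self):
        """Return all mismatches, including accesses the second run never made."""
        missing = [(i, self.log[i], None)
                   for i in range(self.cursor, len(self.log))]
        return self.errors + missing


def reference_run(bus):
    bus.write8(0x10, bus.read8(0x10) + 1)


def run_under_test(bus):
    bus.write8(0x10, bus.read8(0x10) + 2)  # deliberately writes a wrong value


bus = Sandbox()
reference_run(bus)      # first run: record
bus.start_replay()
run_under_test(bus)     # second run: compare against the recording
print(bus.finish())
```

Because the log keeps the order of accesses, this approach also catches differences in access sequence, which a comparison of final memory contents would miss.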

The only issue is that I won't be able to utilise sandbox testing in vAmiga which was my original plan. Especially when it comes to prefetching, Musashi behaves very differently than portable68000 (and thus Moira). Hence, bugs need to be ruled out before Moira is plugged into vAmiga.

dirkwhoffmann commented 4 years ago

Today's a day with plenty of good news.

After ruling out the latest discrepancies between Moira and Musashi, my instruction testing framework doesn't complain any more 😎 (OK, it does if the instructions are executed in user mode, but it'll be easy to rule out those remaining bugs):

In each testing round, all 65536 opcodes are executed in both Musashi and Moira and the outcome is compared.

Moira CPU tester. (C) Dirk W. Hoffmann, 2019

The test program runs Moira agains Musashi with randomly generated data.
It runs until a bug has been found.

Round 1 ................ PASSED (Musashi: 0.03s Moira: 0.03s)
Round 2 ................ PASSED (Musashi: 0.06s Moira: 0.07s)
Round 3 ................ PASSED (Musashi: 0.09s Moira: 0.10s)
Round 4 ................ PASSED (Musashi: 0.12s Moira: 0.13s)
Round 5 ................ PASSED (Musashi: 0.15s Moira: 0.16s)
Round 6 ................ PASSED (Musashi: 0.18s Moira: 0.20s)
Round 7 ................ PASSED (Musashi: 0.21s Moira: 0.23s)
Round 8 ................ PASSED (Musashi: 0.24s Moira: 0.26s)
Round 9 ................ PASSED (Musashi: 0.26s Moira: 0.29s)
Round 10 ................ PASSED (Musashi: 0.29s Moira: 0.32s)
Round 11 ................ PASSED (Musashi: 0.32s Moira: 0.35s)
Round 12 ................ PASSED (Musashi: 0.35s Moira: 0.38s)
Round 13 ................ PASSED (Musashi: 0.38s Moira: 0.42s)
Round 14 ................ PASSED (Musashi: 0.40s Moira: 0.45s)
Round 15 ................ PASSED (Musashi: 0.43s Moira: 0.48s)
Round 16 ................ PASSED (Musashi: 0.46s Moira: 0.51s)
Round 17 ................ PASSED (Musashi: 0.49s Moira: 0.54s)

I've also added some rudimentary profiling code. It shows that we can expect Moira to be only about 10% slower than Musashi. This is very good news, too 🥳.

But the headline news is the existence of the UAE cputester Toni has been working on over the last few months. The cputester is really a game changer, because it takes away my biggest fear: having some ugly, hard-to-find bug left in Moira.

Here is the roadmap for next year:

- Clean up Moira's code base
- Add interrupt support (yes, Moira cannot trigger interrupts yet 🤭)
- Integrate Moira into vAmiga
- Bake a bunch of virtual ADFs with cputester and run them in vAmiga

Once this is done, vAmiga will have a stable CPU with full control over bus-timing 😎.

dirkwhoffmann / vAmiga

CPU instruction timing #261