MicroCoreLabs / Projects

Ted Fried's MicroCore Labs Projects which include microsequencer-based FPGA cores and emulators for the 8088, 8086, 8051, 6502, 68000, Z80, Risc-V, and also Typewriter and EPROM Emulator projects. MCL51, MCL64, MCL65, MCL65+, MCL68, MCL86, MCL86+, MCL86jr, MCLR5, MCLZ8

MCLZ8 - Some opcodes should take 5 cycles (for instance, DJNZ) and not only 4 cycles in M-cycle M1 #11

Open hlide opened 1 year ago

hlide commented 1 year ago

https://github.com/MicroCoreLabs/Projects/blob/ea66a81165a3f88b8d14fafd05a566c4d131c5ae/MCLZ8/Code/Standard_Z80/MCLZ8.ino#L847

The memory read of the displacement byte (whether or not the jump is taken) happens one T-state too early compared to a genuine Z80. On Z80 machines that are very sensitive to the /WAIT pattern (e.g. the Amstrad CPC), this can shift the whole timing and therefore change the behavior of CPC programs that depend heavily on cycle-accurate memory access.

That extra T-state at the end of the M1 M-cycle is due to an internal "dec b". In mode 0, the MEMORY READ M-cycle that fetches the displacement byte should begin 5 CLK cycles after the start of the M1 M-cycle, not the usual 4 CLK cycles.
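To make the one-T-state shift concrete, here is a minimal sketch. It is not taken from MCLZ8.ino: `mcycle_start` is a hypothetical helper, and the (4, 3, 6) breakdown is only an assumption about how an emulator that treats M1 as a plain 4 T-state fetch would make up the total. It computes the T-state at which each M-cycle begins from a per-M-cycle breakdown.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the 0-based T-state at which M-cycle `index`
// starts, given the length in T-states of every M-cycle of one instruction.
template <std::size_t N>
constexpr std::uint8_t mcycle_start(const std::array<std::uint8_t, N>& t_states,
                                    std::size_t index) {
    std::uint8_t start = 0;
    for (std::size_t i = 0; i < index && i < N; ++i) {
        start += t_states[i];
    }
    return start;
}

// DJNZ (jump taken) on a genuine Z80: M1 is stretched to 5 T-states.
constexpr std::array<std::uint8_t, 3> djnz_genuine{5, 3, 5};

// Assumption for illustration: M1 handled as a plain 4 T-state fetch, with
// the missing T-state made up at the end of the instruction.
constexpr std::array<std::uint8_t, 3> djnz_shifted{4, 3, 6};

// The displacement read (second M-cycle) starts at T-state 5 on real hardware
// but at T-state 4 in the shifted variant, even though both total 13.
static_assert(mcycle_start(djnz_genuine, 1) == 5, "genuine: read after 5 T-states");
static_assert(mcycle_start(djnz_shifted, 1) == 4, "shifted: read one T-state early");
```

Both breakdowns add up to 13 T-states, which is why a total-only count cannot reveal the problem; only the start offset of the second M-cycle differs.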

See DJNZ: https://www.manualsdir.com/manuals/753749/zilog-z08470.html?page=287

DJNZ is not the only instruction with extra T-states inserted in the middle of its T-state sequence.

hlide commented 1 year ago

Here is a list of instructions whose memory or I/O port accesses may happen at inaccurate T-state points:

| Instruction | M Cycles | T States |
| --- | --- | --- |
| LD r, (IX/IY+d) | 5 | 19 (4, 4, 3, **5**, 3) |
| LD (IX/IY+d), r | 5 | 19 (4, 4, 3, **5**, 3) |
| PUSH qq | 3 | 11 (**5**, 3, 3) |
| PUSH IX/IY | 4 | 15 (4, **5**, 3, 3) |
| EX (SP), HL | 5 | 19 (4, 3, **4**, 3, 5) |
| EX (SP), IX/IY | 6 | 23 (4, 4, 3, **4**, 3, 5) |
| ADD A, (IX/IY+d) | 5 | 19 (4, 4, 3, **5**, 3) |
| ADC A, (IX/IY+d) | 5 | 19 (4, 4, 3, **5**, 3) |
| SUB A, (IX/IY+d) | 5 | 19 (4, 4, 3, **5**, 3) |
| SBC A, (IX/IY+d) | 5 | 19 (4, 4, 3, **5**, 3) |
| AND (IX/IY+d) | 5 | 19 (4, 4, 3, **5**, 3) |
| OR (IX/IY+d) | 5 | 19 (4, 4, 3, **5**, 3) |
| XOR (IX/IY+d) | 5 | 19 (4, 4, 3, **5**, 3) |
| CP (IX/IY+d) | 5 | 19 (4, 4, 3, **5**, 3) |
| INC (IX+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| DEC (IX+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| RLC (IX+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| RL (IX+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| RRC (IX+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| RR (IX+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| SLA (IX+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| SRA (IX+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| SLL/SL1 (IX+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| SRL (IX+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| RLD | 5 | 18 (4, 4, 3, **4**, 3) |
| RRD | 5 | 18 (4, 4, 3, **4**, 3) |
| BIT b, (IX/IY+d) | 5 | 20 (4, 4, 3, **5**, 4) |
| SET b, (HL) | 4 | 15 (4, 4, **4**, 3) |
| SET b, (IX/IY+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| RES b, (HL) | 4 | 15 (4, 4, **4**, 3) |
| RES b, (IX/IY+d) | 6 | 23 (4, 4, 3, **5, 4**, 3) |
| DJNZ e | 3/2 | 13 (**5**, 3, 5) / 8 (**5**, 3) |
| CALL nn | 5 | 17 (4, 3, **4**, 3, 3) |
| CALL cc, nn | 5/3 | 17 (4, 3, **4**, 3, 3) / 10 (4, 3, 3) |
| RET cc | 3/1 | 11 (**5**, 3, 3) / 5 |
| RST p | 3 | 11 (**5**, 3, 3) |
| INI/IND | 4 | 16 (4, **5**, 3, 4) |
| INIR/INDR | 5/4 | 21 (4, **5**, 3, **4**, 5) / 16 (4, **5**, 3, 4) |
| OUTI/OUTD | 4 | 16 (4, **5**, 3, 4) |
| OTIR/OTDR | 5/4 | 21 (4, **5**, 3, **4**, 5) / 16 (4, **5**, 3, 4) |

The T-states in bold are the problematic ones: to be fully accurate with genuine Z80 timings, they should not be shifted to the end of the instruction in modes 0 and 1 (and maybe 2).
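As a rough illustration of why a total alone is not enough, the sketch below checks that a few breakdowns copied from the list add up to their stated totals, and that a breakdown with the extra T-state moved to the end passes the same check. The `Timing` struct and both function names are hypothetical and do not exist in MCLZ8.ino.

```cpp
#include <numeric>
#include <vector>

// Hypothetical record: one instruction, its total T-states, and the length
// of each of its M-cycles.
struct Timing {
    const char* mnemonic;
    unsigned total;
    std::vector<unsigned> m_cycles;
};

// True when the per-M-cycle lengths add up to the stated total.
inline bool breakdown_matches_total(const Timing& t) {
    return std::accumulate(t.m_cycles.begin(), t.m_cycles.end(), 0u) == t.total;
}

// A few rows copied from the list of affected instructions.
inline std::vector<Timing> sample_rows() {
    return {
        {"PUSH qq", 11, {5, 3, 3}},
        {"EX (SP), HL", 19, {4, 3, 4, 3, 5}},
        {"INC (IX+d)", 23, {4, 4, 3, 5, 4, 3}},
        {"CALL nn", 17, {4, 3, 4, 3, 3}},
        {"INI", 16, {4, 5, 3, 4}},
    };
}
```

A hypothetical `Timing{"PUSH qq, shifted", 11, {4, 3, 4}}` also satisfies `breakdown_matches_total`, although its first memory write would start one T-state early. A table that stores only totals cannot tell the two apart.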

hlide commented 1 year ago

The situation is even worse.

MCLZ8 is not T-state accurate even in mode 0!

1) The timings of the ED opcodes are all wrong. For instance, IM 1 is executed in 11 T-states (4,7) instead of 7 T-states (4,3) because of the way clock_counter is accumulated.

2) I didn't check the CB and DD/FD tables, but they are likely to be wrong as well if they are handled in the same way as the ED tables.

3) CALL/RET/JR/RST instructions take too many cycles. For instance, a taken CALL takes 24 T-states (4,3,4,3,3,7) instead of 17 T-states (4,3,4,3,3).

Basically, storing total cycle counts in tables is not a good approach. It is better to handle the extra M-cycles inside the opcode functions (opcode_0xXX) to get more accurate and precise timing.

    void opcode_0xED56() {       // IM 1
        /* (4) */
        register_im = 1;         /* (4) */
        Extra_Cycles<3>();       /* (4, 3) */
        return;
    }

with:

    template <uint8_t cycles>    // compile-time optimization
    inline void Extra_Cycles() {
        if (bus_mode < 3) {
            for (uint8_t i = cycles; i--;) {
                wait_for_CLK_rising_edge();
                wait_for_CLK_falling_edge();
            }
        }
    }

clock_counter and the clock tables are then no longer used.

Also note that /INT must be checked on the rising edge of the last T-state of the last M-cycle of the instruction, which is why I call both wait_for_CLK_rising_edge() and wait_for_CLK_falling_edge() during the extra cycles.
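The following host-side sketch exercises the `Extra_Cycles` idea with mock clock functions so that the number of edges can be counted. The edge counters, the mock bodies of `wait_for_CLK_rising_edge()` and `wait_for_CLK_falling_edge()`, and the `count_extra_cycles_3` helper are assumptions made for this illustration; on the real hardware those functions would wait on the CLK pin.

```cpp
#include <cstdint>

// Stand-ins for the real MCLZ8 globals and functions (assumed for this sketch).
static std::uint8_t bus_mode = 0;
static unsigned rising_edges = 0;
static unsigned falling_edges = 0;

// Mocks: instead of waiting on the CLK pin, just count the edge.
inline void wait_for_CLK_rising_edge()  { ++rising_edges; }
inline void wait_for_CLK_falling_edge() { ++falling_edges; }

// One rising and one falling edge per extra T-state, only in bus modes 0..2.
template <std::uint8_t cycles>
inline void Extra_Cycles() {
    if (bus_mode < 3) {
        // Post-decrement in the condition: the body runs exactly `cycles` times.
        for (std::uint8_t i = cycles; i--;) {
            wait_for_CLK_rising_edge();
            wait_for_CLK_falling_edge();
        }
    }
}

// Hypothetical helper: resets the counters, runs Extra_Cycles<3>() in the
// given bus mode, and returns the number of rising edges that were seen.
inline unsigned count_extra_cycles_3(std::uint8_t mode) {
    bus_mode = mode;
    rising_edges = 0;
    falling_edges = 0;
    Extra_Cycles<3>();
    return rising_edges;
}
```

With these mocks, `count_extra_cycles_3(0)` sees three rising and three falling edges, while `count_extra_cycles_3(3)` sees none. Note that writing the loop condition as `--i` instead of `i--` would run the body one time fewer, and would wrap around for `cycles == 0`.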

Case of DJNZ n:

    void opcode_0x10() {
        /* (4) */
        register_b--;
        Extra_Cycles<1>();           /* (5) */
        if (register_b != 0) {
            Jump_Taken8();           /* (5,3) */
            Extra_Cycles<5>();       /* (5,3,5) */
        } else {
            Jump_Not_Taken8();       /* (5,3) */
        }
        return;
    }
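To sanity-check the T-state totals of this DJNZ handler, here is a small host-side model. It assumes that the opcode fetch has already cost 4 T-states and that `Jump_Taken8()` and `Jump_Not_Taken8()` each perform one 3 T-state memory read; the mock bodies, the `t_states` counter and `run_djnz` are hypothetical and only serve the illustration.

```cpp
#include <cstdint>

static unsigned t_states = 0;        // host-side T-state counter (assumption)
static std::uint8_t register_b = 0;

// Mock: every extra cycle costs exactly one T-state.
template <std::uint8_t cycles>
inline void Extra_Cycles() { t_states += cycles; }

// Mocks: both helpers are assumed to read the displacement byte in a
// 3 T-state MEMORY READ M-cycle.
inline void Jump_Taken8()     { t_states += 3; }
inline void Jump_Not_Taken8() { t_states += 3; }

// Same control flow as the DJNZ handler proposed above.
inline void opcode_0x10() {
    register_b--;
    Extra_Cycles<1>();               // M1 stretched from 4 to 5 T-states
    if (register_b != 0) {
        Jump_Taken8();               // (5,3)
        Extra_Cycles<5>();           // (5,3,5)
    } else {
        Jump_Not_Taken8();           // (5,3)
    }
}

// Hypothetical driver: runs one DJNZ with B preset to `b` and returns the
// total T-states, including the 4 T-states of the opcode fetch.
inline unsigned run_djnz(std::uint8_t b) {
    register_b = b;
    t_states = 4;                    // opcode fetch assumed already done
    opcode_0x10();
    return t_states;
}
```

Under these assumptions `run_djnz(2)` gives 13 T-states (jump taken) and `run_djnz(1)` gives 8 (B reaches zero, jump not taken), matching the 13 (5, 3, 5) / 8 (5, 3) entry for DJNZ.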

And so on...

MicroCoreLabs commented 1 year ago

Thank you for spending so much of your time analyzing the MCLZ8. The goal of the project was not to create a cycle-perfect Z80 clone, although it potentially could be if there was a desire to make it so. The experiment was to create an opcode-correct emulator running on a fast microcontroller which could be a drop-in replacement for most Z80 motherboards. With this goal achieved I have moved on to other projects. Updating the MCLZ8 to be a cycle-perfect drop-in emulator could be an enjoyable project if someone wanted to pursue it.

