TomHarte / CLK

A latency-hating emulator of: the Acorn Electron and Archimedes, Amstrad CPC, Apple II/II+/IIe and early Macintosh, Atari 2600 and ST, ColecoVision, Enterprise 64/128, Commodore Vic-20 and Amiga, MSX 1/2, Oric 1/Atmos, early PC compatibles, Sega Master System, Sinclair ZX80/81 and ZX Spectrum.
MIT License

Apple II floating bus does not switch modes at the right time #1204

Closed ryandesign closed 6 months ago

ryandesign commented 10 months ago

I noticed this problem as I began writing a program to automate the testing of emulator behavior with regard to the Apple II floating bus and vertical blanking.

On my real unenhanced Apple IIe, if I am in text mode with the hires switch on and the mixed switch off, and I switch to graphics mode while loading a byte from the floating bus:

            sta $C051 ;text mode on
            sta $C052 ;mixed mode off
            sta $C057 ;hires mode on
            ldx $C050 ;graphics mode on and load byte from floating bus
            ldy $C050 ;load another byte from floating bus

then the first byte, loaded into the X register, is a byte from the text screen that was being displayed, and the second byte, loaded into the Y register, is a byte from the hires screen that is now being displayed.

Virtual ][ 11.4 and OpenEmulator 1.1.1-202203110628 get it wrong: their floating bus switches to the new video mode too soon: both bytes are taken from the hires screen.

Clock Signal 23.10.29 (plus the fix for #1196 applied) gets it wrong the other way: its floating bus switches to the new video mode too late: both bytes are from the text screen. In fact I have to delay for thousands of cycles after switching video modes before Clock Signal's floating bus returns data from the new video mode.

Here is the source of my sample program that demonstrates the problem.

```asm
; SPDX-FileCopyrightText: © 2023 Ryan Carsten Schmidt
; SPDX-License-Identifier: MIT

;save as instant.s and assemble and link with:
;cl65 -t apple2 -C apple2-asm.cfg --start-addr 0x1000 -u __EXEHDR__ -o instant instant.s

KBD     = $C000 ;keyboard value
KBDSTRB = $C010 ;keyboard strobe
TXTCLR  = $C050 ;graphics
TXTSET  = $C051 ;text
MIXCLR  = $C052 ;no split
HIRES   = $C057 ;hires
PRBLNK  = $F948 ;print 3 spaces
INIT    = $FB2F ;set text mode, page 1, lores, standard text window
WAIT    = $FCA8 ;delay (26+27*A+5*A*A)/2 cycles (A>0)
HOME    = $FC58 ;clear text screen 1 and move cursor to top left
PRBYTE  = $FDDA ;print A as hex
SETNORM = $FE84 ;set normal text
SETKBD  = $FE89 ;set KSW to KEYIN
SETVID  = $FE93 ;set CSW to COUT1

.proc main
            jsr SETNORM     ;normal text
            jsr INIT        ;text mode, page 1, lores, standard text window
            jsr SETVID      ;standard output
            jsr SETKBD      ;standard input
            jsr HOME        ;clear text screen
            ldx #$0         ;init low byte counter
            ldy #$14        ;init high byte counter
            lda #$42        ;load byte to fill memory with
@loreshi:   sty @loreslo+2  ;set high byte of address of sta below
@loreslo:   sta $1400,x     ;store the byte (address modified by sty above)
            inx             ;next low byte
            bne @loreslo    ;loop until done
            iny             ;next high byte
            cpy #$18        ;compare against last high byte
            bne @loreshi    ;loop until done
            ldy #$20        ;init high byte counter
            lda #$7F        ;load byte to fill memory with
@hireshi:   sty @hireslo+2  ;set high byte of address of sta below
@hireslo:   sta $2000,x     ;store the byte (address modified by sty above)
            inx             ;next low byte
            bne @hireslo    ;loop until done
            iny             ;next high byte
            cpy #$40        ;compare against last high byte
            bne @hireshi    ;loop until done
            sta MIXCLR      ;mixed mode off
            sta HIRES       ;hires mode on
@here:      beq @load1st    ;always
            ldx TXTCLR      ;graphics mode on and load floating bus byte
@delay:     lda #80
            jsr WAIT        ;delay 17,093 cycles; exits with Z flag set
            beq @load2nd    ;always
@load1st:   ldx TXTCLR      ;graphics mode on and load floating bus byte
@load2nd:   ldy TXTCLR      ;load another floating bus byte
            sta TXTSET      ;text mode on
            txa             ;transfer 1st floating bus byte to A
            jsr PRBYTE      ;print byte
            jsr PRBLNK      ;print 3 spaces
            tya             ;transfer 2nd floating bus byte to A
            jsr PRBYTE      ;print byte
@waitkey:   lda KBD         ;load keypress
            bpl @waitkey    ;loop until keypress
            sta KBDSTRB     ;indicate keypress handled
            bmi main        ;always
.endproc
```

You can poke it into memory by entering the monitor with:

CALL -151

and then pasting this in:

1000:20 84 FE 20 2F FB 20 93 FE 20 89
:FE 20 58 FC A2 00 A0 14 A9 42 8C 1A
:10 9D 00 14 E8 D0 FA C8 C0 18 D0 F2
:A0 20 A9 7F 8C 2C 10 9D 00 20 E8 D0
:FA C8 C0 40 D0 F2 8D 52 C0 8D 57 C0
:F0 0A AE 50 C0 A9 50 20 A8 FC F0 03
:AE 50 C0 AC 50 C0 8D 51 C0 8A 20 DA
:FD 20 48 F9 98 20 DA FD AD 00 C0 10
:FB 8D 10 C0 30 9B

Run it with the monitor command:

1000G

It shows and clears the text screen, fills $1400-$17FF (the area scanned by the Apple II/II+ floating bus during horizontal blanking when page 1 of text or lores graphics is shown) with $42, fills $2000-$3FFF (hires page 1) with $7F, then shows the hires screen by loading a byte from $C050 into X, loads another byte from $C050 into Y, then switches back to text mode, prints the hex values of X and Y, and waits for a keypress before doing it all again.

The real Apple IIe usually prints A0 7F. A0 is a space with the high bit set (what the text screen is filled with) and 7F is what we filled the hires screen with. Instead of A0 it might print a value from the screen holes.

OpenEmulator and Virtual ][ print 7F 7F.

Clock Signal emulating a IIe prints A0 A0. Emulating a II/II+, we would often see 42 instead of A0. Instead of A0 or 42 we might see other bytes from the screen holes.

If you change the condition at @here: from beq ($F0) to bne ($D0) with the monitor command:

103B:D0

then a substantial delay of 17,093 cycles (approximately the duration of one complete frame, which would be 17,030 cycles) is introduced after switching to graphics mode and before reading the second floating bus byte, which seems to be a long enough delay to fix the problem.

You can experiment with different delay values by changing the value at @delay: for example to reduce it from 80 ($50) to 40 ($28):

1041:28

With that delay (4,553 cycles by the WAIT formula), I sometimes get A0 7F (correct) and sometimes A0 A0 (switching too late). (The delay scales quadratically with the value, not linearly.)
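
The delay figures quoted here can be checked directly against the WAIT formula given in the listing's comments. The following is a minimal sketch: the names `wait_cycles` and `frame_cycles` are made up for illustration, and the frame length assumes 65 cycles per line and 262 lines, which gives the 17,030 figure mentioned above.

```cpp
// Cycle count of the Monitor WAIT routine for a given accumulator value,
// using the formula quoted in the listing: (26 + 27*A + 5*A*A) / 2, A > 0.
constexpr int wait_cycles(int a) {
    return (26 + 27 * a + 5 * a * a) / 2;
}

// Assumed frame length: 65 cycles per scan line times 262 lines.
constexpr int frame_cycles = 65 * 262;

// How far a given WAIT delay overshoots (positive) or undershoots (negative)
// one full frame, in cycles.
constexpr int cycles_past_one_frame(int a) {
    return wait_cycles(a) - frame_cycles;
}
```

Under these assumptions `wait_cycles(80)` is 17,093, which is 63 cycles more than one frame, while `wait_cycles(40)` is 4,553, roughly a quarter of a frame. Halving the delay value therefore cuts the wait to about a quarter, which fits the intermittent results seen with the shorter delay.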

The real Apple IIe's behavior makes sense to me based on my limited understanding of how the 6502 works and how the Apple II uses it. The 6502 talks to the RAM for one half of every cycle and the video hardware talks to RAM the other half of every cycle. The load instructions with absolute addressing take four cycles. If we begin in text mode, mixed off, hires on, and assume that the video hardware is beginning to scan the first pixel of the visible screen, and we consider an instruction like ldx $C050 executing at memory location $1047, then the sequence of events as I understand it is:

CPU                                         Cycle   Video hardware
---                                         -----   --------------

CPU places the PC value ($1047) on the      0
address bus, computes PC=PC+1, and
fetches the value from $1047 - the ldx
opcode, AE - into the predecode
register.

                                            0.5     Video hardware places the first address
                                                    of the text screen ($400) on the address
                                                    bus, fetches the value, and displays the
                                                    pixels for that byte by looking them up
                                                    in the character generator ROM.

CPU places the PC value ($1048) on the      1
address bus, transfers the predecode
register to the instruction register,
computes PC=PC+1, and fetches the value
from $1048 - the low byte of the
operand, 50 - into the input data latch.

                                            1.5     Video hardware places the next address
                                                    of the text screen ($401) on the address
                                                    bus, fetches the value, and displays the
                                                    pixels for that byte by looking them up
                                                    in the character generator ROM.

CPU places the PC value ($1049) on the      2
address bus, transfers the input latch
to the B register, computes PC=PC+1,
adds 0 to B, fetches the value from
$1049 - the high byte of the operand, C0
- into the input data latch, and
captures adder output into the adder
hold register.

                                            2.5     Video hardware places the next address
                                                    of the text screen ($402) on the address
                                                    bus, fetches the value, and displays the
                                                    pixels for that byte by looking them up
                                                    in the character generator ROM.

CPU would ordinarily place the values       3
from the input data latch and adder hold
registers ($C050) on the address bus but
because that's a soft switch not mapped
to actual memory it somehow skips that
step, leaving the address bus set to its
previous value ($402). CPU fetches the
value from $402 - the last displayed
character on the text screen - into the
input data latch.

                                            3.5     The act of mentioning the soft switch
                                                    address $C050 has caused the video
                                                    hardware to be in graphics mode now.
                                                    Video hardware places the next address
                                                    of the hires screen ($2003) on the
                                                    address bus, fetches the value, and
                                                    displays the pixels for that byte
                                                    directly.

At the start of the next instruction,       4
CPU places the PC value ($104A) on the
address bus and transfers the input
latch to the X register.
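
The sequence in the table can be condensed into a toy model. This is only a sketch of the description above, not of real hardware or of any emulator: `ToyBus`, `cpu_read` and `load_from_c050` are hypothetical names, and the model follows the table's ordering (CPU access first, video fetch second within each cycle) while reducing the video scanner to "fetch $A0 in text mode, $7F in hires mode".

```cpp
#include <cstdint>

struct ToyBus {
    bool graphics = false;             // false: text mode, true: hires mode
    std::uint8_t last_video_byte = 0;  // byte left on the bus by the video fetch

    // Second half of a cycle: the video hardware fetches from the page
    // selected by the current mode ($A0 = text fill, $7F = hires fill).
    void video_half_cycle() {
        last_video_byte = graphics ? 0x7F : 0xA0;
    }

    // First half of a cycle: the CPU access. Touching $C050 returns whatever
    // the previous video fetch left behind, then selects graphics mode.
    std::uint8_t cpu_read(std::uint16_t address) {
        if (address == 0xC050) {
            const std::uint8_t value = last_video_byte;
            graphics = true;  // takes effect for the video fetches that follow
            return value;
        }
        return 0;  // opcode and operand fetches; their values do not matter here
    }
};

// Models the four cycles of a load from $C050 placed at $1047:
// three instruction-stream fetches, then the soft-switch access.
std::uint8_t load_from_c050(ToyBus &bus) {
    std::uint8_t result = 0;
    for (int cycle = 0; cycle < 4; ++cycle) {
        const std::uint16_t address =
            (cycle == 3) ? std::uint16_t{0xC050} : static_cast<std::uint16_t>(0x1047 + cycle);
        result = bus.cpu_read(address);
        bus.video_half_cycle();
    }
    return result;
}
```

Calling `load_from_c050` twice on a fresh `ToyBus` returns $A0 and then $7F, matching the A0 7F reported for the real IIe: the first load sees the last text fetch, and the mode change only affects fetches made after the soft-switch access.
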

ryandesign commented 10 months ago

> CPU would ordinarily place the values from the input data latch and adder hold registers ($C050) on the address bus but because that's a soft switch not mapped to actual memory it somehow skips that step, leaving the address bus set to its previous value ($402).

Probably the CPU does not in fact skip setting up the address bus, but instead somehow the address bus does not influence the data bus for soft-switch addresses.

I don't have a real Apple II plus available for testing. It is possible that behavior differs between a II plus and a IIe. Don Lancaster describes this in his book Enhancing Your Apple II and IIe Volume 2, Enhancement 13, The Vaporlock:

Page 206:

8AFF:              166 ;  The FIX2+ routine provides one extra
8AFF:              166 ;  delay cycle to adjust for screen switching
8AFF:              166 ;  differences between the IIe and II+.

Page 208:

8BAB:A9 06         265 FIX2+   LDA   #IDBYTE  ; ADD ONE EXTRA CYCLE ONLY ON
8BAD:CD B3 FB      266         CMP   VERSION  ;  THE II+ TO EQUALIZE ON-SCREEN
8BB0:D0 00    8BB2 267         BNE   SHOW     ;  DISPLAY MODE SWITCHING

8BB2:2C 20 C0      269 SHOW    BIT   SNIFF    ; OPTIONAL MODE CHANGES GO HERE
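
The zero-offset `BNE` in FIX2+ does nothing except vary in duration, which can be sketched as follows. This assumes standard 6502 branch timing (2 cycles when not taken, 3 when taken, 4 when taken across a page boundary) and that `VERSION` reads as $06 on a IIe and as some other value on a II+. The value $EA used in the usage note is the commonly documented II+ byte and is an assumption here, as are the function names.

```cpp
#include <cstdint>

// Cycles for a 6502 relative branch: 2 if not taken, 3 if taken,
// 4 if taken and the target is on a different page from the next instruction.
constexpr int branch_cycles(bool taken, std::uint16_t next_pc, std::uint16_t target) {
    return !taken ? 2 : (((next_pc ^ target) & 0xFF00) ? 4 : 3);
}

// Cycles spent in "BNE SHOW" at $8BB0 after "LDA #$06 / CMP VERSION".
// The branch is taken whenever VERSION differs from $06; its target ($8BB2)
// is also the next instruction, so no page crossing is involved.
constexpr int fix2_bne_cycles(std::uint8_t version_byte) {
    return branch_cycles(version_byte != 0x06, 0x8BB2, 0x8BB2);
}
```

Under these assumptions `fix2_bne_cycles(0x06)` is 2 and `fix2_bne_cycles(0xEA)` is 3, which is the single extra cycle on the II+ that the listing's comments describe.
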

TomHarte commented 6 months ago

The implementation error here is that:

The quick-hack fix is to modify line 745 of AppleII.cpp so that its section reads:

            if(isReadOperation(operation) && address != 0xc000) {
                update_video();
                *value = video_.get_last_read_value(cycles_since_video_update_);
            }

i.e. add an update_video. Then, as if by magic:

[Screenshot, 2024-02-15 at 21:45:16]

The real fix will be marginally more involved, either:

  1. adding a means of lookahead to DeferredQueuePerformer and maintaining a temporary copy of Switches within VideoBase::get_last_read_value; or
  2. eliminating the optimisation of treating vapour reads as lookahead and hence eliminating offset as an argument to get_last_read_value and proceeding as if it were 0.

I'll try to figure out whether the whole lookahead thing is actually saving any real costs in order to pick a route. Quite possibly it's not.
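
One plausible reading of why the extra `update_video` helps can be illustrated with a toy model. This is not Clock Signal's code: `ToyVideo`, `ToyMachine` and `run_ldx_ldy` are hypothetical, and the model simply assumes that soft-switch changes sit in a queue until the video is next updated, while the floating-bus read looks ahead using only the mode the video already knows about.

```cpp
#include <cstdint>
#include <vector>

struct ToyVideo {
    bool graphics = false;  // the mode the video has actually been told about

    // Byte the scanner would fetch `offset` cycles ahead, judged only by the
    // current mode ($A0 = text fill, $7F = hires fill). The offset is ignored
    // in this toy because every byte on a page has the same value.
    std::uint8_t get_last_read_value(int /* offset */) const {
        return graphics ? 0x7F : 0xA0;
    }
};

struct ToyMachine {
    ToyVideo video;
    int cycles_since_video_update = 0;
    std::vector<bool> pending_switches;  // queued mode changes, not yet applied

    void update_video() {
        for (bool graphics : pending_switches) video.graphics = graphics;
        pending_switches.clear();
        cycles_since_video_update = 0;
    }

    std::uint8_t read_c050(bool flush_first) {
        if (flush_first) update_video();  // the quick-hack fix
        const std::uint8_t value = video.get_last_read_value(cycles_since_video_update);
        pending_switches.push_back(true);  // $C050: graphics on, deferred
        cycles_since_video_update += 4;    // duration of the load instruction
        return value;
    }
};

struct TwoReads {
    std::uint8_t x;
    std::uint8_t y;
};

// Models "ldx $C050" followed by "ldy $C050", starting in text mode.
TwoReads run_ldx_ldy(bool flush_first) {
    ToyMachine machine;
    const std::uint8_t x = machine.read_c050(flush_first);
    const std::uint8_t y = machine.read_c050(flush_first);
    return {x, y};
}
```

In this model `run_ldx_ldy(false)` yields A0 A0, the behaviour reported in the issue, and `run_ldx_ldy(true)` yields A0 7F, matching the real IIe. Either of the two proposed real fixes would amount to making the lookahead aware of queued switch changes, or removing the lookahead altogether.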