AppleWin / AppleWin

Apple II emulator for Windows
GNU General Public License v2.0

Wasteland Booting Weirdness #733

Closed GreatHierophant closed 2 years ago

GreatHierophant commented 4 years ago

I found a .woz of Wasteland and the game won't boot at all on AppleWin but boots fine on micro8. It is almost as if the drive were empty.

Then I thought, could the lack of .woz write support be holding Wasteland back? So I grabbed the .nib images from Asimov and tried them. The .nib images would only load with the Enhanced Speed option. With Authentic speed, it would continually cycle to the Electronic Arts logo.

tomcw commented 4 years ago

Can you attach a zip of the .woz image? Thanks.

tomcw commented 4 years ago

re. .nib: authentic vs enhanced - I've confirmed this issue.

GreatHierophant commented 4 years ago

Here it is:

Wasteland Woz.zip

tomcw commented 4 years ago

> So I grabbed the .nib images from Asimov and tried them. The .nib images would only load with the Enhanced Speed option. With Authentic speed, it would continually cycle to the Electronic Arts logo.

Basically (for the .nib image) the game is still copy-protected. Enhanced Speed (by luck!) passes the protection check, but Authentic Speed doesn't.

I'll change it so that Authentic Speed matches Enhanced Speed for a certain emulation detail. In general, now that .woz exists, use .woz for closer authenticity.

Details:

At track 7 (and afterwards other tracks) this code is called at $666: :-)

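665:RTS
666:LDY #$FF        ; Y = read counter (255 attempts)
668:LDA $C054,X     ; $C0EC     ; X=0x98; loop's cycle delta=19
66B:CMP 458         ; same value as the previous read?
66E:STA 458         ; remember this read (flags unchanged)
671:BEQ 665         ; repeat value -> RTS (check passes)
673:DEY
674:BNE 668         ; no repeat yet -> read again
676:DEC 457         ; 1->0
679:BNE 665
67B:JMP 2DC         ; zero memory & reboot!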

In Authentic Speed mode, because the $C0EC accesses are less than 32 cycles apart, the Disk II's data latch is loaded first with a partial nibble, then with a full nibble, so it ping-pongs between the two. The CMP is therefore never equal, so the routine times out and eventually JMPs to $2DC.

In Enhanced Speed mode, partial nibbles are only supported for $C0EC accesses less than 7 cycles apart. So the loop only reads complete nibbles. After a few iterations there are 2 identical, consecutive nibbles, so the CMP is equal and it RTSs via $665 (ie. the protection check passes).
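The difference between the two modes can be sketched in a few lines of C++. This is only a minimal model of the loop at $668, not AppleWin code: `protectionPasses` and its parameters are hypothetical names, and the latch reads are replayed from a pre-recorded list rather than coming from an emulated Disk II.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the loop at $668: up to 255 reads of $C0EC,
// passing as soon as a read equals the previous one (kept at $458).
bool protectionPasses(const std::vector<uint8_t>& latchReads, uint8_t previous)
{
    assert(!latchReads.empty());
    std::size_t i = 0;
    for (int y = 0xFF; y != 0; --y)                 // LDY #$FF ... DEY / BNE $668
    {
        // Replay the recorded reads, wrapping around if the list is short.
        const uint8_t value = latchReads[i++ % latchReads.size()];
        const bool repeat = (value == previous);    // CMP $458
        previous = value;                           // STA $458
        if (repeat)
            return true;                            // BEQ $665 -> RTS
    }
    return false;   // timed out: the real code goes on to DEC $457 / JMP $2DC
}
```

With reads that alternate between a partial and a full nibble, eg. `protectionPasses({0x03, 0xFF}, 0x00)` (values chosen for illustration), the sketch times out; with `{0xFE, 0xFF, 0xFF}` it passes on the third read.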

For details of why there's a difference (Authentic vs Enhanced) see #582.

NB. this all relates just to .nib (and .dsk) images, and not to .woz image support.

I'll have to check the .woz, but I suspect there are sync bits on the .woz that are missing in the .nib image representation.

tomcw commented 4 years ago

> I found a .woz of Wasteland and the game won't boot at all on AppleWin but boots fine on micro8. It is almost as if the drive were empty.

Wasteland's boot code looks very similar to Legacy of the Ancients (as noted by @peterferrie).

For .woz, while executing this initial T$00-S$00 sector (at $801), LOA boots, and Wasteland doesn't.

For Wasteland, here is a trace of the code that fails:

Cycles   A: X: Y: SP:  Flags     Addr:Opcode    Mnemonic
001D411F B3 5F FF 015F N.RB.I.C  08A4:D0 DF     BNE $0885
001D4122 B3 5F FF 015F N.RB.I.C  0885:BD 8D C0  LDA $C08D,X
001D4126 55 5F FF 015F ..RB.I.C  0888:10 FB     BPL $0885
001D4129 55 5F FF 015F ..RB.I.C  0885:BD 8D C0  LDA $C08D,X
001D412D AA 5F FF 015F N.RB.I.C  0888:10 FB     BPL $0885
001D412F AA 5F FF 015F N.RB.I.C  088A:38        SEC 
001D4131 AA 5F FF 015F N.RB.I.C  088B:2A        ROL 
001D4133 55 5F FF 015F ..RB.I.C  088C:8D FE 07  STA $07FE
001D4137 55 5F FF 015F ..RB.I.C  088F:8C 44 08  STY $0844
001D413B 55 5F FF 015F ..RB.I.C  0892:BD 8D C0  LDA $C08D,X
001D413F 2A 5F FF 015F ..RB.I.C  0895:10 FB     BPL $0892
001D4142 2A 5F FF 015F ..RB.I.C  0892:BD 8D C0  LDA $C08D,X
001D4146 55 5F FF 015F ..RB.I.C  0895:10 FB     BPL $0892
001D4149 55 5F FF 015F ..RB.I.C  0892:BD 8D C0  LDA $C08D,X     ; [1]
001D414D AA 5F FF 015F N.RB.I.C  0895:10 FB     BPL $0892
001D414F AA 5F FF 015F N.RB.I.C  0897:2D FE 07  AND $07FE
001D4153 00 5F FF 015F ..RB.IZC  089A:99 00 02  STA $0200,Y
001D4158 00 5F FF 015F ..RB.IZC  089D:4D FF 07  EOR $07FF
001D415C B3 5F FF 015F N.RB.I.C  08A0:8D FF 07  STA $07FF
001D4160 B3 5F FF 015F N.RB.I.C  08A3:C8        INY 
001D4162 B3 5F 00 015F ..RB.IZC  08A4:D0 DF     BNE $0885
001D4164 B3 5F 00 015F ..RB.IZC  08A6:AC AC 08  LDY $08AC
001D4168 B3 5F BD 015F N.RB.I.C  08A9:8C B9 08  STY $08B9
001D416C B3 5F BD 015F N.RB.I.C  08AC:BD 8D C0  LDA $C08D,X     ; [2]
001D4170 EA 5F BD 015F N.RB.I.C  08AF:10 FB     BPL $08AC

The problem is the 35 cycles between the [1] and [2] $C0EC accesses, ie. 8.75 disk bitcells. Then including extraCycles from the previous access, this takes it to 9 bitcells, so the value in the data-latch (0xEA) gets skipped at [2].

NB. At [2], the LDA $C0EC should immediately load 0xEA, but instead it skips it and returns a partial nibble (0x03). (Confirmed with MAME 0.216.) Because 0xEA is skipped, the sector's checksum fails, and it retries by JMP'ing to $C600.

Also note that in this 4&4 encoded sector, there are no hidden sync bits. So this is purely a timing issue due to the big (35 cycle) gap between $C0EC accesses.
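The bitcell arithmetic above can be written out as a small C++ sketch. It assumes 4 CPU cycles per bitCell, as used in this thread; `advanceBitCells` and `BitCellAdvance` are hypothetical names for illustration and are not taken from the AppleWin source.

```cpp
#include <cassert>

// Hypothetical sketch: whole bitCells consumed between two $C0EC accesses,
// plus the leftover cycles carried to the next access (1 bitCell = 4 cycles).
struct BitCellAdvance
{
    int bitCells;     // whole bitCells the bitstream advances
    int extraCycles;  // 0-3 cycles carried over to the next access
};

BitCellAdvance advanceBitCells(int cyclesSinceLastAccess, int carriedExtraCycles)
{
    assert(cyclesSinceLastAccess >= 0);
    assert(carriedExtraCycles >= 0 && carriedExtraCycles < 4);
    const int total = cyclesSinceLastAccess + carriedExtraCycles;
    return { total / 4, total % 4 };
}
```

In this model the 35-cycle gap alone advances 8 bitCells with 3 cycles left over, and a single carried-over cycle is enough to tip it to 9 bitCells.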

tomcw commented 4 years ago

> I'll have to check the .woz, but I suspect there are sync bits on the .woz that are missing in the .nib image representation.

Dumping track-7, there is a huge run of 324 zero bits!

0505:   99   F6   F3   AD   BD   ED   F3   CD   ED   CA   FD   CD   EB   C9   FE(1)FF(2)FE(1)FF(2)FE(1)FF(2)FE(1)FF(2)FE(1)FF(2)FE(1)FF(2)D5   FE(324)

The code at $666 loops reading $C0EC (in the sync FE/FF's just before the 324 zero bits) until it reads a repeat value! (or until there's a timeout).

Here are 3 runs capturing the value read from the latch ($C0EC) until there's a repeat: trk07-C0EC-reads.xlsx

tomcw commented 4 years ago

NB. The NIB (master1.nib) just contains FF's in this big field of zero bits (probably FF's were used to pad the track to 0x1A00) - but it means it'll pass the protection.

tomcw commented 4 years ago

@GreatHierophant: a new AppleWin with this fixed is here.

tomcw commented 4 years ago

FYI, I added this (temp) debug code to the end of DataLatchReadWOZ():

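    // Temp debug: trace the protection loop's $C0EC reads.
    // $66B is the instruction that follows the LDA at $668.
    static BYTE lastPhase = 0xff;
    static UINT lastLatch = -1;
    if (regs.pc == 0x66b)
    {
        if (lastPhase != drive.m_phase)
        {
            // Different track: just remember the new phase
            lastPhase = drive.m_phase;
        }
        else if (lastLatch == m_floppyLatch)
        {
            // Same latch value twice in a row: the protection check passes
            LogOutput("0x66B: trk-%d, latch match: %02X, #reads=%d\n", drive.m_phase/2, m_floppyLatch, 0x100-regs.y);
        }
        else if (regs.y == 0x01)
        {
            // Last iteration of the 255-read loop, still no match
            LogOutput("0x66B: trk-%d, failed\n", drive.m_phase/2);
        }
        lastLatch = m_floppyLatch;
    }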

This traces the boot, outputting the value of the repeat latch ("latch match") and the number of $C0EC reads until it got a match.

I varied the chance of emitting a 1 fake bit (default is 30% chance), looking at 50%, 40%, ..., 10%. The results are summarised in this table:

| Chance of a fake 1-bit | Mean number of C0EC reads | Number of protection failures |
|---|---|---|
| 50% | 70.04 | 2 |
| 40% | 60.76 | 2 + reset |
| 30% | 35.96 | 0 |
| 20% | 31.58 | 0 |
| 10% | 30.62 | 0 |

Full data is here: wasteland-protection.txt

So the boot becomes more reliable if you reduce the chance to throw a fake 1 bit.

tomcw commented 4 years ago

OK, I understand the protection now... It's a function of (a) reading a nibble (ie. bit7/QA is set), then (b) getting a run of 5 consecutive zeros.

A run of 5 zero-bits has a (0.7)^5 = 16.8% (or 1 in 6) chance of occurring. (For the case: 30% chance of a fake 1 bit)

Details:

Corollary: as the chance of getting a fake 1 decreases (eg. "reduce the chance to throw a fake bit to 20%"), then the chance of getting a 0 increases. And so the chance of the protection check passing also increases, as the chance of a '5 zero-bit run' increases:
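As a rough illustration of the corollary, the chance of a run of zero bits can be computed directly. This sketch assumes each bit in the zero field is produced independently with the given fake-1 chance, and it ignores condition (a) (a nibble must be read first); `zeroRunChance` is a hypothetical helper, not an AppleWin function.

```cpp
#include <cassert>
#include <cmath>

// Chance that 'runLength' consecutive bits are all zero, assuming each bit
// independently has 'fakeOneChance' probability of being a fake 1.
double zeroRunChance(double fakeOneChance, int runLength)
{
    assert(fakeOneChance >= 0.0 && fakeOneChance <= 1.0 && runLength >= 0);
    return std::pow(1.0 - fakeOneChance, runLength);
}
```

For a 5-bit run this gives about 3.1% at a 50% fake-1 chance, 7.8% at 40%, 16.8% at 30%, 32.8% at 20% and 59.0% at 10%, which moves in the same direction as the measured results: fewer $C0EC reads and fewer failures as the fake-1 chance drops.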

tomcw commented 4 years ago

For 2 cases in the wasteland-protection.txt attached above, I've dumped all the outputBits, plus whether each bit came from rand() or not.

In the first case, there are only 4 zero bits (3 randomly zero) and a final fake 1 bit. But because the latchDelay is kept at 7us, this final 1 bit continues to keep the latch value held (as the latchDelay just drops from 7us to 3us).

In the second case, it randomly produces the nibble 0xF0, then has 6 (random) zero bits; which again keeps the latch value held.

tomcw commented 4 years ago

> For .woz, during executing of this initial T$00-S$00 sector (at $801), LOA boots, and Wasteland doesn't.

Looking in more detail at the Wasteland boot code at $8AC (read $C0EC) where it should read 0xEA. The failing case is 1 bitCell ahead of the working case:

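| OK: outputBit | dataLatch | latchDelay | bit# | read/MSB=1 | NG: outputBit | dataLatch | latchDelay | bit# | read/MSB=1 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AA | 7 | 0 | * | 1 | AA | 3 | 0 | * |
| 1 | AA | 3 | 8 |   | 1 | 03 | 0 | 8 |   |
| 1 | 03 | 0 | 7 |   | 1 | 07 | 0 | 7 |   |
| 1 | 07 | 0 | 6 |   | 0 | 0E | 0 | 6 |   |
| 0 | 0E | 0 | 5 |   | 1 | 1D | 0 | 5 |   |
| 1 | 1D | 0 | 4 |   | 0 | 3A | 0 | 4 |   |
| 0 | 3A | 0 | 3 |   | 1 | 75 | 0 | 3 |   |
| 1 | 75 | 0 | 2 |   | 0 | EA | 7 | 2 |   |
| 0 | EA | 7 | 1 |   | 1 | EA | 3 | 1 |   |
| 1 | EA | 3 | 0 | * | 1 | 03 | 0 | 0 |   |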

The working case, at the transition point from DiskII C600 firmware to $82D (1st $C0EC read):

The failing case (at the same transition point):

failing(diskLastCycle) MINUS working(diskLastCycle) = 0x1c32e1 - 0x1c32dd = 4 cycles = 1 bitCell.

So via the DiskII C600 firmware, the failing case arrives at $801 one bitCell ahead.

It never recovers from this (even though the LSS sequences through a bitCell field of ~317 zeros right after T$00-S$00!), and the big 35 cycle gap at the end of the 4&4 sector means that the checksum nibble is missed.

tomcw commented 4 years ago

> It never recovers from this (even though the LSS sequences through a bitCell field of ~317 zeros right after T$00-S$00!), and the big 35 cycle gap at the end of the 4&4 sector means that the checksum nibble is missed.

Removing the above woz fix (ba7a400) and alt woz fix (183ec2b), then...

If I insert an extra zero bit (into the field of ~317 zeros) then this is enough to have the CPU correctly sync'ed with the WOZ bitstream and correctly read 0xEA!

So either there's a very precise number of zero bits written or there's something not quite right with LSS emulation. I'd lean towards the latter.

Also, given the way the track was originally written, the end point (ie. where it overlaps with the start) looks to be at the end of the zero field (there are a few garbage nibbles after the zero field, before the big FE(1),FF(2) sync field). Therefore this garbage will not have an exact number of bits, so again there's probably something up with the LSS emulation.

tomcw commented 4 years ago

The LSS is clocked at 2MHz by Q3 (slot pin 37). ref: UTAIIe page 9-14. But my LSS emu is "quantized" into 4us bitCells or 0.25MHz (actually it does support non-4us timings, as defined by the woz spec). The extra (1-3) cycles are carried over to the next time $C0EC (actually any even $C0Ex address) is read.

So I think my alt woz fix (183ec2b) is correct, in that it can extend the latch delay by an extra 2 CPU cycles. NB. 7+2 extra=9 cycles. Since my LSS emu operates in units of 4us (4 cycles), 3 extra cycles give the same result as 2, ie. both 9 and 10 cycles will hold the data latch for 3 bitCells, since the delay is decremented by 4 cycles for each subsequent bitCell.

Put another way, if I ran my LSS emu at 2us intervals (0.5 bitCell unit) then the extra cycles could only be 0 or 1. This would give higher precision to the latch delay, since it'd be decremented by 2 cycles for each subsequent half bitCell.
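The effect of the step size on the latch delay can be illustrated with a small C++ sketch. It assumes the latch stays held for every LSS step at which the remaining delay is still above zero, with the delay reduced by the step size each time; that counting rule and the name `latchHoldSteps` are assumptions made for illustration, not the actual AppleWin implementation.

```cpp
#include <cassert>

// Hypothetical sketch: number of LSS steps for which the data latch is held,
// given a latch delay in CPU cycles and the emulation step size in cycles
// (4 = one whole bitCell per step, 2 = half a bitCell per step).
int latchHoldSteps(int latchDelayCycles, int cyclesPerStep)
{
    assert(latchDelayCycles >= 0 && cyclesPerStep > 0);
    int steps = 0;
    for (int delay = latchDelayCycles; delay > 0; delay -= cyclesPerStep)
        ++steps;   // latch is still held at this step
    return steps;
}
```

At 4 cycles per step, the base delay of 7 cycles holds the latch for 2 bitCells, while 9 and 10 cycles both hold it for 3. At 2 cycles per step the unit is half a bitCell, so 7 cycles gives 4 steps and 9 or 10 cycles give 5 steps.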

tomcw commented 4 years ago

NB. 1.29.8.0 here includes the alt woz fix (183ec2b).

tomcw commented 4 years ago

On Slack / #applesauce I asked about the 35us gap and if there could be variable bit timing here.

Info about other emulators:


The interesting things about this T$00-S$00 part of the loader, is that it's just loading 4&4 encoded data, and there are no extra zero bits for any special timing. It's just simple 4&4 encoding.

The odd thing is this 35 cycle gap between $C0EC accesses (which occurs at the transition from reading the 2x256 4&4 data nibbles to reading the 1st 4&4 checksum nibble), eg:

Since each 'LDA $C0EC' is immediately followed by a 'BPL -5' (ie. BPL to the prior LDA $C0EC access) then everything looks normal... ie. no "raw read" from $C0EC - which would immediately raise a flag for specially timed code.

But if the bits were written with (say) 4.5us timing, then 8x4.5us = 36us.

And if you tried to copy this with a nibble copier, then it'd read the nibbles correctly, but obviously wouldn't write them with the correct timing... so you'd just get this infinite loop where it fails and JMP $C600. Which is what we're seeing with some of the emulators.

NB. The sector this code is reading has the (custom) address prologue: D5 AA FD F4 F4. So it's possibly worth taking a look at the timings of the bits here?

Here's a dump of track 0:

Wasteland - Boot #1-trk-00.txt


stivo attached a Wasteland woz with bit timings = 36us, which boots on AppleWin (with & without the Wasteland alt fix).

I hardcoded AppleWin to use 36us timing, and then I tried all the AppleSauce test images: only Sammy Lightfoot failed to boot (and Miner 2049er II got corrupted due to a write to the high-score table!).


J.Morris then looked at the Wasteland .a2r flux image and confirmed that the nibbles are 32us - sadly none are 36us.

These are the nibbles that I see at the beginning of that sector on track $0 with the D5 AA FD F4 F4 address field. They all look pretty solid at 32us to me.

Screen Shot 2020-06-13 at 7 55 15 AM

around the end of that sector: AA EA FA C9

Screen Shot 2020-06-13 at 9 02 08 AM

tomcw commented 3 years ago

The Wasteland fix for the T$00-S$00 part of the loader is interfering with a few other titles (#762, #921), so it looks like this fix is wrong.

So I'm coming to the conclusion that the 35 cycles between the two specific $C0EC accesses just needs the right "LSS alignment" with the bit-stream. Introducing a bit of jitter on T$00 (eg. after a large bit-stream gap between $C0EC accesses) is enough to get both Wasteland and LOA (v1/v2) working(*), and not cause an issue for any other woz.

(*) Maybe after 5-10 retries - on a failure the code just reboots via JMP $C600.

tomcw commented 2 years ago

Closing as this woz-related issue was fixed by #930.