ekeeke / Genesis-Plus-GX

An enhanced port of Genesis Plus - accurate & portable Sega 8/16 bit emulator

Inaccurate Behavior in MCD Sub CPU Bus Request #520

Closed OrionNavattan closed 1 year ago

OrionNavattan commented 1 year ago

Developing my experimental error handler revealed an inaccuracy in the sub CPU bus request mechanism, and specifically in how it is handled if the sub CPU has been stopped using the stop instruction.

I can't find any information about it in the 68000 user and programmer manuals, but from what I've observed, it appears that the 68k does not respond to bus requests if the stop instruction has been used to halt it. On real hardware (tested with Genesis Model 1 VA2 + Sega CD Sony Model 2), if a stop instruction is executed on the sub CPU, immediately followed by the main CPU requesting the bus, the request will never be granted, leading to an infinite loop if the main CPU is waiting for acknowledgement. GPGX, however, in both the libretro and OpenEmu versions, allows the request to complete as if the sub CPU were still running normally.

OrionNavattan commented 1 year ago

A small test ROM I put together that demonstrates this issue: https://drive.google.com/file/d/1cTOo-_PwBF8Mbei_ZzvOitNoGLKzskPY/view

On real hardware, this ROM hangs on a black screen. In GPGX, it (incorrectly) shows an error handler screen due to the issue I described.

ekeeke commented 1 year ago

Yes, sub-cpu bus request acknowledge is not really emulated, as no games rely on it and the timing would be too hard to emulate precisely anyway, so $A12001 bit 1 always returns 1 as soon as it is set by main-cpu.

It is weird though that a 68000 cpu in stopped state would not respond to bus requests; this is indeed not documented anywhere (68000 user manual only says that when cpu is halted - through the /HALT pin - bus arbitration signals behave as usual) and I doubt any software emulator implements this behavior. This would also mean that z80 should not be able to access main-cpu bus while it is stopped, which is kinda limiting (this can easily be verified by another test rom I guess).

Or maybe this is only something with that bit on mega-cd gate array side under some specific case, I will have to check what that test ROM is actually doing I guess. Does it behave correctly in other emulators that have Mode 1 support (Kega Fusion, Blastem and Ares I think) ? It would also be interesting to know how it behaves on Mister as their MD core uses a die-accurate 68000 cpu core afaik.

OrionNavattan commented 1 year ago

This would also mean that z80 should not be able to access main-cpu bus while it is stopped, which is kinda limiting (this can easily be verified by another test rom I guess).

The Z80 uses cycle stealing to read from ROM, which I don't think involves requesting the bus at all. A quick test involving replacing the entire WaitForVBlank routine in a Sonic 2 disassembly with a stop #$2300 had no audible effect on my Model 1 VA2; DAC samples continued to be read from ROM correctly. GPGX, Blastem, and Exodus all behaved as hardware.

I will have to check what that test ROM is actually doing I guess.

I've attached the source of the test ROM below for your convenience. It's basically identical to this, except for adding an illegal instruction to crash the sub CPU, and trapping the sub CPU with a stop instruction instead of an infinite loop. The relevant parts are in the two exception handler files as follows:

; Sub CPU's error handler
; dumps the registers and signals the main CPU that we've crashed
ErrorHandler:
        move    #$2700,sr               ; disable all interrupts
        st.b (mcd_sub_flag).w       ; set flag to let main CPU know we've crashed (assumes communication protocol includes checking this flag for $FF before sending commands or while waiting for responses)
        movem.l d0-a6,-(sp)             ; dump all registers
        move.l  usp,a0
        move.l  a0,-(sp)            ; dump USP (unnecessary if BIOS is being used, as user mode can not be used with it)    
    .waitmain:
        cmpi.b  #$FF,(mcd_main_flag).w  ; has the main CPU noticed?
        bne.s   .waitmain   ; if not, branch
        ; Main CPU has noticed
        move.l  sp,(mcd_subcom_0).w ; get address of bottom of stack (including dumped registers) for main CPU
        clr.b   (mcd_sub_flag).w ; clear flag to let main CPU know we are done
        ;bra.s  *   ; stay here forever (the correct way to do this)
        stop #$2700 ; this does not work, as it prevents the sub CPU from responding to the main CPU's bus request!

; Main CPU's handler for sub CPU errors
; (entered via trap #0 if $FF is found in mcd_main_flag)
SubCPUError:
        disable_ints                ; disable interrupts
        move.b  (mcd_sub_flag).l,(mcd_main_flag).l  ; let sub CPU know we've noticed

    .waitsub:
        tst.b   (mcd_sub_flag).l    ; is the sub CPU done?
        beq.s   .waitsub            ; if not, branch    

    .waitsubbus:    
        ; If sub CPU has been stopped using the stop instruction, this becomes an infinite
        ; loop, as the sub CPU will never acknowledge the request.
        bset    #sub_bus_request_bit,(mcd_reset).l          ; request the sub CPU bus
        beq.s   .waitsubbus                 ; if it has not been granted, wait      

Checking a couple of other platforms: on Blastem Nightly, the ROM causes the entire program to hang, while the Mega Everdrive's FPGA exhibits the exact same incorrect behavior as GPGX (the bus request is granted).

Mega-CD-Error-Handler-Test (GPGX Bug).zip

ekeeke commented 1 year ago

The Z80 uses cycle stealing to read from ROM, which I don't think involves requesting the bus at all

No, 68k bus access is requested on every Z80 cpu access (by bus arbiter) and 68k execution is suspended because of that. That is the reason why z80 access to ROM is halted during VDP DMA from 68k bus (because VDP also requests 68k bus access during DMA)

A quick test involving replacing the entire WaitForVBlank routine in a Sonic 2 disassembly with a stop #$2300 had no audible effect on my Model 1 VA2; DAC samples continued to be read from ROM correctly.

Then this means 68k still grants bus access when it is stopped and what you have noticed is caused by something else specific to mega-cd gate array.

on Blastem Nightly, the ROM causes the entire program to hang,

This confirms there is something else going on because, looking at its source code, Blastem does not prevent sub-cpu bus request when stopped either. It actually behaves the same as GPGX (returns 1 in bit 1 as soon as it is set by main-cpu), with the difference that it also returns 1 if bit 0 is cleared (SUB-CPU halted in RESET state).

Thanks for the source, I will try to figure what could be the real cause of behavior differences between GPGX and real hardware.

ekeeke commented 1 year ago

.waitsub:
        tst.b   (mcd_sub_flag).l    ; is the sub CPU done?
        beq.s   .waitsub

Unless I am missing something, I think this wait loop is incorrect (should be 'bne' instead of 'beq').

Indeed, mcd_sub_flag is set to 0xff (by sub-cpu) when entering main-cpu handler, so you want to loop until it is cleared by sub-cpu (after stack address has been written to comm register), but here you are waiting for it to be different from 0 ('beq' branches if tested value is zero). On real hardware, with the two cpus operating in parallel, what probably happens is that sub-cpu is fast enough to clear the flag before it is tested by main-cpu, so you get stuck in an infinite loop (but not the loop you initially thought). In software emulators, parallel processing is not possible, so main-cpu and sub-cpu need to be run for a few cycles one after another, with the amount of cycles having a significant impact on emulator processing speed (the more you switch between CPUs during the emulated timeframe, the less optimized your emulator is). In GPGX, granularity of synchronization between main-cpu and sub-cpu is not as fine as in Blastem, so most likely the 'tst' instruction from main-cpu is processed before the sub-cpu timeslot, so the main-cpu handler immediately exits the 'waiting' loop and continues its execution.
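
To make the timeslice effect concrete, here is a toy model in C. It is only a sketch under stated assumptions, not code from GPGX or Blastem: the function name `flag_seen_by_main`, the cycle numbers and the "main slice first, then sub slice" scheduling order are all hypothetical. The sub-cpu clears the flag at one cycle count, the main-cpu tests it at another, and the scheduler alternates between the two in slices of `slice` cycles.

```c
#include <stdint.h>

/* Toy scheduler: run the main CPU for `slice` cycles, then the sub CPU for
 * `slice` cycles, and repeat. Returns the flag value the main CPU reads when
 * it executes its 'tst' at main_test_cycle. All names are hypothetical. */
uint8_t flag_seen_by_main(unsigned main_test_cycle,
                          unsigned sub_clear_cycle,
                          unsigned slice)
{
    uint8_t flag = 0xFF;        /* set by the sub CPU before the handler runs */
    unsigned main_clock = 0;    /* end of the main CPU slice just executed */
    unsigned sub_clock = 0;     /* start of the next sub CPU slice */

    if (slice == 0)
        slice = 1;              /* avoid a scheduler that never advances */

    for (;;) {
        /* Main CPU slice: if the 'tst' falls inside it, it sees the flag
         * as left by the sub CPU slices executed so far. */
        main_clock += slice;
        if (main_test_cycle < main_clock)
            return flag;

        /* Sub CPU slice: the clear happens if it falls inside this slice. */
        if (sub_clear_cycle >= sub_clock && sub_clear_cycle < sub_clock + slice)
            flag = 0x00;
        sub_clock += slice;
    }
}
```

With a sub-cpu clear at cycle 10 and a main-cpu test at cycle 20, a fine slice (1 or 16 cycles) lets the main-cpu read 0x00, which keeps the buggy 'beq' loop spinning, while a coarse slice (100 cycles) makes it read 0xFF and fall straight through.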

OrionNavattan commented 1 year ago

Gah, you're right, that branch should be a bne. Sheer luck that it did not cause any problems. Thank you for pointing that out; it's been fixed.

However, that bug does not appear to be related to the issue I described in my initial post. That still occurs even with this bug fixed. (To be clear, the hang on real hardware only occurs if the sub CPU is halted with stop rather than an infinite loop; this still happens even with the branch condition bug fixed.)

ekeeke commented 1 year ago

Damn, that's really weird. Could you please upload the corrected test ROM (with branch condition fixed and stop instruction kept) so I can check it when I have some time ? Also, could you verify if that new test ROM behaves as real hardware (main-cpu hang) in Blastem ? As indicated above, looking at the source code, this emulator does not care about sub-cpu stopped status and will always return 1 in that bit once set by main-cpu (like GPGX), so if it still hangs as real hardware, this means the cause is something else.

OrionNavattan commented 1 year ago

Here you go. I checked, and there is no change of behavior on any platform: GPGX and Mega EverDrive Pro FPGA grant the bus request, Blastem locks up, and real hardware hangs on black screen.

MCD Error Handler Test (Stop Ins).zip

birdybro commented 1 year ago

I can't seem to test it on the MiSTer FPGA MegaCD core, the "Press Start" upon BIOS doesn't load at all. Here's the text from the Cue file I created since you said this was a Sega CD test.

FILE "MCD Error Handler Test (Stop Ins).bin" BINARY
  TRACK 01 MODE1/2352
    INDEX 01 00:00:00

ekeeke commented 1 year ago

@birdybro : the .bin file is a Genesis ROM file that uses Sega CD Mode 1, not a CD image binary. It does not require a CD image file, it only makes use of Sega CD hardware (by decompressing Sub-CPU program into Sega CD RAM) but, in Mister, you probably need to create a CUE file in same directory with some random audio tracks to enable Sega CD Mode 1. Note that Genesis Plus GX automatically enables Sega CD mode 1 when it detects a specific field in ROM header ('C' in supported peripherals field), without the need of a CUE file.
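
As an illustration of the header check mentioned above, here is a minimal C sketch. It is not the GPGX implementation: the function name `rom_lists_cd_peripheral` is hypothetical, and the sketch assumes the standard Mega Drive header layout, where the 16-byte supported peripherals field starts at offset $190.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: standard Mega Drive ROM header, peripherals field at $190. */
#define HEADER_PERIPHERALS_OFFSET 0x190u
#define HEADER_PERIPHERALS_LENGTH 16u

/* Returns true if the supported peripherals field contains 'C'.
 * Hypothetical helper, for illustration only. */
bool rom_lists_cd_peripheral(const uint8_t *rom, size_t rom_size)
{
    /* Refuse images too small to contain the whole field. */
    if (rom == NULL ||
        rom_size < HEADER_PERIPHERALS_OFFSET + HEADER_PERIPHERALS_LENGTH)
        return false;

    for (size_t i = 0; i < HEADER_PERIPHERALS_LENGTH; i++) {
        if (rom[HEADER_PERIPHERALS_OFFSET + i] == 'C')
            return true;
    }
    return false;
}
```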

ekeeke commented 1 year ago

Blastem locks up

I just tested the ROM file you uploaded in Blastem debugger and it seems the emulator indeed completely freezes (does not respond to inputs anymore) as soon as the stop instruction is executed on sub-cpu side (if I switch debugger to main-cpu side and do step by step, it freezes while main-cpu waits for sub-cpu to clear the communication flag). Maybe it is related to the fact it is executed during illegal exception processing, I don't know ? It might be interesting to report this to @MaskOfDestiny so he could have a look at it. On real hardware, maybe it is not just the stopped state but the combination of stopped and exception processing state which puts the cpu in some undocumented state where bus requests are ignored, who knows ?

birdybro commented 1 year ago

@birdybro : the .bin file is a Genesis ROM file that uses Sega CD Mode 1, not a CD image binary. It does not require a CD image file, it only makes use of Sega CD hardware (by decompressing Sub-CPU program into Sega CD RAM) but, in Mister, you probably need to create a CUE file in same directory with some random audio tracks to enable Sega CD Mode 1. Note that Genesis Plus GX automatically enables Sega CD mode 1 when it detects a specific field in ROM header ('C' in supported peripherals field), without the need of a CUE file.

Ah I see, thank you. Yeah we just need the cartridge rom named cart.rom in the same folder as the cue/bin (or chd) and it should load the cart. I'm just gonna throw an MSU-MD cue/bin in there and do it like this:

image

Testing now and this is the result:

Screenshot 2023-08-13 14-29-46

ekeeke commented 1 year ago

Well, looking at Mister implementation (ASIC.vhd), setting SBRQ bit in register A12000 does not really request sub-cpu bus (real hardware would set S68K /BR pin low when SBRQ is set to 1, resp. high when it is set to 0) but instead halts the sub-cpu (through S68K /HALT pin), and reading back the register simply returns the set value of SBRQ bit (like in GPGX) instead of returning sub-cpu bus access status (real hardware would return 1 only when sub-cpu bus becomes available, i.e. S68K /BG pin is low and both /BGACK and /AS pins are high).

So, unfortunately, it cannot be used to check behavior of FX68K core when 68K bus is requested while cpu is stopped...

Thanks anyway

EDIT: looking at Mega-CD schematics in available Service Manuals, it appears I am wrong with my assumptions (on what real hardware would do), as sub-cpu bus arbitration signals are not connected to Mega-CD gate array, so there is no 'bus request/acknowledge' going on when setting the SBRQ bit and sub-cpu is simply halted using the /HALT signal, so Mister implementation is correct in that regard. It just seems that this bit does not read as being set when sub-cpu is already halted by the STOP instruction (maybe the /HALT input is being ignored by 68000 cpu in this case).

OrionNavattan commented 1 year ago

I just tested the ROM file you uploaded in Blastem debugger and it seems the emulator indeed completely freezes (does not respond to inputs anymore) as soon as the stop instruction is executed on sub-cpu side (if I switch debugger to main-cpu side and do step by step, it freezes while main-cpu waits for sub-cpu to clear the communication flag). Maybe it is related to the fact it is executed during illegal exception processing, I don't know ? It might be interesting to report this to @MaskofDestiny so he could have a look at it. On real hardware, maybe it is not just the stopped state but the combination of stopped and exception processing state which puts the cpu in some undocumented state where bus requests are ignored, who knows ?

It's definitely a bug in Blastem, and not one related to exception processing; I checked, and the freeze also occurs if a stop instruction is encountered during normal execution. I'll report it in the RetroDev Discord.

And funnily enough, I read this just as I finished an update to the test ROM that all but confirms what I suspected was happening on real hardware. I added a feature that counts down while waiting for the bus request to complete, triggering a custom error message via trap 1 if it times out, and that message is being triggered. MCD Sub Stop Ins Test.zip

ekeeke commented 1 year ago

Well, I guess that settles it then, thanks for your tests.

It's surprising that this is not documented anywhere though (I tried to look in Jorge Cwik FX68K core and in recent Nuked verilog core, both implemented from 68000 die analysis, but could not really find what the STOP instruction exactly does and what is its impact on signals used by bus arbitration).

Not sure either what exact conditions or logic are used by Mega-CD gate array to return 1 in SBRQ bit.

Since you have real Mega-CD hardware, could you please do the same test as in this last test ROM but with bit 0 (SRES) and bit 1 (SBRQ) set to different values successively, namely 00b (sub-cpu in reset state, no bus request), 01b (sub-cpu in reset state, bus requested) and 11b (sub-cpu running after reset, bus requested), then check SBRQ bit returned value. It would also be interesting to test these values with sub-cpu not being stopped (at least the two first values, to see if bus appears as available when sub-cpu is in reset state). Note that you will need to modify the BSET/BEQ sequence to do this (and use MOVE.B/BTST/BEQ instead), which is also the way the bits should be set in this register according to Mega-CD hardware manual (it explicitly states BSET should not be used but I am not sure why, maybe because setting the other bit with its read value is not safe).

OrionNavattan commented 1 year ago

it explicitly states BSET should not be used but I am not sure why

I think it's safe to say that's yet another instance of incorrect information in the development manuals, given that those BSET/BEQ and BCLR/BNE sequences are practically taken verbatim from the North American Model 2 BIOS. In fact, a quick check in a hex editor confirms that at least the latter is used in every known BIOS bootrom, so it begs the question where the hell that incorrect statement came from.

Anyhow, made six test roms this time around, testing each of those conditions with the bus request and reset bits, one with sub CPU stopped, and one with it trapped in an infinite loop. They're not very sophisticated, simply showing a green screen if the bus request succeeds, and a timeout error if the request fails. In all three cases, the bus request fails if the sub CPU is halted with a stop instruction. If the sub CPU is running, it appears that the status of reset does not matter: the bus request will succeed even if the sub CPU is in the reset state (the NA Model 2 BIOS in fact relies on this behavior, and I suspect all the other BIOS do so too).

More curious, however, is the fact that simply resetting the sub CPU without requesting the bus appears to nevertheless trigger a bus request. Clearing the reset bit, both via move and bclr, results in the bus request bit returning 1 soon thereafter. It seems there is still a lot left to learn about the Mega CD hardware.

MCD Sub Stop Ins Test.zip

ekeeke commented 1 year ago

Thanks for these reports, this is much appreciated.

More curious, however, is the fact that simply resetting the sub CPU without requesting the bus appears to nevertheless trigger a bus request. Clearing the reset bit, both via move and bclr, results in the bus request bit returning 1 soon thereafter.

Well, this is how Blastem emulates it (SBRQ returned value is an OR between SBRQ set value and invert of SRES set value) and it kinda makes sense, as this bit, when read, is supposed to reflect the state of the bus, which would be available to external device by default in case sub-cpu is in reset state.

This would also explain why it is not recommended to use BCLR/BSET but only when modifying SRES bit to reset sub-cpu (BCLR would clear SRES bit but also set SBRQ bit so next BSET call would set both SRES and SBRQ bits, leaving sub-cpu in bus requested state)
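
The read-modify-write hazard described above can be sketched in C. This is a toy model built only from the behavior described in this thread (bus request bit reads as set value OR inverted reset set value); it is not taken from Blastem or GPGX sources, and all names (`gate_array_reg`, `reg_bset`, `reset_with_bit_ops`, ...) are hypothetical.

```c
#include <stdint.h>

#define SRES 0x01u  /* bit 0: sub-cpu reset (0 = held in reset) */
#define SBRQ 0x02u  /* bit 1: sub-cpu bus request */

typedef struct {
    uint8_t written;  /* last value written by the main CPU (bits 0-1) */
} gate_array_reg;

/* Read: SRES reads back as written; SBRQ reads as 1 if it was written as 1
 * OR if the sub CPU is held in reset (assumed model, see lead-in). */
static uint8_t reg_read(const gate_array_reg *r)
{
    uint8_t value = r->written & (SRES | SBRQ);
    if (!(r->written & SRES))
        value |= SBRQ;
    return value;
}

static void reg_write(gate_array_reg *r, uint8_t value)
{
    r->written = value & (SRES | SBRQ);
}

/* BSET/BCLR on a memory operand are read-modify-write accesses. */
static void reg_bset(gate_array_reg *r, uint8_t mask)
{
    reg_write(r, (uint8_t)(reg_read(r) | mask));
}

static void reg_bclr(gate_array_reg *r, uint8_t mask)
{
    reg_write(r, (uint8_t)(reg_read(r) & (uint8_t)~mask));
}

/* Reset pulse using BCLR then BSET: the BSET reads SBRQ as 1 and writes it back. */
uint8_t reset_with_bit_ops(void)
{
    gate_array_reg r = { SRES };  /* sub CPU running, no bus request */
    reg_bclr(&r, SRES);
    reg_bset(&r, SRES);
    return r.written;
}

/* Reset pulse using plain writes (MOVE.B): SBRQ stays cleared. */
uint8_t reset_with_moves(void)
{
    gate_array_reg r = { SRES };
    reg_write(&r, 0);
    reg_write(&r, SRES);
    return r.written;
}
```

In this model the BCLR/BSET pair ends with both bits written as 1 (sub-cpu left in bus requested state), while the two plain writes end with only SRES set.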

ekeeke commented 1 year ago

@OrionNavattan : for completeness, could you please verify if main cpu can access PRG-RAM in case of bus request failure (when sub-cpu is in stopped state), in each of the 3 cases (reset only, reset+busreq, busreq only), and possibly also check the value of the RESET bit to see if stopped state has an effect on it as well. For PRG-RAM access test, a simple pattern write/readback after the timeout should be enough.

Thanks in advance.

OrionNavattan commented 1 year ago

Apologies for the delay, the notification ended up in my spam folder.

Anyhow, another set of six test ROMs.

        move.w  #$100-1,d0  ; maximum time to wait for response
    .waitsubbus:
    ; Three options.
    ;   move.b  #sub_reset,(mcd_reset).l            ; reset sub CPU
    ;   move.b  #sub_reset|sub_bus_request,(mcd_reset).l    ; reset sub CPU and request bus     
        move.b  #sub_run|sub_bus_request,(mcd_reset).l      ; request bus
        btst    #sub_bus_request_bit,(mcd_reset).l      ; has bus request been granted?
        bne.s   .granted                    ; branch if so
        dbeq    d0,.waitsubbus                  ; if not, wait
        ; Carry on even if we've timed out  

    .granted:
        clr.w   -2(sp)  ; clear a word of the stack for the reset register readout
        moveq   #0,d0
        move.b  (mcd_reset).l,d0        ; get reset register
        move.w  d0,-(sp)            ; write to stack as word
        move.l  #'SEGA',(program_ram+$7000).l   ; write 'SEGA' to program RAM
        move.l  (program_ram+$7000).l,-(sp) ; read the string from program RAM back into the stack
        illegal                 ; crash main CPU so we can view the stack

Not surprisingly, attempting to access the program RAM while sub CPU is in stopped state results in bus contention, with program RAM reads returning garbage. If we've only requested the bus, the reset bit returns 1 as expected. However, if reset has been requested, the bit returns 0, even after counting down the full $FF in the loop counter (which I'm fairly certain is longer than the time required to assert HALT and RESET for 10 clock cycles to perform a soft reset). It seems like requesting sub CPU reset also fails if the sub CPU has been halted with the stop instruction.

MCD BRQ Test.zip

ekeeke commented 1 year ago

Apologies for the delay, the notification ended up in my spam folder. Anyhow, another set of six test ROMs.

No problem, thank you for taking the time to do this.

Not surprisingly, attempting to access the program RAM while sub CPU is in stopped state results in bus contention, with program RAM reads returning garbage.

Yes, this seemed kinda logical but it's nice to confirm it: the Mega-CD gate array probably uses busreq status bit (which is not exactly busreq bit set value, as figured above) as a switch between MD and MCD side for PRG-RAM access (this switch is somehow described in available Mega-CD manuals).

If we've only requested the bus, the reset bit returns 1 as expected. However, if reset has been requested, the bit returns 0, even after counting down the full $FF in the loop counter (which I'm fairly certain is longer than the time required to assert HALT and RESET for 10 clock cycles to perform a soft reset). It seems like requesting sub CPU reset also fails if the sub CPU has been halted with the stop instruction.

Unless you mean you see different values being returned for the reset bit on real hardware depending on sub-cpu being stopped or not, this is expected behavior to me: reset bit read value should correspond to set value and, to reset sub-cpu, you need to clear that bit then set it back to 1 for the sub-cpu to be in running state. I believe that setting that bit to 0 asserts both /RESET and /HALT inputs of sub-cpu and setting it to 1 releases them (those inputs being active low); at least this is how I emulate it and it appears to work fine (CD BIOS always sets reset bit back to 1 after clearing it). When read, the reset bit probably directly corresponds to the state of the /RESET line.

Now regarding the BUSREQ bit, after looking at the schematics in available service manuals, it appears the Mega-CD gate array does not use M68K bus arbitration signals (/BR and /BGACK inputs are connected to Vcc and /BG output is left unconnected), which means it instead uses the /HALT input to halt the sub-cpu (and by extension release the sub-cpu bus).

What probably happens is that, to determine busreq status, the Mega-CD gate array looks at the state of some sub-cpu signals (/AS maybe) in combination with /HALT state to determine if sub-cpu bus is released (when /HALT is asserted, 68K cpu finishes its current bus cycle before releasing the bus) but, when the cpu is in stopped state, these signals are held in a different state so the gate array thinks the bus is still being used.

This would need to be confirmed by looking at the signals state while cpu is stopped when /HALT is asserted (or by figuring the exact logic behind the busreq bit through Mega-CD chip die analysis) but simply forcing busreq status to 0 when sub-cpu is in stopped state should be enough to emulate that behavior.
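
A minimal C sketch of that idea follows. It is not the code from the actual fix: the names (`sub_control`, `sub_control_read`, `sub_stopped`) are hypothetical, and the read model simply combines what was observed in this thread (reset bit reads back as written, busreq status reads 1 when requested or when sub-cpu is in reset, and is forced to 0 while sub-cpu is stopped).

```c
#include <stdbool.h>
#include <stdint.h>

#define SRES_BIT 0x01u  /* bit 0: sub-cpu reset (0 = held in reset) */
#define SBRQ_BIT 0x02u  /* bit 1: sub-cpu bus request */

typedef struct {
    uint8_t written;   /* last value written by the main CPU (bits 0-1) */
    bool sub_stopped;  /* sub CPU has executed STOP and not resumed */
} sub_control;

/* Value returned to the main CPU when it reads the reset/busreq register.
 * Assumed model, for illustration only. */
uint8_t sub_control_read(const sub_control *c)
{
    /* Reset bit reads back exactly as written. */
    uint8_t value = c->written & SRES_BIT;

    /* Bus would normally read as available when requested or when in reset. */
    bool bus_requested = (c->written & SBRQ_BIT) || !(c->written & SRES_BIT);

    /* Force busreq status to 0 while the sub CPU is in stopped state. */
    if (bus_requested && !c->sub_stopped)
        value |= SBRQ_BIT;

    return value;
}
```

A main-cpu loop polling bit 1 would therefore time out in all three tested cases (reset only, reset + busreq, busreq only) whenever `sub_stopped` is true in this model.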

ekeeke commented 1 year ago

Fixed by https://github.com/ekeeke/Genesis-Plus-GX/commit/5a3e18f71bbd13430282de6c08f77785d2c21853

For the record, here are the obtained results with latest build for each of the 6 last test ROMS uploaded above. This should be the same as what you see on real Mega-CD hardware (feel free to re-open the issue if this is not the case):

MCD BRQ Test (brq reset, sub inf loop).bin image

MCD BRQ Test (brq reset, sub stop).bin image

MCD BRQ Test (brq, sub inf loop).bin image

MCD BRQ Test (brq, sub stop).bin image

MCD BRQ Test (reset, sub inf loop).bin image

MCD BRQ Test (reset, sub stop).bin image

OrionNavattan commented 1 year ago

It looks like the latest build matches hardware behavior now. The values for the reset register in your screenshots are the same as those on both Funai and Sony Model 2s. (The bus contention garbage is different, but that probably isn't too important, so long as it's garbage returned.)

Glad I could be of help here. :>

ekeeke commented 1 year ago

The bus contention garbage is different, but that probably isn't too important, so long as it's garbage returned.

Out of curiosity, what value(s) are you observing on real hardware ? Usually, "open bus" value corresponds to last value fetched on main-cpu bus, which in this case, would correspond to next main-cpu instruction due to M68K prefetch mechanism (0x4AFC = ILLEGAL)

OrionNavattan commented 1 year ago

On both models, reading the 'SEGA' string with bus contention returns $14810300.