MEGA65 / mega65-core

MEGA65 FPGA core
Other
238 stars 84 forks source link

Dev Core build #73 / Cartflash #55, Freezer Exit Screen Corruption when D81 is mounted #753

Closed KiDraDragon closed 4 months ago

KiDraDragon commented 7 months ago

Test Environment (required)

Describe the bug Using Dev Core Build #73 or Cartflash Core build #55 there is screen corruption when exiting the Freezer, when a D81 image was mounted and the directory listing listed via DIR, without executing a DIR command prior to mounting the image.

When using DIR in addition before (!) mounting a D81-image, then Freezer exits normally.

To Reproduce Steps to reproduce the behavior:

  1. Boot either mentioned core version
  2. Mount any D81 image, e.g. MEGAPOLY.D81: BOOT > MOUNT"MEGAPOLY.D81" > DIR > Enter Freezer > Exit Freezer (F3) or
  3. Boot either mentioned core version
  4. Have MEGA65.D81 present and automounted: BOOT > Enter Freezer > Exit Freezer (F3)

Expected behavior Exiting Freezer should return to BASIC65 screen in a usable state.

Screenshots 1 2 3

Additional context Two different SD-Cards (32gb Sandisk Ultra / Extreme) were used. One was wiped with Windows Tool Rufus.ie and prepared using the MEGA65 builtin tools. This SD-Card was populated with the MEGA65 formatter using the DEV-Core.

The other SD-Card was an in-use card and M65-Files were copied over via megaftp (ethernet). No difference in behavior between the two.

Discussion on Discord https://discord.com/channels/719326990221574164/1181168989855043644

dansanderson commented 7 months ago

Note for the Issue: We're still working on making the repro steps more specific so this can be reliably reproduced. At least one other person has seen this, but we're still narrowing down the circumstances.

Context: There was a Freezer change since stable release v0.95 to accommodate freezing the new errata register. This included a slight change to the Freezer file format to include the register, though no byte positions were changed (intentionally). One thing to consider is whether older Freezer files un-freeze in an odd way. KiDra is noticing this even after wiping both partitions and re-formatting the SD card with the latest dev core's SD card utility, which either rules this out, or maybe narrows this to other Freezer logic that needs a matching update.

An early version of the errata register caused Hot Registers to update, which was interfering with Freezer entry (not exit). We made a design decision to have the errata register not update Hot Registers. This isn't likely to be the issue, just something to know about this register.

We should test the values of the VIC registers (especially SCRNPTR and CHARPTR) in the corrupt state and compare it to the working state, and see if we can get ahold of the Freezer data being un-frozen into the corrupt state to determine how the bad values (if any) are getting written. It's less likely—but still possible given what the Freezer does—that the screen and character memory itself (the memory pointed to by SCRNPTR and CHARPTR) is getting written with incorrect values.

johnwayner commented 7 months ago

Some quick testing tonight seems to indicate that this was introduced in build 72 (https://builder.mega65.org/job/mega65-core/job/development/72/). Unfortunately there are a lot of changes in there.

johnwayner commented 7 months ago

I was able to fix the problem by using a HICKUP from build 71 on 73's core. And also reproduce the issue using 73's HICKUP on 71's core. So I used git bisect to narrow down to the commit that caused the failing HICKUP: https://github.com/MEGA65/mega65-core/commit/1a91eb715f11e6c7f4b8a5f5c1b7cbae1dbbe57f

It seems to be the change from

!32 $ffd3000
!16 $0080

to

!32 $ffd3000
!16 $0100

Not sure why the F011 and SD card areas were added. I think I'll leave this to others now :)

johnwayner commented 7 months ago

Oh, I forgot to mention that I did some memory dump comparisons before and after the corruption. Memory is pretty much totally obliterated with garbage after the corruption. Even the ROM data is destroyed. It's impressive :)

dansanderson commented 7 months ago

Yes, I believe this is the suspect change, related to the freezer format change I mentioned. It was explained to me that it doesn't change the size of the file format because those bytes are already reserved in blocks of that size. That might be incorrect, or it might just be that the extra registers being restored has consequences that we haven't accounted for yet. The goal was to include the VIC-IV errata register (D08F) in the freezer data. Maybe we need to exclude some registers?

I didn't have a plan to account for older Freezer files that might have blank data in the newly included region. But that shouldn't be this specific issue because some can repro this starting from a fully wiped SD card formatted by the new core.

Maybe @gardners or @lydon42 can take a closer look when they get a chance?

lydon42 commented 7 months ago

In my memory freezer was changed to include the errata flags (essentially expanding the VIC save are to full 0x100 if I remember correctly). Note that not showing computer frames around slot snapshots can result from having the old images on the SD card. The new format has more information in the files (also if I remember correctly).

We will get to checking freezer when we are done with r5 bringup. This has the highest priority right now, as without r5 test we can't start producing boards! So please check on your own if you are interested in helping, but don't pull @gardners away from working on r5 :)

johnwayner commented 7 months ago

The issue is centered around unfreezing to FFD3081 (D081) which is part of F011: https://github.com/MEGA65/mega65-core/blob/7af2409563f37f35c718c4bf3da5282ebc26f207/src/vhdl/sdcardio.vhdl#L2461

This register includes all sorts of stuff that mess with the buffers used to read data from the sd card. I suspect that the results will vary a bit depending on what value is being written back to this location. The usual result is that the following entries in the mem_list are not restored properly. This includes all of chip ram (ROMs, included).

A possible fix I've tested briefly is to separately save $D08F into the scratch area at the beginning of the freeze and to unfreeze it in an post unfreeze handler for the viciv DMA entry (or any number of other places). And, of course, the VICIV DMA entry must be changed back to only copying $80 bytes so it doesn't get to D081.

I can put a pull request in for this if you want.

dansanderson commented 7 months ago

Well done!

I would think the complete change would be to freeze/unfreeze as much state as possible, and understand the register behaviors well enough to know which specific exceptions need to be made. Such a change should maybe be careful about the last few undefined register locations, for example, including those that take up a fraction of a byte. Maybe the DMA entry needs to be changed back to $80, but then we need another solution for unfreezing the upper registers that ought to be unfrozen.

@johnwayner If you're so inclined, please keep going!

johnwayner commented 7 months ago

@dansanderson It might be prudent to address the saving of the errata flag and this particular issue now. Then, in a new feature issue, tackle determining what data from $D080-$D0FF should be saved -- and how to do save it. It's not even clear to me what is even in $D090-$D0FF. The book is light on details there and I haven't looked in the VHDL to find out. Also, if we freeze more in this area, does the freezer become too specific to the normal Mega 65 I/O personality (vs other personalities)? I don't know.

dansanderson commented 7 months ago

I agree, unfreezing only the errata register (D08F) is the safest change for now, and might be the only change needed for a while.

Here's the result of make iomap.txt in the development branch right now, excerpted from D080 to D0FF. Going just from the doc comments, it doesn't look like it's appropriate to unfreeze any of the other registers in this region.

C65 $D080 - F011 FDC control
C65 $D080.0-2 FDC:DS Drive select (0 to 7). Internal drive is 0. Second floppy drive on internal cable is 1. Other values reserved for C1565 external drive interface.
C65 $D080.3 FDC:SIDE Directly controls the SIDE signal to the floppy drive, i.e., selecting which side of the media is active.
C65 $D080.4 FDC:SWAP Swap upper and lower halves of data buffer (i.e. invert bit 8 of the sector buffer)
C65 $D080.5 FDC:MOTOR Activates drive motor and LED (unless LED signal is also set, causing the drive LED to blink)
C65 $D080.6 FDC:LED Drive LED blinks when set
C65 $D080.7 FDC:IRQ When set, enables interrupts to occur. Clearing clears any pending interrupt and disables interrupts until set again.
C65 $D081 FDC:COMMAND F011 FDC command register
C65 $D081.0 FDC:NOBUF Reset the sector buffer read/write pointers
C65 $D081.1 FDC:ALT Selects alternate DPLL read recovery method (not implemented)
C65 $D081.2 FDC:ALGO Selects reading and writing algorithm (currently ignored).
C65 $D081.3 FDC:DIR Sets the stepping direction (inward vs
C65 $D081.4 FDC:STEP Writing 1 causes the head to step in the indicated direction
C65 $D081.5 FDC:FREE Command is a free-format (low level) operation
C65 $D081.6 FDC:RDCMD Command is a read operation if set
C65 $D081.7 FDC:WRCMD Command is a write operation if set
C65 $D082 - F011 FDC Status A port (read only)
C65 $D082.0 FDC:TK0 F011 Head is over track 0 flag (read only)
C65 $D082.1 FDC:PROT F011 Disk write protect flag (read only)
C65 $D082.2 FDC:LOST F011 LOST flag (data was lost during transfer, i.e., CPU did not read data fast enough) (read only)
C65 $D082.3 FDC:CRC F011 FDC CRC check failure flag (read only)
C65 $D082.4 FDC:RNF F011 FDC Request Not Found (RNF), i.e., a sector read or write operation did not find the requested sector (read only)
C65 $D082.5 FDC:EQ F011 FDC CPU and disk pointers to sector buffer are equal, indicating that the sector buffer is either full or empty. (read only)
C65 $D082.6 FDC:DRQ F011 FDC DRQ flag (one or more bytes of data are ready) (read only)
C65 $D082.7 FDC:BUSY F011 FDC busy flag (command is being executed) (read only)
C65 $D083 - F011 FDC Status B port (read only)
C65 $D083.0 FDC:DSKCHG F011 disk change sense (read only)
C65 $D083.1 FDC:IRQ The floppy controller has generated an interrupt (read only). Note that interrupts are not currently implemented on the 45GS27.
C65 $D083.2 FDC:INDEX F011 Index hole sense (read only)
C65 $D083.3 FDC:DISKIN F011 Disk sense (read only)
C65 $D083.4 FDC:WGATE F011 write gate flag. Indicates that the drive is currently writing to media.  Bad things may happen if a write transaction is aborted (read only)
C65 $D083.5 FDC:RUN F011 Successive match.  A synonym of RDREQ on the 45IO47 (read only)
C65 $D083.6 FDC:WTREQ F011 Write Request flag, i.e., the requested sector was found during a write operation (read only)
C65 $D083.7 FDC:RDREQ F011 Read Request flag, i.e., the requested sector was found during a read operation (read only)
C65 $D084 FDC:TRACK F011 FDC track selection register
C65 $D085 FDC:SECTOR F011 FDC sector selection register
C65 $D086 FDC:SIDE F011 FDC side selection register
C65 $D087 FDC:DATA F011 FDC data register (read/write) for accessing the floppy controller's 512 byte sector buffer
C65 $D088 FDC:CLOCK Set or read the clock pattern to be used when writing address and data marks. Should normally be left $FF
C65 $D089 FDC:STEP Set or read the track stepping rate in 62.5 microsecond steps (normally 128, i.e., 8 milliseconds).
C65 $D08A FDC:PCODE (Read only) returns the protection code of the most recently read sector. Was intended for rudimentary copy protection. Not implemented.
GS $D08F - Set/get MISCIO:HWERRATA MEGA65 hardware errata level
GS $D09B - FSM state of low-level SD controller (DEBUG)
GS $D09C - Last byte low-level SD controller read from card (DEBUG)
GS $D09D - FDC-side buffer pointer high bit (DEBUG)
GS $D09E - CPU-side buffer pointer low bits (DEBUG)
GS $D09F.0 - CPU-side buffer pointer high bit (DEBUG)
GS $D09F.1 - EQ flag (DEBUG)
GS $D09F.2 - EQ flag inhibit state (DEBUG)
C65 $D0A0   C65 RAM Expansion controller
C65 $D0A0-$D0FF - Reserved for C65 RAM Expansion Controller.
C65 $D0A0-$D0FF SUMMARY:REC Reserved for C65 RAM Expansion Controller.
GS $D0E0.0-3 Select active UART for other registers
GS $D0E0.4 Enable loopback mode
GS $D0E0.5 Buffered UART master RX buffer high-water IRQ enable
GS $D0E0.6 Buffered UART master TX queue low-water IRQ enable
GS $D0E0.7 Buffered UART master IRQ enable
GS $D0E1 Buffered UART Status register / interrupt select register
GS $D0E1.0 Buffered UART enable interrupt on TX buffer low-water mark
GS $D0E1.1 Buffered UART enable interrupt on RX high-water mark
GS $D0E1.2 Buffered UART enable interrupt on RX byte
GS $D0E1.3 Buffered UART TX buffer full
GS $D0E1.4 Buffered UART RX buffer full
GS $D0E1.5 Buffered UART TX buffer empty
GS $D0E1.6 Buffered UART RX buffer empty
GS $D0E1.7 Buffered UART interrupt status
GS $D0E2 Buffered UART Read register (write to ACK receipt of byte)
GS $D0E3 Buffered UART Write register (write to send byte)
GS $D0E4 Buffered UART bit rate divisor LSB
GS $D0E5 Buffered UART bit rate divisor middle byte
GS $D0E6 Buffered UART bit rate divisor MSB
markkrueg commented 7 months ago

I am seeing this same corrupt image on a fresh install of development core 73 in my slot 6 with a freshly built SD card from same core. Just booted up after going through initial setup and went into FREEZER. corrupt_freezer_image

johnwayner commented 7 months ago

@markkrueg The thumbnail frame corruption is a different issue, I believe.

FYI, I can no longer reproduce the freezer exit crash issue. It would crash 100% of the time when I was working on a fix, but now it works every time. I've removed the HICKUP.M65 and it's also not reported in MEGAINFO. I've done total SD Card wipes using the built-in tool. I've even tried applying a HICKUP that should force the issue to happen by writing $FF to $D081, and it's fine.

I don't get it. it was all very predictable before. I guess I was lucky to be in whatever state was causing it. Maybe the contents of the sd card have to be just right?

johnwayner commented 7 months ago

I found the issue with the thumbnail frames being corrupted: https://github.com/MEGA65/mega65-freezemenu/pull/72

lydon42 commented 7 months ago

I will add the freezer version to r5-bringup

lydon42 commented 6 months ago

fix has been merged