EtchedPixels / FUZIX

FuzixOS: Because Small Is Beautiful
Other
2.18k stars 274 forks source link

Fuzix 0.3 fails with "sd: failed, retrying." on the real ZX Spectrum +3e with ZXMMC interface #766

Closed pawosm-arm closed 2 years ago

pawosm-arm commented 4 years ago

As in the topic. I could not complete init sequence due to the failure as in the screenshot: fuzix-zxmmc-failure After this failure, ZXMMC device remains stuck even after reset: power cycle is always required to make it work again. It is suspected that the failure occurs on the first write attempt.

EtchedPixels commented 4 years ago

Just some diagnosis notes. The non recovery until power cycle apears to be a bug in the +3e firmware as it doesn't know how to get out of a disk that is waiting for bytes for an I/O rather than idle or waiting to be put into SPI mode.

That in turn means that whatever happened left the device thinking there was more data due to appear on the write.

pawosm-arm commented 4 years ago

I did some investigation. First, I obtained reference Z80 assembly code for init, sector read and sector write from the ZXMMC manufacturer. Having this, I wrote somehow oversimplified ZXMMC driver for CP/M Plus (see here: https://sourceforge.net/projects/zxmmcfid ). It proved reference code was good for read/write operations and CP/M never stuck. Then I prepared a list of differences and examined them one by one. Some of the things were done better in Fuzix (e.g. I don't wait for 0xff before sector write, I guess it may cause problem on an overclocked machine, but, hey, I prepared this for the real thing!), yet some things were 'less paranoid' on Fuzix (e.g. I'm doing longer waits and sending more clocks). Eventually I came up with this very last thing that finally made Fuzix work on the real Speccy with the real ZXMMC! Turned out, things that are happening after sector read/write should not be ignored. Here's what I changed to make it work (feel free to use it to make a proper fix):

diff --git a/Kernel/dev/devsd.c b/Kernel/dev/devsd.c
index 4581295b..5a9b17e3 100644
--- a/Kernel/dev/devsd.c
+++ b/Kernel/dev/devsd.c
@@ -50,6 +50,8 @@ uint_fast8_t devsd_transfer_sector(void)
                     sd_spi_transmit_byte(0xFF); /* dummy CRC */
                     sd_spi_transmit_byte(0xFF);
                     success = ((sd_spi_wait(false) & 0x1F) == 0x05);
+                    if (success)
+                      while (sd_spi_wait(false) == 0x00) { }
                 }
             }
        }else
diff --git a/Kernel/dev/zx/zxmmc.c b/Kernel/dev/zx/zxmmc.c
index d9d3756c..219793ce 100644
--- a/Kernel/dev/zx/zxmmc.c
+++ b/Kernel/dev/zx/zxmmc.c
@@ -66,6 +66,10 @@ doread:
     inir
     ld a,(_vtborder)
     out (0xfe),a
+    in a, (0x3f)
+    nop
+    nop
+    in a, (0x3f)
     pop af
     or a
     jp nz,map_kernel

Also note that the two bytes input'ed after sector read are zeros in the FUSE emulator and they are CRC16 on the real thing (despite CRC checking being turned off on ZXMMC), so my CP/M driver when encounters non-zero value here, computes and compares CRC16 for the obtained sector. fuzix-works

One annoying thing I've found. I closed the system with the shutdown command. Apparently, it does not unmount rootfs. As a result, on subsequent boot fsck is performed. It's time consuming on Speccy and it asks way too many questions. Also, it warns about sd timeout several times, which is particularly worrying.

pawosm-arm commented 4 years ago

I've made my changes on git SHA 3168f915ca851e6331d41f87adbc26d6ecfb780c (merge commit from Dec 26 2019). Yet as I've built the most recent master, init panicked right after start with the rootfs I had on my MMC card. Is it expected or is it an regression?

pawosm-arm commented 4 years ago

Nah, those extra clocks are extremely needed. Without them Fuzix hangs after a while (not that spectacularly as before and plain reset helps, yet still). Additional necessary change needed is following:

diff --git a/Kernel/dev/devsd.c b/Kernel/dev/devsd.c
index 4581295b..a41f5717 100644
--- a/Kernel/dev/devsd.c
+++ b/Kernel/dev/devsd.c
@@ -71,6 +73,9 @@ void sd_spi_release(void)
 {
     sd_spi_raise_cs();
     sd_spi_receive_byte();
+    sd_spi_receive_byte();
+    sd_spi_receive_byte();
+    sd_spi_receive_byte();
 }

 uint8_t sd_spi_wait(bool want_ff)
pawosm-arm commented 4 years ago

Sadly, it still hangs randomly, even with the change above :( (but... if it doesn't block the device, maybe it's not the I/O issue and the last change above isn't really needed? In the end, it does not hang on time consuming fsck, neither with or without the change above; from the other side, when it's hung, the diode of the ZXMMC device is lit, I suspect it's hung on the success branch while loop from my first patch)

pawosm-arm commented 4 years ago

So, I turned this potentially infinite loop into the 32767 iterations. The problem with blocked interface returned. I assume, when "sd: failed, retrying.\n" is printed and the SD command is re-sent, the device gets stuck.

feilipu commented 4 years ago

Just to shout from the sidelines here; some random thoughts on SD cards, and initialisation issues.

pawosm-arm commented 4 years ago

Now I got something that works for a long while. I guess it helped for good. I've removed use of the timer and did waiting the same way it's done in my CP/M driver. Complete change looks like this:

diff --git a/Kernel/dev/devsd.c b/Kernel/dev/devsd.c
index 4581295b..216baf63 100644
--- a/Kernel/dev/devsd.c
+++ b/Kernel/dev/devsd.c
@@ -50,6 +50,8 @@ uint_fast8_t devsd_transfer_sector(void)
                     sd_spi_transmit_byte(0xFF); /* dummy CRC */
                     sd_spi_transmit_byte(0xFF);
                     success = ((sd_spi_wait(false) & 0x1F) == 0x05);
+                    if (success)
+                      while (sd_spi_wait(false) == 0x00) { }
                 }
             }
        }else
@@ -75,12 +77,10 @@ void sd_spi_release(void)

 uint8_t sd_spi_wait(bool want_ff)
 {
-    unsigned int timer;
+    int repeats = 12800;
     uint_fast8_t b;

-    timer = set_timer_ms(500);
-
-    while(true){
+    while(--repeats){
         b = sd_spi_receive_byte();
         if(want_ff){
             if(b == 0xFF)
@@ -89,10 +89,6 @@ uint8_t sd_spi_wait(bool want_ff)
             if(b != 0xFF)
                 break;
         }
-        if(timer_expired(timer)){
-            kputs("sd: timeout\n");
-            break;
-        }
     }

     return b;
diff --git a/Kernel/dev/zx/zxmmc.c b/Kernel/dev/zx/zxmmc.c
index d9d3756c..219793ce 100644
--- a/Kernel/dev/zx/zxmmc.c
+++ b/Kernel/dev/zx/zxmmc.c
@@ -66,6 +66,10 @@ doread:
     inir
     ld a,(_vtborder)
     out (0xfe),a
+    in a, (0x3f)
+    nop
+    nop
+    in a, (0x3f)
     pop af
     or a
     jp nz,map_kernel
EtchedPixels commented 4 years ago

The first change looks right

The second one is interesting. I need to dig into that because it actually implies either your card takes > 0.5 seconds to complete (which I doubt), or something is messed up in the timer/timeout handling. Definitely need to work out what is up.

The in nop nop in implies the hardware needs 16 clocks between accesses ? (since inir works and presumably in nop in wasn't good enough)

shutdown does btw mark the file system as clean so if that is still failing something is still not getting correctly to the media, or you built a current kernel and not matching user space. Right now that's a mess because I am in the middle of changing the binary format and it will make some stuff explode like signal handling. Fairly soon I'll push the rest of the new format and then old binaries won't work at all so all will be sane again 8). Something I need to do because the original design has some bad mistakes in it and I need to fix them to get things like 8080 binaries working on 8080 or Z80 kernels and to fix up other braindead API issues like relying on RST 30 for a syscall which means you can't run Fuzix when you've got a fixed ROM in the low 16K like on the Pentagon.

pawosm-arm commented 4 years ago

The two nops between ins I've taken from the reference code, it's full of pair of nops in places like that (e.g. between two inirs or two otirs). I assume, changing border colour between inirs or otirs that Fuzix does, should also do. The shutdown command indeed does what it should now (after applying the most recent patch). Yet I didn't check if halt or reboot are still problematic as facing fsck on an old machine is painfully slow (same issue I'm seeing with FreeDOS and chkdsk started on Amstrad PC 2086 to which I've installed XT-IDE 8-bit ISA card and a 128MB Compact Flash storage, checking filesystem reports time elapsed around 900sec!). Speaking of the rootfs. Is there a reason the TERM variable is not set? Some of the programs refuse to run because of that. Also, I felt disappointed that fforth failed to start (for a different reason), I like the language.

EtchedPixels commented 4 years ago

You can set TERM in your .profile or for a fixed console in /etc/inittab.

TERM=vt52; export TERM

(the modern export TERM=vt52 isn't supported by the classic Bourne shell)

If you change it from

getty /dev/tty1

to

getty /dev/tty1 115200 vt52

it will then set the terminal type before running your shell.

halt and reboot don't tidy up - they are implemented the traditional unix way as root only tools that do directly do what they say.

EtchedPixels commented 4 years ago

fforth isn't built for z80 or 8080 atm - @davidgiven does some C and preprocessor tricks that sdcc and ackcc can't cope with. Even if you preprocess it via gcc and the sdcc headers you end up with stuff sdcc can't compile.

pawosm-arm commented 4 years ago

You can set TERM in your .profile or for a fixed console in /etc/inittab.

Indeed, doing sed on the inittab file helped, TERM is set now. Could be a default. Not sure if 115200 should be a default, my full blown Linux systems have 38400 or even 9600 set...

Seems like this thread bloats into off-topics, hence my extended question. Is there a general space to note some of my 'user experiecne' questions? For example:

EtchedPixels commented 4 years ago

Not really - ok there's a fuzix-devel google group but it's gone a bit quiet

pawosm-arm commented 4 years ago

You don't. The cpm tools use raw disk access like cpmtools on Linux.

...but things like cpmls or cpmcp don't seem to exist on the rootfs...

chroot isn't there as a tool

I've found extern int chroot(const char *path); declared in the syscalls.h file, so I guess it's a part of the userspace runtime library.

EtchedPixels commented 4 years ago

I don't think the default file system creation includes them

chroot is both a system call and a command. The system call is supported, the chroot command isn't written.

pawosm-arm commented 4 years ago

I've built Fuzix with today's master and booted it using http://www.fuzix.org/downloads/0.4pre/rootfs-z80-32.gz root filesystem. As I can see, most of the changes mentioned here were commited into the code, yet one change is still needed to make it boot on the real Spectrum with ZXMMC storage:

--- a/Kernel/dev/devsd.c
+++ b/Kernel/dev/devsd.c
@@ -85,12 +85,10 @@ void sd_spi_release(void)

 uint8_t sd_spi_wait(bool want_ff)
 {
-    unsigned int timer;
+    int repeats = 12800;
     uint_fast8_t b;

-    timer = set_timer_ms(500);
-
-    while(true){
+    while(--repeats){
         b = sd_spi_receive_byte();
         if(want_ff){
             if(b == 0xFF)
@@ -99,10 +97,6 @@ uint8_t sd_spi_wait(bool want_ff)
             if(b != 0xFF)
                 break;
         }
-        if(timer_expired(timer)){
-            kputs("sd: timeout\n");
-            break;
-        }
     }

     return b;
EtchedPixels commented 3 years ago

The two new ports merged this week change some of the SD behaviour due to behaviour that seems very close to this so hopefully it's now resolved.

pawosm-arm commented 3 years ago

The good news is that now it is usable on the real thing without any patching. The bad news is that when login: prompt is displayed, the ZXMMC diode is on all of the time (the border isn't flashing at the same time) until I log in (as root). It seems worrying (what this device is doing with my MMC card at that time?!), even if everything is back to normal right after I log in and the system seems fully useable.

davidgiven commented 3 years ago

That could be because CS is still asserted --- with my changed logic it will remain asserted after a write until the next command. It should be harmless but um. It's still not desirable. At least it seems to be verifying that busy-waiting on the write completion is what was making the card unhappy.

Does the kernel have any asynchronous timer mechanism which will allow a driver callback to fired while user code is running?

EtchedPixels commented 3 years ago

Most of the boards I have wire the light to /CS so I would guess the ZXMMC did the same. It's good that it works so we now know what is up.

It's possible to add but I've tried to avoid putting more and more goo in the interrupt paths. Every platform with a timer interrupt will pass through process.c:timer_interrupt regularly when interrupts are on and through the core part of it (doing alarms, etc) 10 times a second.