ghaerr / elks

Embeddable Linux Kernel Subset - Linux for 8086

ELKS on Book 8088 #1619

Closed Vutshi closed 1 year ago

Vutshi commented 1 year ago

Description I have a new and shiny laptop based on the 8088 processor. You can read about the Book 8088 on arstechnica. It's a fascinating device that allows for easy CPU changes. I've already tried the V20, Intel 8088, and a Soviet clone of the 8088, and they all work well in DOS. However, I'm currently having trouble getting it to boot ELKS out of the box.

UPDATE: For hardware details, see the blog post in this comment: https://github.com/ghaerr/elks/issues/1619#issuecomment-1720048680

UPDATE2: More hardware details from Sergey Kiselev:

The schematic of Book8088 is quite interesting. In addition to the 8088 CPU and the 8087 FPU, it uses an original 8284 clock generator, 8288 bus controller, 8237 DMA controller, 8259 PIC, and 8253 PIT. It looks like they've got most of the discrete logic functionality for an XT motherboard into a CPLD; the CGA controller functionality is also implemented as two CPLDs and the original 6845 CRT controller. According to the pinout, the CPLDs seem to be Altera MAX7000S series or Atmel ATF1500AS series in a PLCC84 package. They also use some kind of microcontroller to implement the XT keyboard, which is interfaced directly to the XT motherboard logic CPLD. The system uses two EPROMs (why not flash ROMs?!): one 27C512 for the BIOS and one 27C256 for the CGA font. The system memory uses one 512 KB SRAM and one 128 KB SRAM chip. CGA uses a 32 KB SRAM for the video RAM.

Another technical detail, the data bus of all 82xx controllers, the XT logic CPLD, and the memory are connected directly to the CPU data transceiver. This probably will reduce reliability when using external ISA cards. Typically, these things sit behind an additional transceiver...

Configuration

Additional information

$ sudo ./a.out /dev/sdd
Opening drive /dev/sdd..
MBR magic bytes in place!
Analyzing Partition 0
This is bootable
Partition is 63504 sectors (31.01MB)
Starting CHS values C=1, H=1, S=0
Ending CHS values C=16, H=63, S=63
Partition starts at sector number 63 (31.00KB in)
Partition filesystem id is 128
Analyzing Partition 1
This is not bootable
Is an empty partition
Analyzing Partition 2
This is not bootable
Is an empty partition
Analyzing Partition 3
This is not bootable
Is an empty partition

ghaerr commented 1 year ago

@Vutshi:

Specifically, the cursor is not visible within the kilo editor, and it fails to appear in the terminal after exiting kilo.

Some code specific to IBM PC-compatible hardware was added to the Direct Console for controlling whether the cursor is on or off:

static void DisplayCursor(int onoff)
{
    /* unfortunately, the cursor start/end at BDA 0x0460 can't be relied on! */
    unsigned int v = onoff? 0x0d0e: 0x2000;

    outb(10, CCBase);
    outb(v >> 8, CCBase + 1);
    outb(11, CCBase);
    outb(v, CCBase + 1);
}

CCBase is the address of the 6845 CRT Controller chip. As the comment implies, the BIOS doesn't keep accurate data on the exact scanlines for cursor top and bottom. I would guess perhaps the Book8088 is using original EGA and needs 0x0607 rather than 0x0d0e above. Perhaps change that in elks/arch/i86/drivers/char/console-direct.c and see what happens.

Thank you!

Vutshi commented 1 year ago

Hi @ghaerr

CCBase is the address of the 6845 CRT Controller chip. As the comment implies, the BIOS doesn't keep accurate data on the exact scanlines for cursor top and bottom. I would guess perhaps the Book8088 is using original EGA and needs 0x0607 rather than 0x0d0e above. Perhaps change that in elks/arch/i86/drivers/char/console-direct.c and see what happens.

The Book8088 features CGA as it aims to replicate the original XT (although I've heard about potential development of a new model with VGA). The BIOS seems to do standard XT things: https://github.com/skiselev/8088_bios/blob/e85a7c2647f2351e6dfc1e0053c8164f016cbbd4/src/video.inc#L303

So the DisplayCursor function seems quite innocuous, following the guidelines provided by osdev.org for the most part.

Best

Vutshi commented 1 year ago

@ghaerr

Is there a simple way to test the escape sequences in ELKS? In macOS terminal I can just say echo '\e[?25h' and echo '\e[?25l' to turn on/off the cursor. It doesn't work in ELKS terminal though.

ghaerr commented 1 year ago

Is there a simple way to test the escape sequences in ELKS? It doesn't work in ELKS terminal though.

Yes - but you need to use /bin/sash, not /bin/sh (ash). By default ash interprets arrow keys, which screws up what you're trying to do. Log in as toor, which starts /bin/sash, then just type the characters followed by return, and they will be echoed to turn the cursor on and off as you desire. I think you'll find the cursor ON sequence doesn't work on Book 8088.

So the DisplayCursor function seems quite innocuous

Now that I know the Book is running CGA, I'm 95% certain the problem is that the cursor-on code tries to set the cursor to start at line 13 (0x0d) and end at line 14 (0x0e) - the limit on CGA is 7, which is why I recommend using 0x0607 above as the fix. What I'm not sure of is how to fix this permanently, as ELKS doesn't know the type of adaptor attached.

Vutshi commented 1 year ago

@ghaerr

the limit on CGA is 7, which is why I recommend using 0x0607 above as the fix.

I don’t understand where this limit comes from. I think CGA supports 80x25 text mode.

ghaerr commented 1 year ago

I don’t understand where this limit comes from.

I am not an expert on all the CGA or EGA modes, but the two limits I'm referring to have to do with how many pixels high the characters are. On 8x8 chars, the 0x0607 limit applies. On EGA/VGA with 8x14 characters, 0x0d0e applies. The two bytes are the start and stop scan lines for the cursor definition, sent to the CRTC.
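
The packing described here can be sketched as follows. This is only an illustration, not the ELKS code: cursor_shape, cursor_start and cursor_end are hypothetical names, and the only facts assumed are the ones visible in DisplayCursor above (the high byte goes to CRTC register 10, the low byte to register 11, and 0x2000 is the "cursor off" value).

```c
#include <assert.h>

/* Hypothetical helper: pack cursor start/end scan lines into the 16-bit
 * value used by DisplayCursor (high byte -> CRTC register 10,
 * low byte -> CRTC register 11). */
unsigned int cursor_shape(unsigned int start, unsigned int end)
{
    /* scan line numbers only; the "off" value 0x2000 is not a scan line */
    assert(start <= 31 && end <= 31);
    return (start << 8) | end;
}

/* Split a packed value back into the two register payloads. */
unsigned int cursor_start(unsigned int v) { return (v >> 8) & 0xff; }
unsigned int cursor_end(unsigned int v)   { return v & 0xff; }
```

With this, cursor_shape(6, 7) gives the 8x8 value 0x0607 and cursor_shape(13, 14) gives the 8x14 value 0x0d0e.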

Vutshi commented 1 year ago

Finally it makes sense to me! Thank you. I now see that these cursor lines pertain to the cursor’s image size, which is 8x8 for CGA or 9x14 for MDA. Initially I thought it was related to the cursor’s position on the screen )

Vutshi commented 1 year ago

@ghaerr

What I'm not sure of is how to fix this permanently, as ELKS doesn't know the type of adaptor attached.

However after boot up the cursor always looks fine. I assume it is initially set up by BIOS. Can ELKS read out the correct cursor parameters after boot?

Btw, a strange thing happened to the cursor in our Schneider EuroPC which has MDA. The cursor became big in kilo: (image: MDA cursor)

Vutshi commented 1 year ago

Apparently the big cursor in MDA is a wraparound feature. I found a comprehensive guide to MDA, CGA, EGA cursors https://www.pcjs.org/blog/2018/03/20/

Here are IBM’s default values for the Cursor Start and Cursor End registers, expressed as ranges: MDA: 11-12 CGA: 6-7 EGA: 11-13 (assuming either a Monochrome or Enhanced Color Display)
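
As a rough sketch, the defaults quoted above could be kept in a small lookup, packed the same way as the value in DisplayCursor. The enum and function names are hypothetical, and the numbers are just the ranges from the quote, not values verified on hardware.

```c
#include <assert.h>

enum adapter { ADAPTER_MDA, ADAPTER_CGA, ADAPTER_EGA };

/* IBM default cursor start/end scan lines as quoted above, packed as
 * (start << 8) | end to match the value DisplayCursor expects. */
unsigned int default_cursor(enum adapter a)
{
    switch (a) {
    case ADAPTER_MDA: return 0x0b0c;   /* lines 11-12 */
    case ADAPTER_CGA: return 0x0607;   /* lines 6-7 */
    case ADAPTER_EGA: return 0x0b0d;   /* lines 11-13 */
    }
    assert(!"unknown adapter");
    return 0x0607;                     /* only reached with NDEBUG */
}
```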

ghaerr commented 1 year ago

However after boot up the cursor always looks fine. I assume it is initially set up by BIOS.

Yes.

Can ELKS read out the correct cursor parameters after boot?

No, that's the whole problem. The 6845 CRTC cursor start and end registers are write-only! Do you now see the problem? ELKS doesn't know what the cursor start and end should be for the monitor, after turning off the cursor.

Apparently the big cursor in MDA is a wraparound feature.

I see - so another issue with my DisplayCursor function. It doesn't work on CGA or MDA. I'm thinking we may need (another) /bootopts setting for this - "mda" or "cga" which set the cursor scan lines to 0x0607. I'll read the link you posted.

ghaerr commented 1 year ago

We could add a bunch of code to determine the monitor type to ELKS, but I'm trying to REDUCE the number of hardware dependencies, rather than increase. Perhaps instead of a /bootopts option for cursor start/end, we should have an option for turning on the "hardware" cursor on/off in the first place, since it normally doesn't really matter, except for programs that want no cursor at all!!

Vutshi commented 1 year ago

Do you now see the problem?

The Pre-plug&play world is dark and full of terrors :)

ghaerr commented 1 year ago

I found a comprehensive guide to MDA, CGA, EGA cursors The Pre-plug&play world is dark and full of terrors :)

Holy heck! Between the excellent article and pre-plug and play hardware, there's just all too much that can happen trying to turn a simple cursor back "on".

Since CGA, MDA and EGA/VGA all require different start/end cursor line numbers, what do you think about a cursor=0607 addition to /bootopts to solve this problem? (MDA users would use cursor=0c0d or something like that). The same could also be used to set a block cursor as default if desired with cursor=000e.
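
A minimal sketch of how such an option string might be parsed, assuming the format is exactly four hex digits (start line, then end line). parse_cursor_opt is a hypothetical name and this is not how /bootopts is actually processed in ELKS.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for a "cursor=SSEE" option: four hex digits,
 * start scan line then end scan line. Returns the packed value,
 * or -1 if the option is absent or malformed. */
long parse_cursor_opt(const char *opt)
{
    static const char key[] = "cursor=";
    int i;

    assert(opt != NULL);
    if (strncmp(opt, key, sizeof(key) - 1) != 0)
        return -1;
    opt += sizeof(key) - 1;
    if (strlen(opt) != 4)
        return -1;
    for (i = 0; i < 4; i++)            /* reject signs, spaces, "0x" */
        if (!isxdigit((unsigned char)opt[i]))
            return -1;
    return (long)strtoul(opt, NULL, 16);
}
```

Under these assumptions, "cursor=0607" yields 0x0607 and "cursor=000e" yields the block cursor value 0x000e.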

Any news on the CF card saga?

Vutshi commented 1 year ago

what do you think about a cursor=0607 addition to /bootopts to solve this problem? (MDA users would use cursor=0c0d or something like that).

Yes, I think it would be a good solution.

I also like this suggestion:

Perhaps instead of a /bootopts option for cursor start/end, we should have an option for turning on the "hardware" cursor on/off in the first place, since it normally doesn't really matter, except for programs that want no cursor at all!!

I'm curious about how DOS handled graphics card-dependent cursors.

Vutshi commented 1 year ago

Any news on the CF card saga?

Currently, it's on hold since the operator of these computers is occupied with something else.

ghaerr commented 1 year ago

I'm curious about how DOS handled graphics card-dependent cursors.

That's a good question, because the BDA (BIOS Data Area) has two bytes dedicated to the start and end cursor lines. The problem is, even for EGA/VGA screens, the values put in these locations (including QEMU) seem to always be 6 and 7!

I wasn't aware DOS actually changed the cursor size in any of its standard applications.

Vutshi commented 1 year ago

@ghaerr

This is how Sergey's BIOS identifies a display adapter: https://github.com/skiselev/8088_bios/blob/e85a7c2647f2351e6dfc1e0053c8164f016cbbd4/src/bios.asm#L527

Maybe ELKS can just do the same?

ghaerr commented 1 year ago

This is how Sergey's BIOS identifies a display adapter:

Wow, very interesting, using the BDA equipment byte! I had seen this before, but with the initial video mode limited to two bits, I didn't realize the packing that allows initial video modes 1 (CGA), 3 (EGA/VGA) and 7 (MDA) to be specified!

Maybe ELKS can just do the same?

I think so! Thank you for finding this :) I'll post a PR with the changes to the direct console and it sounds like this should work for all three display types.

ghaerr commented 1 year ago

@Vutshi, PR #1679 is posted that will hopefully fix this. It got a bit more complicated because the BDA equipment byte only differs between CGA and EGA/VGA if running in 40x25 mode, which is rarely used. I then noticed an extended BDA parameter area reserved for EGA/VGA BIOSes to differentiate between CGA and EGA/VGA. This should work on Book 8088 but may not with an EGA monitor on very old BIOSes.

Vutshi commented 1 year ago

@ghaerr , Thank you for the quick fix. We will test it on our hardware asap. Compilation in my macOS environment is now extremely smooth, thanks again for all the improvements. The concluding message Build script has terminated successfully was gratifying to observe. The only remaining desire is to replace terminated with completed for an extra dose of satisfaction. :)

ghaerr commented 1 year ago

The concluding message Build script has terminated successfully was gratifying to observe.

Yes, me too, it's nice not to end the build with an error message!

The only remaining desire is to replace terminated with completed for an extra dose of satisfaction. :)

OK - why not?! I'll add that change to the next small cleanup PR I'm preparing.

Vutshi commented 1 year ago

@ghaerr We have verified that the cursor works well on both Book8088 (CGA) and Schneider (MDA).

CF card saga is still under investigation.

Vutshi commented 1 year ago

@ghaerr A completely unrelated topic. We tried the fun train program sl on our hardware. Can you imagine how long it takes for the train to reach its destination on the 8088 CPU?

Book8088 is relatively fast: (image: Sl timing Book8088)

However, the poor Schneider is not so lucky. According to time it takes over 8 thousand years!!! (image: Sl timing Schneider)

We felt the timing was slightly off, so we took another approach: (image: Sl date Schneider) and got 27 minutes. This is kind of amazing :)

ghaerr commented 1 year ago

Hello @Vutshi,

Well, you're just "riding down the tracks" with sl finding bugs, huh?! Not sure yet exactly how time can come up with that large number, but I'd guess it looks like it's wrapping negative and displaying as unsigned, somehow...

On a serious note, how can sl take 27 or 34 minutes to execute? Wow! Does it literally just crawl on the screen? I'll have to check the source and see why/how that could be!

Vutshi commented 1 year ago

Does it literally just crawl on the screen?

Yes, very sluggish.

ghaerr commented 1 year ago

Yes, very sluggish.

That's very interesting - as there's a call to usleep(40000) in sl.c. I wonder if there is something about that call that's causing sl to sleep much longer... that would also perhaps answer why time didn't work - I'm thinking perhaps the clock tick is happening at a different rate on those systems? Or something else?

Here's the usleep C library code:

int
usleep(unsigned long useconds)
{
        struct timeval timeout;

        timeout.tv_sec  = useconds / 1000000L;
        timeout.tv_usec = useconds % 1000000L;
        return select(1, NULL, NULL, NULL, &timeout);
}

I am wondering whether you might try commenting that out in elkscmd/tui/sl.c to see whether that makes a difference. I bet that it does!

And here's the times() library code, which also uses struct timeval:

clock_t times(struct tms *tp)
{
    struct timeval tv;

    if (gettimeofday(&tv, (void *)0) < 0)
        return -1;

    if (tp) {
        /* scale down to one hour period to fit in long*/
        unsigned long usecs = (tv.tv_sec % 3600) * 1000000L + tv.tv_usec;

        /* return user and system same since ELKS doesn't implement*/
        tp->tms_utime = usecs;
        tp->tms_stime = usecs;
        tp->tms_cutime = usecs;
        tp->tms_cstime = usecs;
    }

    return tv.tv_sec;
}

I'm suspicious...
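
Whether or not it explains the number seen above, the "scale down to one hour" step in times() means a naive end - start can wrap when a run crosses an hour boundary. Below is a sketch of a wrap-safe difference, assuming both readings are microsecond counts already reduced modulo one hour and held in 32 bits. elapsed_usecs and HOUR_USECS are hypothetical names, not part of the ELKS library.

```c
#include <assert.h>
#include <stdint.h>

#define HOUR_USECS 3600000000UL   /* one hour in microseconds, fits in 32 bits */

/* Elapsed microseconds between two readings that were each reduced
 * modulo one hour, as the times() code above does. Only valid for
 * intervals shorter than one hour. */
uint32_t elapsed_usecs(uint32_t start, uint32_t end)
{
    assert(start < HOUR_USECS && end < HOUR_USECS);
    if (end >= start)
        return end - start;
    /* the hour boundary was crossed between the two readings */
    return (uint32_t)(HOUR_USECS - start) + end;
}
```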

Vutshi commented 1 year ago

Hi @ghaerr

I am wondering whether you might try commenting that out in elkscmd/tui/sl.c to see whether that makes a difference. I bet that it does!

Well, I did this and the results are strange. In QEMU, the train flies like a rocket, finishing in 2 sec. Book8088 has improved from 34m to 22m: (image: Sl timing Book8088 v2)

Schneider seems uninterested in removing the sleep step; it appears to be occupied with an entirely different task: (images: Sl date Schneider v2, Sl timing Schneider v2)

ghaerr commented 1 year ago

Book8088 has improved from 34m to 22m:

That's a huge change for eliminating a timeout handled by the kernel, but I'm still confounded as to exactly what else might be happening. Looking at sl.c shows no other obvious delays and no floating point.

I'm wondering whether just writing to the display might be taking forever: how fast is a ls -lR /? Does the system seem to otherwise run at somewhat normal (slow-ish) speed?

What about matrix? Does that run super slowly?

Schneider seems uninterested in removing the sleep step

Do you have info on the hardware timer interrupt frequency on either system? It would seem that Schneider has problems related to that, as time gets its info from the kernel gettimeofday call. Is date fairly accurate? I'm wondering whether just getting the date/time is taking a while.

Vutshi commented 1 year ago

Hi @ghaerr The CF card saga is finally over. We tested a card from a different brand, and all sync and reboot bugs have disappeared. Apparently, it is a bug in XTIDE BIOS which is somehow specific to our Transcend CF220I and Minix ELKS (FAT ELKS works well). Interestingly, the older version of XTIDE BIOS used on Schneider is more prone to bugs, as it sometimes reports bioshd(128) retry attempts during boot-up. The latest XTIDE BIOS in Book8088 performs better and only kills the card with sync. I hope one day they will fix it completely.

ghaerr commented 1 year ago

@Vutshi, thanks for the final CF report. It's nice to know that ELKS works and the problem is in the BIOS or CF card, for a change. It's interesting to know it's turtles all the way down, with regards to all the code and complexity likely required for flash devices and XTIDE to work properly.

If you want to continue debugging the time-related issues recently found with sl and time, let's keep this open, and please give me more information as described above when you have time.

Vutshi commented 1 year ago

Let's get back to the critical matter of train speed in ELKS :)

I'm wondering whether just writing to the display might be taking forever: how fast is a ls -lR /? Does the system seem to otherwise run at somewhat normal (slow-ish) speed?

ls runs ok on both of the computers: (images: LS timing Book, LS timing) Book8088 is slightly faster, probably because of the V20 processor.

What about matrix? Does that run super slowly?

It is also slow, maybe not as slow as train but it is hard to measure.

Here is the train video. Each line appears to update rapidly, but there seems to be some processing at the end of each line which takes a lot of time.

Video: https://github.com/ghaerr/elks/assets/4971779/8ded4d85-9b88-4d95-a77a-7018a73bf4ca

ghaerr commented 1 year ago

Let's get back to the critical matter of train speed in ELKS :)

Yes, locomotion is important.

ls runs ok on both of the computers

Holy heck, over 1 minute just for an ls -lR /??? Wow. I'm going to have to try that using the "realistic floppy delay" code I just added to ELKS for more accurate QEMU emulation yesterday... and see how long that takes. Unfortunately, that won't take into account any difference between my very speedy MacBook and the 8088 CPU.

Here is the train video. Each line appears to update rapidly, but there seems to be some processing at the end of each line which takes a lot of time.

Hmmm... that is kind of strange with the trailing characters. Other than checking to see who's manning the caboose, I would think perhaps sl is possibly confused about how wide the screen is, and that there might be some CPU wait states involved when writing to the CGA RAM. I can see the characters being drawn separately across the screen... is each line of the ls -l output that slow too?

ghaerr commented 1 year ago

@Vutshi:

Each line appears to update rapidly, but there seems to be some processing at the end of each line which takes a lot of time.

That's a keen observation... thinking about that for a while:

I found another spot where this could be happening: my curses emulation. It is possible for some reason that sl thinks the screen width is much larger than it is.

Does it take quite a while for the train to start appearing on the screen, with just characters flickering on the right side of the monitor until the train "arrives"?

Change the following:

diff --git a/elkscmd/tui/curses.c b/elkscmd/tui/curses.c
index f03db8be..76d622e7 100644
--- a/elkscmd/tui/curses.c
+++ b/elkscmd/tui/curses.c
@@ -22,7 +22,7 @@ void *stdscr;
 void *initscr()
 {
     tty_init(MouseTracking|CatchISig|FullBuffer);
-    tty_getsize(&COLS, &LINES);
+    //tty_getsize(&COLS, &LINES);
     return stdout;
 }

Then rm elkscmd/tui/sl.o then make. I am going to hope this fixes the problem!

ghaerr commented 1 year ago

@Vutshi,

I'm closing in on the reason your train's not running on time: long story short, there's a complicated sequence that occurs for the tui programs (like sl, matrix, fm, etc) to work on both the ELKS console as well as a serial TTY to a terminal emulator. The tui library code uses an ANSI DSR (device status report) sequence to interrogate the console/terminal as to what the lines and columns are. This uses a routine that ends up being dependent on kernel timing/timeouts. I'm pretty certain the problem is that both your systems have issues with the kernel keeping track of internal "ticks" at 1/100 (HZ) second, which likely causes the ANSI sequence reading routine readansi to fail.

Should the timing be off, the DSR return sequence could possibly be incorrect, and the resulting COLS/LINES be zero or an incorrect number. The sl program draws the train at each screen column from back to front - and doesn't check if the column number is very large or small. That's what I think is happening. When the Direct Console receives a cursor sequence that is out of range, it just moves to the max column, which is what we're seeing.

To check this, please uncomment the following lines in elkscmd/tui/unikey.c (as well as the above fix):

diff --git a/elkscmd/tui/unikey.c b/elkscmd/tui/unikey.c
index d1d1873a..a661898a 100644
--- a/elkscmd/tui/unikey.c
+++ b/elkscmd/tui/unikey.c
@@ -701,7 +701,7 @@ int ansi_dsr(char *buf, int n, int *cols, int *rows)
     p = buf + 2;
     *rows = getparm(p, 0);
     *cols = getparm(p, 1);
-    //printf("DSR terminal size is %dx%d\r\n", *cols, *rows);
+    printf("DSR terminal size is %dx%d\r\n", *cols, *rows);
     return 1;
 }

This will display at the bottom of the screen when you run sl and let us know what's happening.

To see the actual DSR received sequence, one would have to add the following lines before the return 1:

    /* cast so bytes >= 0x80 are not sign-extended when printed */
    for (int i = 0; i < n; i++) printf("[%02x]", (unsigned char)buf[i]);

Should this be the problem, we'd need to look further into how often the kernel gets interrupted with the hardware timer interrupt - and whether that chip is PC compatible.
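
To make the failure mode concrete, here is a hedged sketch of parsing a cursor position report of the form ESC [ rows ; cols R with a fallback when the reply is truncated or implausible. parse_dsr is a hypothetical function, the 80x25 fallback and the 1..255 range are assumptions for the example, and the real ansi_dsr in unikey.c uses getparm instead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser for a NUL-terminated reply "ESC [ rows ; cols R".
 * Returns 1 and fills cols/rows on success; on a malformed or
 * implausible reply it returns 0 and falls back to 80x25. */
int parse_dsr(const char *buf, int *cols, int *rows)
{
    int r, c;
    char final;

    assert(buf != NULL && cols != NULL && rows != NULL);
    /* %3d limits each field to three digits, so values stay small */
    if (sscanf(buf, "\033[%3d;%3d%c", &r, &c, &final) == 3 && final == 'R'
            && r >= 1 && r <= 255 && c >= 1 && c <= 255) {
        *rows = r;
        *cols = c;
        return 1;
    }
    *rows = 25;
    *cols = 80;
    return 0;
}
```

A caller such as a curses initscr could then rely on COLS and LINES always being sane, even if the timed read of the reply was cut short.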

ghaerr commented 1 year ago

Also try running matrix and see what that looks like.

Vutshi commented 1 year ago

@ghaerr I've implemented all of the changes you suggested, but oddly, in QEMU, I don't see any messages displayed at the bottom of the screen. Just in case, I did ./clean.sh before ./build.sh. We will do hardware tests probably tomorrow.

BTW, whenever I do ./clean.sh it finishes with an error:

../bin/setboot
/Library/Developer/CommandLineTools/usr/bin/make -C bootblocks clean
Makefile:2: /Users/x2241064/Install/8088/ELKS/elks/.config: No such file or directory
make[1]: *** No rule to make target `/Users/x2241064/Install/8088/ELKS/elks/.config'.  Stop.
make: *** [clean] Error 2

ghaerr commented 1 year ago

I've implemented all of the changes you suggested, but oddly, in QEMU, I don't see any messages displayed at the bottom of the screen.

You mean you uncommented //printf("DSR terminal size is %dx%d\r\n", *cols, *rows); and it's not displaying?

I remembered I wrote another test command ttyinfo. Run that and take a screenshot.

Just in case, I did ./clean.sh before ./build.sh.

Don't use either of those, ever. Use make clean then make. It looks like clean.sh removed your .config file - you'll have to start again running make menuconfig. If you're ever just rebuilding the kernel, you can use make kclean. For the above recommended change, only a make was needed.

I think clean.sh was intended to completely clean the ELKS directory. I suppose it should be deleted.

Instead of running ./build.sh, you can set the environment once using . env.sh then make, etc. I have that aliased to: alias ee='cd ~greg/net/elks-gh; . env.sh'.

In QEMU, I don't see any messages displayed at the bottom of the screen.

Take a screenshot if it still doesn't display after rebuilding.

Vutshi commented 1 year ago

Hi @ghaerr

hardware test on book8088 doesn’t show speed improvement and reports terminal size 25x80: (image: SL new)

Here is video from book8088. There is still something going on at the end of each line.

Video: https://github.com/ghaerr/elks/assets/4971779/f3e698dd-4dc8-4f59-94a8-019d7a7ac199

ghaerr commented 1 year ago

Here is video from book8088. There is still something going on at the end of each line.

Is this with the latest source (from a day or two ago max), or with the tty_getsize commented out?

Definitely something is strange. If you're running the latest, run ttyinfo and see what it says, it should display lines and columns. Exit with ^D.

ghaerr commented 1 year ago

hardware test on book8088 doesn’t show speed improvement and reports terminal size 25x80:

Oops - missed that. No need for the other test. Very strange!!! I'm kind of at a loss for ideas. Can you try running fm or matrix and see what happens?

Vutshi commented 1 year ago

Unfortunately, this was the last hardware test before vacation. Hardware will be available again in a couple of weeks.

ghaerr commented 1 year ago

It seems rather strange that both Book 8088 and Schneider have the same bug... is the same boot CF working on any other systems, or are they all bad? Do you have specs on the Schneider system, is that a V20? What else is different about it from IBM PC?

Vutshi commented 1 year ago

Can you try running fm or matrix and see what happens?

Actually in QEMU these two programs didn’t show me the terminal size message.

ghaerr commented 1 year ago

Actually in QEMU these two programs didn’t show me the terminal size message.

That's likely because both may do a full screen erase before drawing. I just wanted to see whether they worked or not on non-QEMU.

ghaerr commented 1 year ago

Can you post the .config file you're using for the build? I'll take a look at that.

Vutshi commented 1 year ago

Matrix certainly worked on both machines although slow. Maybe not as slow as the train.

ghaerr commented 1 year ago

Do both machines use an IBM compatible PIT chip? I'm trying to determine whether I need to think about different PIT/PIC and associated timing or not.

ghaerr commented 1 year ago

@toncho11 have you ever run sl on your original IBM 5150? Perhaps there is something that doesn't show up except on very slow machines... ? sl isn't on the 360k disk though.

Vutshi commented 1 year ago

Do both machines use an IBM compatible PIT chip? I'm trying to determine whether I need to think about different PIT/PIC and associated timing or not.

I have found the service manual for the Schneider: https://oldcrap.org/wp-content/uploads/2023/04/schneider-pc-europc-service-manual.pdf

ghaerr commented 1 year ago

I'll look at the manual, thanks.

The funny thing is that sl (and fm) are supposed to turn off the cursor at startup. But it seems your video is showing that the cursor remains on... And you tested that the cursor turned off using kilo, and we fixed that. So that is quite strange.