Disk/floppy performance

ghaerr / elks

Embeddable Linux Kernel Subset - Linux for 8086

Disk/floppy performance #521

Closed Mellvik closed 4 years ago

Mellvik commented 4 years ago

Finally some disk performance metrics on physical HW.

HW: 286 12MHz, FD 1.2M, HD type 17 42MB Conner, booting from Minix floppy

ls -l /bin [7s]
cp /bin/vi /bin/xx [28.5s]
ps [2s]
cat /etc/rc.d/rc.sys [4s]
cp 245kfile xx [HD to HD, FATfs] [24s] (DOS 3s, venix 13s on same disk, Unix V7 filesystem)

--Mellvik

Mellvik commented 4 years ago

Live performance: https://drive.google.com/file/d/1t49cysn2xT2QoqvYD-UL1WIamPNnYEDk/view?usp=drivesdk

ghaerr commented 4 years ago

That is quite slow... although seeing the screen does remind me of the speed of yesterday's computers.

After we get your latest round of basic bugs fixed, we can look seriously into ELKS I/O being quite slow compared to MS-DOS and Venix. I suspect a lot has to do with ELKS only doing single sector I/O on all disks.

Mellvik commented 4 years ago

Agree, and yes, what you see is the real thing of yesteryear. My 386/20 is more pleasant, but admittedly, coming to DOS and starting Elvis in one second is .... Then again, notice the dir listing slowing down. I believe this is unique to the FAT filesystem, but need to be more empirical on that. A reasonable ambition for elks would be to beat Venix.

-M

Mellvik commented 4 years ago

As one problem after another is being fixed (fast), and new features in ELKS seem to come by the day, I'd like to keep this issue close to the top of the list. From my perspective - on physical systems, fd/hd performance is now the primary barrier between a 'really useable system' (1.0) and 'early development' (0.3).

ghaerr commented 4 years ago

From my perspective - on physical systems, fd/hd performance is now the primary barrier between a 'really useable system' (1.0) and 'early development' (0.3).

Agreed. Improving the speed will likely involve doing multi-sector I/O > 1024 bytes, which requires some considerable buffer system revamping. IIRC the system always does single sector I/O on the last 1k read due to a BIOS int 13h bug on 720k floppies (#39).

Mellvik commented 4 years ago

@ghaerr,

I read up on the exchanges in #39. Since the problem is on a rarely used format and possibly related to some specific hardware, it may make sense to have a workaround in a config option instead of the mainline code where it inhibits general progress.

—Mellvik

ghaerr commented 4 years ago

From discussion on separate issue https://github.com/jbruchon/elks/pull/812#issuecomment-716057465:

I haven't done the math, but 4 or 6 minutes to get 2880 sectors @ 360 rpm and 18x2 available per rotation is a hell of a lot of missed rotations. I honestly think we should be able to read (or write) a 1.44 floppy in less than 2 minutes.

I'm fairly certain the floppy performance problem is just that the BIOS is limited to reading 1-2 sectors max at a time. Then, when the next read request comes in, the floppy must rotate once to get the next sector, not multiple times. This can be tested by making the following change in the bioshd.c driver, which will show the CHS during disk I/O. When used with slow floppies, this should give us a clue as to actually how fast and which sectors are being read:

bioshd.c line 709:
        while (count > 0) {
            sector = (unsigned int) ((start % (sector_t)drivep->sectors) + 1);
            tmp = start / (sector_t)drivep->sectors;
            head = (unsigned int) (tmp % (sector_t)drivep->heads);
            cylinder = (unsigned int) (tmp / (sector_t)drivep->heads);
            this_pass = drivep->sectors - sector + 1;
            printk("CHS %d,%d,%d count %d, this %d\n", cylinder, head, sector, (int)count, this_pass);  <--- insert this

optimizing floppy io is probably not worth it. OTOH, if it contributes to general disk io speed, it is.

The BIOS is capable of reading from the requested sector to the end of track, but ELKS reduces the request to always fit in a 1k byte (1 block) disk buffer. It would not be a big deal to set aside a larger area of memory to use as a static buffer for the BIOS to read more sectors at once into, with minimal changes to the bioshd.c (floppy) driver. For instance, to buffer a typical floppy of CHS 80,2,18 (18 sectors/track), would only require 18 * 512 = 9k byte buffer.
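The mapping in the loop above can be tried outside the kernel. The following is only a minimal sketch: `struct geometry`, `struct chs` and the three function names are hypothetical stand-ins, not the ELKS `drivep` structure or driver API. It shows the linear-sector to CHS conversion, how many sectors remain to the end of the track, and the size of a full-track buffer.

```c
#include <assert.h>

/* Hypothetical geometry record; the field names mirror the drivep-> fields
 * quoted above, but this is not the ELKS structure itself. */
struct geometry {
    unsigned cylinders, heads, sectors;   /* sectors = sectors per track */
};

struct chs {
    unsigned cylinder, head, sector;      /* sector is 1-based, as in INT 13h */
};

/* Convert a linear sector number into cylinder/head/sector. */
struct chs lba_to_chs(const struct geometry *g, unsigned long start)
{
    struct chs c;
    unsigned long tmp;

    assert(g->heads > 0 && g->sectors > 0);
    c.sector = (unsigned)(start % g->sectors) + 1;
    tmp = start / g->sectors;
    c.head = (unsigned)(tmp % g->heads);
    c.cylinder = (unsigned)(tmp / g->heads);
    return c;
}

/* Number of sectors from the given 1-based sector through end of track. */
unsigned sectors_to_end_of_track(const struct geometry *g, unsigned sector)
{
    assert(sector >= 1 && sector <= g->sectors);
    return g->sectors - sector + 1;
}

/* Bytes needed to buffer one full track of 512-byte sectors. */
unsigned long track_buffer_bytes(const struct geometry *g)
{
    return (unsigned long)g->sectors * 512UL;
}
```

For a CHS 80,2,18 floppy, linear sector 1602 maps to CHS 44,1,1 with 18 sectors left on the track, and the full-track buffer comes to 9216 bytes.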

If you'd like to do the initial testing described above, we could put together an enhancement that would, for floppies only, allocate a 9k buffer and not truncate the read requests. Most read requests from programs are currently truncated (example, dd unless bs= is given), but the kernel itself tries to read a program's code and data segment (separately) all at once, when initially read from disk to be executed, for instance. [EDIT: I just ran this test and found that count=2 for almost all requests, including the kernel exec() reads, which is because those reads still go through the buffer system. So this enhancement will require more work than previously thought.]

Thus, the important measurement above is to look at the sector # and sector count, which are displayed before truncation. This would give an approximation of how many remaining sectors would be typically read at once in an enhanced implementation.

Mellvik commented 4 years ago

I'm in, @ghaerr, leaving the raw device access 'dream' :-) for now (eventually it would be an interesting experiment to see how fast the BIOS can handle a full (max) 64k read).

Doing the math (not as bad as my in-head calculations from last night), we're currently getting an average of 1.3 sectors per revolution, or 3.9 kB/s. I figure the best we can reasonably get is a track every other revolution, or 9 sectors per revolution on average, 27 kB/s: an improvement factor of 7. If we can get above 10 kB/s, we have a very significant improvement.
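The arithmetic behind these figures can be checked with a few lines of integer maths. This is a sketch only; the function names are made up for illustration, and it assumes 512-byte sectors and the 360 rpm spindle speed used earlier in the thread (3.5-inch drives commonly spin at 300 rpm, which would scale the throughput by 5/6).

```c
#include <assert.h>

/* Throughput in bytes per second, given tenths of a sector transferred per
 * revolution (13 means 1.3 sectors) and the spindle speed in rpm.
 * Integer maths only, so results are truncated. */
unsigned long floppy_bytes_per_sec(unsigned tenth_sectors_per_rev, unsigned rpm)
{
    assert(rpm > 0);
    /* bytes per rev = tenths * 512 / 10, revs per second = rpm / 60 */
    return (unsigned long)tenth_sectors_per_rev * 512UL * rpm / 600UL;
}

/* Time for one revolution (one full track pass) in milliseconds. */
unsigned ms_per_revolution(unsigned rpm)
{
    assert(rpm > 0);
    return 60000u / rpm;
}
```

At 360 rpm, 1.3 sectors per revolution gives 3993 bytes/s and 9 sectors per revolution gives 27648 bytes/s, a ratio of just under 7. One revolution takes 166 ms.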

Anyway, I'm seeing the output from the suggested printk as we speak (posted separately), mostly 2 sometimes 1 (I put the printk a few lines below what was suggested, see code snippet!). From entering 'root' at the login prompt until we get the shell prompt, we have 61 reads, 11 of which are single sector and always 2 consecutive sectors. Looking at the CHS values, the entire 'transaction' could be condensed to 7 long reads (most of them 18 sectors, some 9), plus 4 short (double sector) reads at the end.

I guess these numbers are entirely predictable: 'Count' in the main loop is initialized to 2 and ... if ((sector_t)this_pass > count) this_pass = (unsigned int) count;

        while (count > 0) {
            sector = (unsigned int) ((start % (sector_t)drivep->sectors) + 1);
            tmp = start / (sector_t)drivep->sectors;
            head = (unsigned int) (tmp % (sector_t)drivep->heads);
            cylinder = (unsigned int) (tmp / (sector_t)drivep->heads);
            this_pass = drivep->sectors - sector + 1;
            /* Fix for weird BIOS behavior with 720K floppy (issue #39) */
            if (this_pass < 3) this_pass = 1;
            /* End of fix */
            if ((sector_t)this_pass > count) this_pass = (unsigned int) count;
            printk("CHS %d,%d,%d count %d\n", cylinder, head, sector, this_pass);

When reading full sectors (dd if=/dev/fd0), all reads are 2 sectors except the last two sectors on the track, which are single sect reads:

CHS 44,1,1 count 2
CHS 44,1,3 count 2
CHS 44,1,5 count 2
CHS 44,1,7 count 2
CHS 44,1,9 count 2
CHS 44,1,11 count 2
CHS 44,1,13 count 2
CHS 44,1,15 count 2
CHS 44,1,17 count 1
CHS 44,1,18 count 1
CHS 45,0,1 count 2
CHS 45,0,3 count 2
CHS 45,0,5 count 2
CHS 45,0,7 count 2
CHS 45,0,9 count 2
CHS 45,0,11 count 2
CHS 45,0,13 count 2
CHS 45,0,15 count 2
CHS 45,0,17 count 1
CHS 45,0,18 count 1

It becomes even more interesting when looking at the output from /dev/fd1, which is a 1.2M 5.25" drive. The pattern is the same, but the odd number of sectors per track makes for 'odd' behaviour:

CHS 4,0,1 count 2
CHS 4,0,3 count 2
CHS 4,0,5 count 2
CHS 4,0,7 count 2
CHS 4,0,9 count 2
CHS 4,0,11 count 2
CHS 4,0,13 count 2
CHS 4,0,15 count 1
CHS 4,1,1 count 1
CHS 4,1,2 count 2
CHS 4,1,4 count 2
CHS 4,1,6 count 2
CHS 4,1,8 count 2
CHS 4,1,10 count 2
CHS 4,1,12 count 2
CHS 4,1,14 count 1
CHS 4,1,15 count 1
CHS 5,0,1 count 2
CHS 5,0,3 count 2
CHS 5,0,5 count 2
CHS 5,0,7 count 2
CHS 5,0,9 count 2
CHS 5,0,11 count 2
CHS 5,0,13 count 2
CHS 5,0,15 count 1
CHS 5,1,1 count 1
CHS 5,1,2 count 2

It turns out this is a side effect of the 720k fix at line 716, which I figured could be alleviated by actually testing for the presence of a 720k drive before adjusting this_pass:

        /* Fix for weird BIOS behavior with 720K floppy (issue #39) */
        if ((drivep->sectors == 9) && (drivep->cylinders == 80) && (this_pass < 3)) this_pass = 1;
        //if (this_pass < 3) this_pass = 1;

This works fine for the 1.44M drive, but not for the 1.2M drive: the behaviour changes but is still not correct, and it now reads 14 double sectors, then 2 single sectors.

CHS 2,0,13 count 2
CHS 2,0,15 count 1
CHS 2,1,1 count 1
CHS 2,1,2 count 2
CHS 2,1,4 count 2
CHS 2,1,6 count 2
CHS 2,1,8 count 2
CHS 2,1,10 count 2
CHS 2,1,12 count 2
CHS 2,1,14 count 2
CHS 3,0,1 count 2
CHS 3,0,3 count 2
CHS 3,0,5 count 2
CHS 3,0,7 count 2
CHS 3,0,9 count 2
CHS 3,0,11 count 2
CHS 3,0,13 count 2
CHS 3,0,15 count 1
CHS 3,1,1 count 1
CHS 3,1,2 count 2
CHS 3,1,4 count 2
CHS 3,1,6 count 2
CHS 3,1,8 count 2
CHS 3,1,10 count 2
CHS 3,1,12 count 2
CHS 3,1,14 count 2
CHS 4,0,1 count 2
CHS 4,0,3 count 2
CHS 4,0,5 count 2
CHS 4,0,7 count 2
CHS 4,0,9 count 2
CHS 4,0,11 count 2
CHS 4,0,13 count 2
CHS 4,0,15 count 1
CHS 4,1,1 count 1
CHS 4,1,2 count 2
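These traces can be reproduced offline by replaying the driver loop. The sketch below is not the ELKS driver: `simulate_reads`, `struct pass` and the `workaround_all` flag are hypothetical names, and it assumes every request arrives as one 1k block (count = 2), as the traces suggest. Under those assumptions the two remaining single-sector reads on the 1.2M drive fall out of the arithmetic: with 15 sectors per track, one 1k block per cylinder straddles the boundary between head 0 and head 1.

```c
#include <assert.h>
#include <stdio.h>

struct pass { unsigned cylinder, head, sector, count; };

/* Replay `blocks` consecutive 1k (2-sector) requests starting at linear
 * sector `start`. If workaround_all is nonzero, the issue #39 fix is applied
 * to every drive; otherwise only when the geometry looks like a 720k floppy
 * (9 sectors per track, 80 cylinders). Returns the number of BIOS passes
 * recorded in log[], at most max_log. */
unsigned simulate_reads(unsigned cylinders, unsigned heads, unsigned spt,
                        unsigned long start, unsigned blocks,
                        int workaround_all,
                        struct pass *log, unsigned max_log)
{
    unsigned n = 0;

    assert(heads > 0 && spt > 0);
    while (blocks-- > 0) {
        unsigned long count = 2;            /* one 1k buffer = 2 sectors */
        while (count > 0 && n < max_log) {
            unsigned sector = (unsigned)(start % spt) + 1;
            unsigned long tmp = start / spt;
            unsigned this_pass = spt - sector + 1;
            int is_720k = (spt == 9 && cylinders == 80);

            if ((workaround_all || is_720k) && this_pass < 3)
                this_pass = 1;
            if (this_pass > count)
                this_pass = (unsigned)count;

            log[n].cylinder = (unsigned)(tmp / heads);
            log[n].head = (unsigned)(tmp % heads);
            log[n].sector = sector;
            log[n].count = this_pass;
            n++;

            count -= this_pass;
            start += this_pass;
        }
    }
    return n;
}

/* Print the log in the same format as the printk above. */
void print_log(const struct pass *log, unsigned n)
{
    unsigned i;
    for (i = 0; i < n; i++)
        printf("CHS %u,%u,%u count %u\n", log[i].cylinder, log[i].head,
               log[i].sector, log[i].count);
}
```

Calling `simulate_reads(80, 2, 15, 132, 9, 1, log, 32)` followed by `print_log` replays the 1.2M trace from CHS 4,0,13 with the original workaround; passing 0 for `workaround_all` gives the behaviour after the 720k-only check.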

Anyway - supposedly there is more to change here if we're to increase the read size, so I'll leave it at that for now. @ghaerr, let me know what part of the CHS list is of interest.

—Mellvik

ghaerr commented 4 years ago

Hello @Mellvik,

Thank you for your excellent testing. There's no question that the ancient fix for 720k floppies is hurting performance by limiting reads to 1 sector for all devices; that definitely needs to be fixed. The fix will need to check for 720k floppies using a device number as well as max sector count though. I will look more into that.

I ventured deeply into the buffer and block device code last night... and learned that the character device handler as well as the low-level MINIX filesystem handler both use exactly the same code, which uses a while loop for reading, which gets a single 1K buffer, reads data into it (2 sectors), then transfers data, etc. So the 2 sector limit is above the low level buffer handling code that we're looking at in the driver.

Then looking at Linux 1.0 code, I find that the MINIX file read routine handles things differently - on a per-device basis, there's a lookup table for how many blocks to try to read at once. This gets used along with the original read request count to grab, say 9 buffers, then issue 9 read requests into these buffers (low level I/O is still limited to 1k block-at-a-time) at once. Then, after I/O is complete, all buffers are used to fulfill the application read request at the same time.
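A per-device "blocks at once" lookup of this kind could look roughly like the following. This is a sketch of the idea only, not the Linux 1.0 or ELKS code; the table contents, `RA_BLOCK_SIZE`, `RA_MAX_DEV` and `blocks_to_request` are all made up for illustration.

```c
#include <assert.h>

#define RA_BLOCK_SIZE 1024UL   /* buffer system block size (1k) */
#define RA_MAX_DEV    4

/* Hypothetical table: how many 1k blocks to request in one go, per device.
 * Here devices 0 and 1 stand for floppies (9 blocks = one 18-sector track),
 * devices 2 and 3 for anything else. */
static const unsigned blocks_at_once[RA_MAX_DEV] = { 9, 9, 2, 2 };

/* Number of buffers to grab for a read of nbytes at byte offset `offset`:
 * the blocks the request touches, capped by the per-device limit. */
unsigned blocks_to_request(unsigned dev, unsigned long offset,
                           unsigned long nbytes)
{
    unsigned long first, last, wanted;

    assert(dev < RA_MAX_DEV && nbytes > 0);
    first = offset / RA_BLOCK_SIZE;
    last = (offset + nbytes - 1) / RA_BLOCK_SIZE;
    wanted = last - first + 1;
    return wanted < blocks_at_once[dev] ? (unsigned)wanted
                                        : blocks_at_once[dev];
}
```

The caller would then issue that many 1k read requests back to back and satisfy the application read once all of them have completed.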

So - I think I have a way to rewrite ELKS code to do the same thing, HOWEVER, the lowest level BIOS calls will still read just 2 sectors at a time. The I/O requests will happen immediately after one another, though. The question is - will this be fast enough? I don't know.

@tkchia wrote the ELKS boot loader, and it uses a FAST_READ define that tries to read as many sectors as possible per BIOS call... am assuming that had to be done versus one sector at a time otherwise. Perhaps he can tell us his experience on how adding sectors until end of track helped increase speed, versus single sector reads. If single or double sector reads still miss the next sector even when I/O requests are executed immediately after each other, then this kernel mod as proposed won't increase speed much.

ghaerr commented 4 years ago

@Mellvik,

I've written a floppy disk speed test program in PR #816 that you can use for further testing of this issue.

Mellvik commented 4 years ago

Thanks @ghaerr, I'll move on to the fdtest program asap.

I ventured deeply into the buffer and block device code last night... and learned that the character device handler as well as the low-level MINIX filesystem handler both use exactly the same code, which uses a while loop for reading, which gets a single 1K buffer, reads data into it (2 sectors), then transfers data, etc. So the 2 sector limit is above the low level buffer handling code that we're looking at in the driver.

Then looking at Linux 1.0 code, I find that the MINIX file read routine handles things differently - on a per-device basis, there's a lookup table for how many blocks to try to read at once. This gets used along with the original read request count to grab, say 9 buffers, then issue 9 read requests into these buffers (low level I/O is still limited to 1k block-at-a-time) at once. Then, after I/O is complete, all buffers are used to fulfill the application read request at the same time.

Yes, this is indeed interesting. I'm tempted to load minix1 on a physical machine just to get a feel for how it works - and the actual speed numbers for comparison.

It's immediately hard to understand why the 1k physical transfer size limit was chosen. The first thing that comes to mind - potentially relevant for floppies in particular - is to avoid extended interrupt blocking. Then that doesn't make sense either, since interrupts should be enabled during disk transfers since they are DMA based, and the CPU can do other things in the meanwhile.

Back to Elks, given the fact that we're currently reading 1.3 sectors average per revolution, and ignoring the DMA 'opportunity' for the moment, it becomes even more interesting that the entire system feels totally dead when floppy I/O is ongoing. From my endless testing of networking these past months, one characteristic that stands out is that when elks is loading something from floppy (say, logging in, loading ash), net interrupts are being registered, (and serial input being buffered - at least to a certain extent) but the kernel will not respond to anything. So disk I/O seems to be much more of an activity 'blocker' than is reasonable.

I know we have been down this road before - vividly remembering our discussions re. slow FAT dir listings about a year ago, and you have thoroughly verified that the interrupt structure is not at fault. So - the search narrows.

Your discoveries and fixes in the buffering system are encouraging indeed, maybe we'll finally nail this one too - this time.

So - I think I have a way to rewrite ELKS code to do the same thing, HOWEVER, the lowest level BIOS calls will still read just 2 sectors at a time. The I/O requests will happen immediately after one another, though. The question is - will this be fast enough? I don't know.

Again, encouraging - let's see what the fdtest numbers say.

@tkchia wrote the ELKS boot loader, and it uses a FAST_READ define that tries to read as many sectors as possible per BIOS call... am assuming that had to be done versus one sector at a time otherwise. Perhaps he can tell us his experience on how adding sectors until end of track helped increase speed, versus single sector reads. If single or double sector reads still miss the next sector even when I/O requests are executed immediately after each other, then this kernel mod as proposed won't increase speed much.

Interesting.

—Mellvik

Mellvik commented 4 years ago

The first numbers are in:

# fdtest
dma start page 7, dma end page 7, buffer at 7190:0 - OK
Reading 18 sectors individually
4 secs
Reading 18 sectors as 9 1024-byte blocks
2 secs
Reading 18 sectors at once
0 secs

I'm making some adjustments to cover the 1.2M drive too, adding options to fdtest to save time (transferring the compiled program via Ethernet still takes some time between iterations).

--M

ghaerr commented 4 years ago

Thanks for your testing, this is just what we needed to know:

Reading 18 sectors as 9 1024-byte blocks
2 secs
Reading 18 sectors at once
0 secs

Wow, looks like there's only one answer to increasing speed when using the BIOS: all sectors have to be read at the same time. It appears that the BIOS setup time to start a transfer is too high, and that more than 2 sectors must be read at once in order to get acceptable speed. The Linux 1.0 method I was hoping to use, assembling multiple buffers and issuing separate read requests of 1k each, won't work. This also means that disk fragmentation is a big issue, although that won't be an initial problem since image binaries are written using adjacent blocks.

It's immediately hard to understand why the 1k physical transfer size limit was chosen

Straightforward - all Linux-system block device drivers are written on top of the (1k) buffer system. All disk requests are read in 2-sector (1k) blocks. ELKS uses the same mechanism. With fast disk drivers, this isn't a problem. Using PC BIOS, speed is unacceptable. That's what fdtest is plainly showing.

one characteristic that stands out is that when elks is loading something from floppy ... the kernel will not respond to anything. So disk I/O seems to be much more of an activity 'blocker' than is reasonable.

you have thoroughly verified that the interrupt structure is not at fault. So - the search narrows.

Also straightforward - ELKS issues a synchronous BIOS INT 13h disk I/O call for all reads/writes, which means that the kernel isn't running and does nothing until the BIOS call returns. Think about it, the kernel can't "sleep" a BIOS call and return when it's done, it doesn't even have control, the BIOS does. Of course, all interrupt-driven mechanisms still continue to collect data, as they interrupt the BIOS call when the CPU executes a hardware interrupt.

tkchia commented 4 years ago

Hello @ghaerr, hello @Mellvik,

@tkchia wrote the ELKS boot loader, and it uses a FAST_READ define that tries to read as many sectors as possible per BIOS call... am assuming that had to be done versus one sector at a time otherwise. Perhaps he can tell us his experience on how adding sectors until end of track helped increase speed, [...]

Well, I did not test ELKS's FAST_READ with actual floppy hardware --- since I do not actually have the hardware at hand. But I do recall, when I was working with MS-DOS and FreeDOS back in the day, that (at least in some cases) reading several sectors at once could indeed speed things up.

It's immediately hard to understand why the 1k physical transfer size limit was chosen.

Simplicity would be my guess. There are two problems with reading an arbitrary number of sectors at once, and dealing with these would probably make the disk I/O code quite hairy.

The small problem: the "sectors per track" field in the Diskette Drive Parameter Table (the int 0x1e vector) must be updated to at least the maximum sector number we want to read. (Otherwise --- says RBIL --- the BIOS may decide to prematurely "wrap around" to the following track.) To do that, the DDPT will likely need to be copied from ROM to RAM. Much of the bootloader's FAST_READ code deals with the DDPT.
A bigger problem is that for each I/O request, the I/O buffer cannot straddle a 64 KiB DMA boundary (0x1000:0, or 0x2000:0, or 0x3000:0, etc.). If a buffer straddles a boundary, one way to deal with it is to split the buffer into two, but this does not work if the memory area for a single sector will cross a boundary.
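The 64 KiB boundary condition is easy to test for before issuing a transfer. The following is a minimal sketch with hypothetical function names; it assumes real-mode segment:offset addressing and 512-byte sectors, and is not taken from the ELKS boot loader or driver.

```c
#include <assert.h>

/* Real-mode segment:offset to 20-bit linear address. */
unsigned long linear_address(unsigned seg, unsigned off)
{
    return ((unsigned long)seg << 4) + off;
}

/* Nonzero if the buffer [addr, addr + len) spans more than one 64 KiB
 * DMA page, i.e. it straddles a boundary such as 0x1000:0 or 0x2000:0. */
int crosses_dma_boundary(unsigned long addr, unsigned long len)
{
    assert(len > 0);
    return (addr >> 16) != ((addr + len - 1) >> 16);
}

/* Whole 512-byte sectors that fit between addr and the next boundary.
 * A result of 0 is the awkward case described above: even a single
 * sector at this address would cross, so splitting cannot help. */
unsigned sectors_before_boundary(unsigned long addr)
{
    unsigned long room = 0x10000UL - (addr & 0xFFFFUL);
    return (unsigned)(room / 512UL);
}
```

With the buffer address fdtest reported, 7190:0, an 18-sector (9216-byte) transfer stays inside DMA page 7, which agrees with the "dma start page 7, dma end page 7" line in its output.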

Also, if I am reading the source correctly, ELKS actually reads 1 sector (i.e. ½KiB) at a time, even though the transfer area is 1 KiB.

Thank you!

ghaerr commented 4 years ago

Hello @tkchia,

Thank you for your input. I hadn't thought about the problem with the DDPT.

the BIOS may decide to prematurely "wrap around" to the following track.

What this means is during a read request, if the CHS sector is greater than the max sectors per track in the DDPT, some BIOS's may increment the track count and reset the sector to 1, rather than increment the sector, if the DDPT doesn't match the floppy geometry that's in the drive?

I have downloaded the (massive) RBIL database and searched it... I can't find Ralf's statement concerning this behavior. A web version of the information regarding INT 1E is here: http://www.ctyme.com/intr/rb-2445.htm, which matches RBIL documentation.

Since the kernel BIOS disk driver (bioshd.c) doesn't copy a DDPT, but always tries to read two sectors, how is it that it always works for 2 sector reads, if the DDPT was for a 720k disk but a 1.44M disk is in the drive?

There is some discussion that the DDPT INT 1E vector needs to be restored before rebooting, which could be problematic if ELKS crashes and needs a ctrl-alt-del reboot. [EDIT: this isn't a big deal, as the DDPT could be restored after each I/O operation.]

for each I/O request, the I/O buffer cannot straddle a 64 KiB DMA boundary

I'm thinking that could be solved by using a large (18 sector = 9k) DMASEG for use in single-track disk reads in the first 64k of memory, placed where it is now, but lengthened.

Thank you!

ghaerr commented 4 years ago

Hello @tkchia,

Also, if I am reading the source correctly, ELKS actually reads 1 sector (i.e. ½KiB) at a time, even though the transfer area is 1 KiB

No, ELKS tries to read 2 sectors at a time (=1KiB), if the second sector is the subsequent sector on the same track:

 BD_AX = (req->rq_cmd == WRITE ? BIOSHD_WRITE : BIOSHD_READ) | this_pass;

There is some special code handling issue #39 (#44) which will read only 1 sector, when that sector is the last sector on a 720k floppy track (which was recently changed to function on 720k floppies only, discussed in #39).

Thank you!

Mellvik commented 4 years ago

I'm thinking that could be solved by using a large (18 sector = 9k) DMASEG for use in single-track disk reads in the first 64k of memory, placed where it is now, but lengthened.

FWIW I think this is a very reasonable step forward. Let's see if it works, what the effect is and adjust along the way.

If the 0.17 seconds it takes to read a full track turns out to be unacceptable in terms of interactive response, half a track is a good option. Right now the system is practically dead for the entire read, sometimes 10 seconds, so it's likely that the speedup in reads (and eventually writes, I suppose) will simply be elatingly positive from a user's point of view.

-M

ghaerr commented 4 years ago

I rewrote the BIOS disk driver in PR #823 to cache full tracks (up to 18 sectors) into a low-memory DMASEG.

This handles the 2nd problem @tkchia brought up, but a RAM DDPT is not yet implemented.

I plan on implementing a DDPT once the potential problem of track "wraparound" from RBIL is fully understood. The current BIOS driver always reads two sectors, even if those sectors are greater than max_sector in the DDPT.

Perhaps the driver should compare the ROM DDPT max_sector value and display something if found different after probing?

tkchia commented 4 years ago

Hello @ghaerr,

I have downloaded the (massive) RBIL database and searched it... I can't find Ralf's statement concerning this behavior. A web version of the information regarding INT 1E is here: http://www.ctyme.com/intr/rb-2445.htm, which matches RBIL documentation.

It is mentioned under the write-ups for int 0x13 functions 0x02 and 0x03:

--------B-1302-------------------------------
INT 13 - DISK - READ SECTOR(S) INTO MEMORY
    AH = 02h
    AL = number of sectors to read (must be nonzero)
    ...
Notes:  ...
    most BIOSes support "multitrack" reads, where the value in AL
      exceeds the number of sectors remaining on the track, in which
      case any additional sectors are read beginning at sector 1 on
      the following head in the same cylinder; the MSDOS CONFIG.SYS command
      MULTITRACK (or the Novell DOS DEBLOCK=) can be used to force DOS to
      split disk accesses which would wrap across a track boundary into two
      separate calls

There is some discussion that the DDPT INT 1E vector needs to be restored before rebooting, which could be problematic if ELKS crashes and needs a ctrl-alt-del reboot.

Well, maybe not. :-) A Ctrl-Alt-Del reboot should reinitialize pretty much all the interrupt vectors, including the DDPT. (A "really warm" reboot via int 0x19 may be problematic, but a Ctrl-Alt-Del will not trigger that.)

Thank you!

tkchia commented 4 years ago

Hello @ghaerr, hello @Mellvik,

Perhaps the driver should compare the ROM DDPT max_sector value and display something if found different after probing?

Perhaps so. :-) I will see if I can whip up some code to do that (and maybe more).

Thank you!

Mellvik commented 4 years ago

Thank you @tkchia, this is indeed interesting.

I'd like to add that the 'Note' quoted is incorrect for most BIOSes. The INT 13h read call is documented to accept up to 128 sectors (64K) per read, so the BIOS' 'auto increment' is not restricted to the stated 'additional sectors are read beginning at sector 1 on the following head in the same cylinder;'.

I can imagine that some BIOSes may have a problem with this, I haven't seen any.

—Mellvik

ghaerr commented 4 years ago

Hello @tkchia,

Here is my patch for the rewritten track-caching BIOS driver in #823, implementing DDPT:

+static unsigned long __far *vec1E = _MK_FP(0, 0x1E << 2);
+static unsigned long oldvec;
+
+int set_ddpt(unsigned int max_sector)
+{
+       oldvec = *vec1E;
+       unsigned char __far *org_ddpt = (void __far *)oldvec;
+       static unsigned char ddpt[12];
+
+       if (org_ddpt[4] == (unsigned char)max_sector)
+               return 0;
+
+       dprintk("DDPT %d\n", org_ddpt[4]);
+       fmemcpyw(ddpt, _FP_SEG(ddpt), (void *)(unsigned)oldvec, _FP_SEG(oldvec), 6);
+       ddpt[4] = (unsigned char)max_sector;
+       *vec1E = (unsigned long)(void __far *)ddpt;
+       return 1;
+}
+
+void reset_ddpt(void)
+{
+       *vec1E = oldvec;
+}

Since there is some discussion on exactly how the DDPT might be useful (or possibly buggy on some BIOSes), I will wait before adding it or @tkchia's code to the new driver. The full patch is listed under #823.

ghaerr commented 4 years ago

Hello @Mellvik,

This issue - increased lag during reads, needs more practical use to evaluate.

Why not read 18 sectors regardless of track boundaries (unless hitting the end of the device of course)? The BIOS handles it just fine - and it's fast.

These are the competing issues when it comes to determining "how much" to read ahead. The system needs to be observed reading various sectors using the ^P-controllable debug display to determine when caching gives the most benefit. Some ELKS sector reads are only for a single block, while others are for long sequences. In Linux, (and soon in ELKS) a "read-ahead" flag is set indicating that a sequential file read is in progress, versus inode or directory I/O. I think the caching may want to follow that flag for best overall improvement.

With regards to reading sectors beyond end-of-track, I think I am coming to understand what @tkchia has brought up (and coded into our original BIOS driver, which is used as a fallback when track reads fail or CONFIG_TRACK_CACHE is unset): That there may be some BIOS bugs or irregularities with track-incrementing, but that we can guarantee that a track is read to the end if a RAM-based DDPT is installed with max_sectors equal to the drive geometry, if we only read to end-of-track. Is that correct, @tkchia?

Thank you!

tkchia commented 4 years ago

Hello @Mellvik,

I can imagine that some BIOSes may have a problem with this, I haven't seen any.

Me neither (for now), but maybe it is just because I have not tried too hard to mess with the BIOS before...

Hello @ghaerr,

we can guarantee that a track is read to the end if a RAM-based DDPT is installed with max_sectors equal to the drive geometry, if we only read to end-of-track. Is that correct, @tkchia?

The DDPT "sectors per track" field can be larger than the actual track size; it should not be smaller though.

Thank you!

Mellvik commented 4 years ago

Hi @tkchia, @ghaerr,

I'm trying to make sense of the usefulness of the DDPT which, if I understand it correctly, applies to floppies only.

Then the primary case I can imagine is the 1.2M 5.25in drive being used as a 360k drive - different # of tracks and sectors, both lower than the CMOS configured value. Possibly even a 320K drive. Then there are the 1.44 or 2.88M drives, some of which also scale down.

Finally, at least in the case of the Compaq portable, when the CMOS battery is gone, it will autoconfigure the A drive (only) as a 1.2M 5.25in drive, but will still boot and work with a 1.44M A-drive.

If these cases are relevant to the DDPT, the end-of-track can actually go both ways, but down in most cases.

—Mellvik

tkchia commented 4 years ago

Hello @Mellvik,

Finally, at least in the case of the Compaq portable, when the CMOS battery is gone, it will autoconfigure the A drive (only) as a 1.2M 5.25in drive, but will still boot and work with a 1.44M A-drive.

Yes, I imagine this is the kind of situation where one might need to tweak the DDPT.

A 1.2MiB-format disk has 15 sectors per track, while a 1.44MiB-format disk has 18. So basically if we want to read the whole of track 1, side 1, then we should make sure the BIOS reads 18 sectors from track 1, side 1, rather than (wrongly) 15 sectors from there followed by 3 sectors from the next track.
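The arithmetic behind that can be sketched as follows. This is only a model of the multitrack behaviour described in this thread, with a hypothetical function name; real BIOSes may differ:

```c
#include <assert.h>

/* Model only: number of sectors a multitrack-capable BIOS would take from
 * the NEXT head, given the sectors-per-track value it finds in the DDPT.
 * 'sector' is the 1-based starting sector of the read. */
unsigned spilled_sectors(unsigned ddpt_spt, unsigned sector, unsigned count)
{
    unsigned on_track;

    assert(ddpt_spt > 0 && sector >= 1 && sector <= ddpt_spt);
    on_track = ddpt_spt - sector + 1;   /* what the BIOS believes is left */
    return count > on_track ? count - on_track : 0;
}
```

With a stale DDPT value of 15, an 18-sector read from sector 1 spills 3 sectors onto the following head; with 18 (or any larger value) nothing spills.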

I know that both MS-DOS and FreeDOS do move the DDPT into RAM (and they also intercept int 0x13 and add their own patches).

Thank you!

Mellvik commented 4 years ago

Yes, I imagine this is the kind of situation where one might need to tweak the DDPT.

A 1.2MiB-format disk has 15 sectors per track, while a 1.44MiB-format disk has 18. So basically if we want to read the whole of track 1, side 1, then we should make sure the BIOS reads 18 sectors from track 1, side 1, rather than (wrongly) 15 sectors from there followed by 3 sectors from the next track.

OK; we can test this. The machine I'm using most of the time has a broken CMOS battery. If turned off for more than an hour, it loses configuration. Irritating, but it's rarely turned off at all, and now it's fortunate. I couldn't get @ghaerr's patch in, so the first test is w/o the patch.

—Mellvik

Mellvik commented 4 years ago

The verdict is in - ELKS boots fine on the unconfigured machine. So even without special DDPT support, the BIOS reports correct data. This may be unique to Compaq, I don’t know, but at least we have a physical test case.

—Mellvik

ghaerr commented 4 years ago

Hello @tkchia and @Mellvik,

Thank you for your explanations and information. After studying it, as well as looking at various BIOS source code and emulators, I now think I pretty much fully understand what is going on. Here is a summary of the information, put forward to see if we're in full agreement.

Covering each issue separately:

INT 13h Multitrack

The INT 13h disk read function is not 'buggy on some' BIOSes. Instead, as @tkchia's RBIL information points out, most will support a "multitrack" function. Note: INT 13h only accepts CHS input; tracks are a virtual concept involving a cylinder and head.

INT 13h multitrack specifically only does the following, which is not the same as "reading the next track":

[if the requested number of sectors] exceeds the number of sectors remaining on the track, 
any additional sectors are read beginning at sector 1 on the following head in the same cylinder

INT 13h multitrack does NOT always read the next "track" from disk after the last sector. It only increments the head # and resets the sector number to 1. The function effectively only reads the next "track" when starting from head < max_heads.
INT 13h multitrack uses the DDPT sectors per track field to determine when to increment the head.
Thus, INT 13h multitrack cannot be used in our ELKS track-caching scheme unless the head number is also checked to be < max_heads.
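The head check in the last point could look like the sketch below. The function name is hypothetical, heads are numbered from 0, and `heads` is the number of heads (2 for floppies); it assumes the BIOS follows the multitrack rule quoted above and continues onto one following head only:

```c
#include <assert.h>

/* Returns 1 if 'count' sectors starting at (head, sector) can be handled by
 * a single INT 13h read under the multitrack rule quoted above, else 0.
 * head is 0-based, sector is 1-based. */
int single_call_ok(unsigned head, unsigned sector, unsigned count,
                   unsigned heads, unsigned spt)
{
    unsigned on_track;

    assert(heads > 0 && spt > 0);
    assert(head < heads && sector >= 1 && sector <= spt);

    on_track = spt - sector + 1;
    if (count <= on_track)
        return 1;                   /* stays on the current track */
    if (head + 1 >= heads)
        return 0;                   /* no following head in this cylinder */
    return count - on_track <= spt; /* overflow must fit on one more track */
}
```

A 4-sector read at sector 17 of an 18-sector track is fine from head 0 but must be split when starting on head 1.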

Disk Drive Parameter Table

The INT 1E low-memory interrupt vector location is used to store the (far) address of the BIOS DDPT, which points into ROM.
Offset 4 of the (ROM) DDPT is a byte containing the sectors per track (SPT), used for any BIOS function that needs to know about CHS arithmetic. INT 13h always uses it.
Some BIOSes will check the passed INT 13h sector against SPT, others don't.
All BIOSes implementing INT 13h multitrack use the SPT to know when to increment the head.
Some BIOSes will update the DDPT when booting, to match what they think the diskette SPT is.
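For reference, here is a portable sketch of copying the table and patching the SPT byte. The 11-byte layout in the comment is the commonly documented one (as in RBIL); the function and constant names are hypothetical, and plain arrays stand in for the far pointers a real kernel would use:

```c
#include <assert.h>
#include <string.h>

/* Commonly documented DDPT layout (11 bytes):
 *   0,1  controller "specify" bytes     6   data length
 *   2    motor-off delay (ticks)        7   gap length when formatting
 *   3    bytes/sector code (2 = 512)    8   format fill byte
 *   4    sectors per track (SPT)        9   head settle time (ms)
 *   5    gap length                     10  motor start time (1/8 s)
 */
enum { DDPT_SIZE = 11, DDPT_SPT = 4 };

/* Copy the BIOS table into RAM and override only the SPT field. */
void ddpt_clone(unsigned char dst[DDPT_SIZE],
                const unsigned char src[DDPT_SIZE], unsigned char spt)
{
    assert(spt > 0);
    memcpy(dst, src, DDPT_SIZE);
    dst[DDPT_SPT] = spt;
}
```

The INT 1Eh vector would then be pointed at the RAM copy, which is the step the patch earlier in this thread performs.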

ELKS Boot Code:

The ELKS boot code will set up a custom DDPT if FAST_READ is set (it is ON by default).
The custom DDPT will use the SPT value that is stamped into the boot block by the ELKS image creation process. This means that an image copied from a 3.5" disk to a 5.25" disk won't boot. This is by design, as the alternative is to trust the BIOS SPT, which may or may not be correct. This is why ELKS boots on an unconfigured Compaq. ELKS users must select a specific image built via the config mechanism, or one of the many images produced when requested to make all of them.
The FAST_READ option uses the INT 13h multitrack function to quickly boot, but only reads to the end of a track. The boot code uses the stamped-in SPT to limit the INT 13h read operation to non-multitrack.

ELKS BIOS floppy driver:

Doesn't use a custom DDPT.
Almost always uses 2-sector reads. Because 1.44M disks have an even number of sectors per track, this means that the BIOS driver will never actually perform "multitrack" on them. However, 720k and 360k disks have an odd SPT (9). This would cause a multitrack read, and an ensuing error if the head was already 1. I believe this is the actual cause of the error in #39/#44. That fix resulted in always reading 1 sector when the number of remaining sectors is < 3. That fix didn't differentiate between 720k and 1.44M floppies, and the resulting performance is poor.
The new track-cache driver enhancement only reads until the end of a track.
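One way to keep every read inside a track, without falling back to single-sector reads everywhere, is to split a request at the track boundary. The sketch below (hypothetical function, LBA numbered from 0) just counts how many BIOS calls such a split needs:

```c
#include <assert.h>

/* Number of BIOS reads needed for 'count' sectors starting at 'lba' when no
 * single read is allowed to run past the end of a track. */
unsigned bios_calls_needed(unsigned long lba, unsigned count, unsigned spt)
{
    unsigned calls = 0;

    assert(spt > 0);
    while (count > 0) {
        unsigned left = spt - (unsigned)(lba % spt); /* left on this track */
        unsigned n = count < left ? count : left;

        lba += n;
        count -= n;
        calls++;
    }
    return calls;
}
```

A 2-sector read at LBA 8 needs two calls on a 9-SPT disk (it straddles the track end) but one call on an 18-SPT disk.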

Summary:

In general, BIOSes try to keep up with the disk geometry.
ELKS, however, uses custom boot disks and additionally probes after boot to determine disk geometry.
For operating systems handling their own disk geometries, use of a custom DDPT is wise.

Proposed:

Have ELKS add a custom DDPT whenever the floppy is probed, matching the geometry.
The custom DDPT doesn't need to be uninstalled.
Ensure neither the track-cache nor original/fallback BIOS driver tries reading past end-of-track, unless possibly head < max_heads is checked.

Thank you!

Mellvik commented 4 years ago

Thank you @ghaerr for a thorough rundown.

Given the findings I agree with your conclusion.
That said, I have two comments. As pointed out yesterday, I'm questioning the generality of the 'autoincrement problem'. BIOS docs limit reads (and probably writes, I didn't check that) to 128 sectors for obvious reasons. Thus correct autoincrementing of head and cylinder is assumed to work (and does so on the Compaq). An interesting question then is how widespread the autoincrement bugs are. Nevertheless, the safe choice for ELKS, as you propose, is the mechanism known to work in most systems, unless the difference is deemed significant enough to qualify for a config option. Out of curiosity, I'm going to make a minor mod to fdtest, reading a full disk @ 25 sectors per op and saving the output, then comparing to the original.

The 2nd issue, which may be covered by your proposal, is the case of formatting floppies with geometry different from the probed (or, if the drive was empty when the system booted, the default). If this is to be supported, is there a way to force the geometry to something else than the probed/default values?

Thanks.

--M

ghaerr commented 4 years ago

I'm questioning the generality of the 'autoincrement problem'

I don't know, but remember the point of this exercise is to increase ELKS floppy throughput. The ELKS block I/O system is entirely built around 1K-block (two-sector) reads and writes, so arbitrarily reading ahead large amounts will decrease performance and requires dedicated main memory buffers. The track cache is below the level of the block I/O system and the upper levels know nothing about it. I chose buffering up to full tracks since, once the floppy read arm is positioned, it seemed like the fastest way to achieve a performance increase. Moving the floppy arm would likely add more delay, although of course it could help in some cases, if the BIOS supports it.
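A minimal sketch of that arrangement, assuming a single cache shared by all drives and hypothetical names throughout (the actual driver is in #823): the cache remembers one run of sectors, refills from the requested sector to end-of-track, and serves later two-sector block reads from memory when they fall inside that run.

```c
#include <assert.h>

/* Hypothetical bookkeeping for a one-entry track cache; the sector data
 * itself would live in a separate buffer. count == 0 means "empty". */
struct track_cache {
    int drive;              /* drive the cached run belongs to */
    unsigned long start;    /* first cached LBA */
    unsigned count;         /* number of cached sectors */
};

/* Does the cache fully cover [lba, lba + nsect) on this drive? */
int cache_hit(const struct track_cache *c, int drive,
              unsigned long lba, unsigned nsect)
{
    return c->count != 0 && c->drive == drive &&
           lba >= c->start && lba + nsect <= c->start + c->count;
}

/* Record a refill that starts at 'lba' and runs to end-of-track. Filling
 * for another drive simply replaces (invalidates) the old contents. */
void cache_fill(struct track_cache *c, int drive,
                unsigned long lba, unsigned spt)
{
    assert(spt > 0);
    c->drive = drive;
    c->start = lba;
    c->count = spt - (unsigned)(lba % spt);
}
```

Because the block layer above only ever asks for its usual blocks, nothing there needs to change; a miss just falls through to a BIOS read plus a refill.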

Our testing needs to concentrate on where and when ELKS would benefit from reading ahead. This is complicated by the fact that standard ELKS block buffering occurs in the block I/O system, so that multiple invocations of a single command are likely to be buffered at that level.

Two cases I'd like to consider are: 1) blocks read during system boot, including rc.sys execution and network startup, and 2) known cases of large streaming file data, like reading files for network transfer. We should develop test scenarios for these use cases that will most help increase floppy I/O speed. This is only for reading; increasing write speed will be more problematic and is not covered yet.

The track cache also needs to be tested with different disk formats. It would be nice to test on a 720k floppy somehow, since I still don't fully understand the 720k floppy fix/hack and want to address that in this enhancement as well.

If this is to be supported, is there a way to force the geometry to something else than the probed/default values?

If you mean force the BIOS, then a custom DDPT does that. If you mean force ELKS, the drive_infot parameters need to be set to the desired values in the ELKS BIOS driver. It is likely that a user program would do the formatting, and then ELKS would do a probe afterwards just as it does now.

Mellvik commented 4 years ago

Two cases I'd like to consider are: 1) blocks read during system boot, including rc.sys execution and network startup, and 2) known cases of large streaming file data, like reading files for network transfer. We should develop test scenarios for these use cases that will most help increase floppy I/O speed. This is only for reading; increasing write speed will be more problematic and is not covered yet.

Again, I'm in. And I think it's really important to keep it simple. At this point I think the full track profile is reasonable. And in order to avoid any autoincrement issues, just start at the current sector and buffer the rest of the track. Sometimes no improvement, sometimes a lot, but it will even out.

The track cache also needs to be tested with different disk formats. It would be nice to test on a 720k floppy somehow, since I still don't fully understand the 720k floppy fix/hack and want to address that in this enhancement as well.

I'm seriously beginning to question whether 720k support is worth our time. The systems (and drives) are few and far between. How about skipping it entirely and waiting for someone willing to participate and do the testing? I'm trying to get 720k working on one of my systems, but the initial tests are not promising.

If this is to be supported, is there a way to force the geometry to something else than the probed/default values?

If you mean force the BIOS, then a custom DDPT does that. If you mean force ELKS, the drive_infot parameters need to be set to the desired values in the ELKS BIOS driver. It is likely that a user program would do the formatting, and then ELKS would do a probe afterwards just as it does now.

I was thinking of ELKS, and whatever is needed to tell the driver to change to a different format (i.e. an ioctl or similar).

-M

ghaerr commented 4 years ago

@Mellvik - are you satisfied with the floppy performance for now, or should this issue be kept open? I would prefer opening another issue when performance issues become a problem again, if you are satisfied with the track caching.

Mellvik commented 4 years ago

@ghaerr, we do have significant improvement - a much more useable system when running off of floppies, and I'm ready to close the issue. I'd like to end on the same note as we started, comparing new and old timings on the 286. Will post those and close up. Thank you very much - it's been an interesting and fruitful ride.

--Mellvik

Mellvik commented 4 years ago

A final note before closing this issue - some very non-scientific numbers, ignoring a lot of more or less significant unknowns such as where a file is located on (floppy) disk and how reliable time() really is. Admittedly, my phone stopwatch reports higher numbers than time, but time has the advantage of exact start/stop recording. Anyway - here are the numbers - compared with the readings at the opening of this issue, old numbers first, new numbers second, then % delta:

HW: 286 12MHz, FD1,2M, HD type 17 42MB Conner, booting from Minix floppy

ls -l /bin [7s] [3.7s] [90%] (redirected to /dev/null to eliminate the effect of the serial line vs physical console)
cp /bin/vi /bin/xx [28.5s] [27.3s] (similar)
ps [2s] [3s] (not I/O relevant, load time is probably less than 'data gathering' time)
cat /etc/rc.d/rc.sys [4s] [2s] [100%] (again redirected to /dev/null)
cp 245kfile xx [HD to HD, FATfs] [24s] [24s] (unchanged - as expected)

Not all that enlightening other than confirming appearances - 'significant improvement' on floppy read. In order to bring this further, I guess we would have to attack writing.
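The percentages in the list above appear to be speedups measured against the new time, i.e. (old - new) / new; that reading is an assumption, since the formula is not stated. A small sketch with a hypothetical helper, using tenths of a second to stay in integer arithmetic:

```c
#include <assert.h>

/* Speedup of the new timing over the old one, in percent, truncated toward
 * zero. Times are given in tenths of a second. A negative result means the
 * new timing is slower. */
int speedup_percent(int old_tenths, int new_tenths)
{
    assert(new_tenths > 0);
    return (old_tenths - new_tenths) * 100 / new_tenths;
}
```

Under that reading, 7s to 3.7s works out to 89% (rounded to 90% above), 4s to 2s to 100%, and the cp of /bin/vi to about 4%.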

Hard disk I/O is still slow - mostly unaffected by all the recent improvements and probably the place to focus the next round of speedup efforts.

Thanks @ghaerr !!

--Mellvik

ghaerr commented 4 years ago

Hello @Mellvik,

Interesting final comparison. We've really helped the floppy speed, which I initially didn't think possible. The change is especially noticeable when booting a networking system on my Compaq 386... the older version seems almost untenable.

Regarding ps taking another second longer to load - this is likely due to having to read another whole track when barely needed. At some point in the future, we could try to consider some heuristics for the amount of caching to perform. For instance, the exec loader could give a hint as to the size of the code file. This may or may not help though.

Overall, the system is obviously much quicker, but seems to occasionally take a small bit more time, like with ps. Of course, system buffers could be increased (using CONFIG_EXT_BUFFERS), which would keep more programs around in memory, if wanted.

Hard disk I/O is still slow - mostly unaffected by all the recent improvements and probably the place to focus the next round of speedup efforts.

I'm not sure - all hard disk I/O is also currently cached, (although only 18 sectors max), and you're saying it hasn't helped much. Caching more would start taking loads of otherwise valuable memory that would best be used in configurable L2 buffers, IMO. So I'm not sure how to speed up HD I/O at this point. The current code replaces the FD cache with HD data, which effectively invalidates the FD cache. Trying to calculate how the cache should be shared, if at all, will require much more research.

BTW, did you notice the "spinning ball" on disk I/O?

Overall, a great improvement from where we were!

Mellvik commented 4 years ago

@ghaerr,

Yes I did notice the spinning ball, but since I hardly ever work at the physical console, I don't get all the benefits of it :-).

I decided to turn on BIOS debug messages, but didn't take the time to read the code to understand them all, thus the question: Is this what you'd expect? And what does NO_CACHE really mean here?

bioshd: lba 1970 is CHS 54/1/9 remaining sectors 10
bioshd(4): NO-CACHE read lba 1970 len 2
bioshd: drive 0 cmd 0 CHS 54/1/9 count 2
bioshd: lba 1396 is CHS 38/1/11 remaining sectors 8
bioshd(4): NO-CACHE read lba 1396 len 2
bioshd: drive 0 cmd 0 CHS 38/1/11 count 2
bioshd: lba 1398 is CHS 38/1/13 remaining sectors 6
bioshd(4): NO-CACHE read lba 1398 len 2
bioshd: drive 0 cmd 0 CHS 38/1/13 count 2
bioshd: lba 1400 is CHS 38/1/15 remaining sectors 4
bioshd(4): NO-CACHE read lba 1400 len 2
bioshd: drive 0 cmd 0 CHS 38/1/15 count 2
bioshd: lba 1402 is CHS 38/1/17 remaining sectors 2
bioshd(4): NO-CACHE read lba 1402 len 2
bioshd: drive 0 cmd 0 CHS 38/1/17 count 2
bioshd: lba 1404 is CHS 39/0/1 remaining sectors 18
bioshd(4): NO-CACHE read lba 1404 len 2
bioshd: drive 0 cmd 0 CHS 39/0/1 count 2
bioshd: lba 1406 is CHS 39/0/3 remaining sectors 16
bioshd(4): NO-CACHE read lba 1406 len 2
bioshd: drive 0 cmd 0 CHS 39/0/3 count 2
bioshd: lba 1408 is CHS 39/0/5 remaining sectors 14
bioshd(4): NO-CACHE read lba 1408 len 2
bioshd: drive 0 cmd 0 CHS 39/0/5 count 2
bioshd: lba 1412 is CHS 39/0/9 remaining sectors 10
bioshd(4): NO-CACHE read lba 1412 len 2
bioshd: drive 0 cmd 0 CHS 39/0/9 count 2
bioshd: lba 1410 is CHS 39/0/7 remaining sectors 12
bioshd(4): NO-CACHE read lba 1410 len 2
bioshd: drive 0 cmd 0 CHS 39/0/7 count 2
bioshd: lba 14 is CHS 0/0/15 remaining sectors 4
bioshd(4): NO-CACHE read lba 14 len 2
bioshd: drive 0 cmd 0 CHS 0/0/15 count 2

bioshd: lba 1962 is CHS 54/1/1 remaining sectors 18
bioshd(4): NO-CACHE read lba 1962 len 2
bioshd: drive 0 cmd 0 CHS 54/1/1 count 2

ELKS 0.3.0

—Mellvik

ghaerr commented 4 years ago

Is this what you'd expect?

Yes.

And what does NO_CACHE really mean here?

Those are the non-track-cached reads from the HD. Remember the max sectors for track caching is 18, so everything else is read non-cached. I will probably lowercase the NO-CACHE text now that it's all working. It was in caps to differentiate it from all the other BIOS messages during development.
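For anyone reading the debug output, the CHS values and "remaining sectors" counts in the log above can be reproduced with the usual LBA-to-CHS arithmetic. This is a sketch with hypothetical names, assuming the 2-head, 18-SPT geometry that the log values imply:

```c
#include <assert.h>

struct chs {
    unsigned cyl;       /* cylinder, 0-based */
    unsigned head;      /* head, 0-based */
    unsigned sector;    /* sector, 1-based */
};

/* Standard LBA to CHS conversion for a given geometry. */
struct chs lba_to_chs(unsigned long lba, unsigned heads, unsigned spt)
{
    struct chs r;

    assert(heads > 0 && spt > 0);
    r.sector = (unsigned)(lba % spt) + 1;
    r.head = (unsigned)((lba / spt) % heads);
    r.cyl = (unsigned)(lba / ((unsigned long)spt * heads));
    return r;
}

/* Sectors from 'lba' to the end of its track, including 'lba' itself. */
unsigned remaining_on_track(unsigned long lba, unsigned spt)
{
    assert(spt > 0);
    return spt - (unsigned)(lba % spt);
}
```

For example, lba 1970 maps to CHS 54/1/9 with 10 sectors remaining, and lba 1404 to CHS 39/0/1 with 18 remaining, matching the log.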