FDOS / kernel

FreeDOS kernel - implements the core MS-DOS/PC-DOS (R) compatible operating system. It is derived from Pat Villani's DOS-C kernel and released under the GPL v2 or later. Please see http://www.freedos.org/ for more details about the FreeDOS (TM) Project.
http://kernel.fdos.org/
GNU General Public License v2.0
805 stars 143 forks source link

FCB find problems #112

Open ecm-pushbx opened 1 year ago

ecm-pushbx commented 1 year ago

I set up two tests of the FCB FindFirst/FindNext functions on a FAT32 FS hard disk image. All tests are run with qemu, version QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-11+deb11u2). The first two test runs were on a current revision of Enhanced DR-DOS, while the latter two test runs were on the FreeDOS kernel.

All files can be found in https://pushbx.org/ecm/test/20230821/

These are the commands used to create the disk images: https://pushbx.org/ecm/test/20230821/commands.txt Reproduced here:

nasm -I ~/proj/lmacros/ ~/proj/bootimg/bootimg.asm \
  -D_BPE=32 -D_SPI=65536+2048 -D_MBR -D_ALIGNDATA \
  -D_PAYLOADFILE="::fill,512*65536,0,'zeroes.bin',::chdir,'test',::fill,1,0,'a',::fill,1,0,'b',::fill,1,0,'c',::chdir,..,'testnorm.com','testexte.com'" \
  -D_SPF=640 -o testfcb.img -D_CHS_HEADS=16 -D_CHS_SECTORS=32 -D_MEDIAID=0F8h

DOSEMU=~/local.new/bin/dosemu

nasm ../bootimg/bootimg.asm -I ../lmacros/ -I ~/.dosemu/drive_c/ \
  -I ../ldebug/bin/ -o diskfd.img \
  -D_PAYLOADFILE=kernel.sys,command.com,fdconfig.sys,fdautoex.bat,ldebug.sld,himemx.exe,ldebug.com,::rename,ldebug.sld,lcdebug.sld,lcdebug.com,sys.com,quit.com,int3.com,int19.com,callver.com,ldebugx.com \
  && "$DOSEMU" -I "floppy { device diskfd.img }" \
  -E "a:sys A: /bootonly /oem:fd" -dumb -td -kt \
  && qemu-system-i386 -fda diskfd.img -boot order=a -display curses \
  -chardev serial,id=serial2,path=/tmp/vptty-dos -serial null \
  -serial chardev:serial2 -hda testfcb.img; stty sane

nasm ../bootimg/bootimg.asm -I ../lmacros/ -I ../edrdos/edrsys/ \
  -I ../edrdos/repo/drbio/bin/ -I ../edrdos/repo/drdos/bin/ \
  -I ../edrdos/repo/command/bin/ -I ../ldebug/bin/ -o diskedr.img \
  -D_PAYLOADFILE=drbio.sys,drdos.sys,command.com,dconfig.sys,drautoex.bat,ldebug.sld,himemx.exe,ldebug.com,::rename,ldebug.sld,lcdebug.sld,lcdebug.com,SYS.COM,quit.com,int3.com,int19.com,callver.com,ldebugx.com \
  && "$DOSEMU" -I "floppy { device diskedr.img }" \
  -E "a:sys A: /bootonly /oem:edr" -dumb -td -kt \
  && qemu-system-i386 -fda diskedr.img -boot order=a -display curses \
  -chardev serial,id=serial2,path=/tmp/vptty-dos -serial null \
  -serial chardev:serial2 -hda testfcb.img; stty sane

ldebug \testnorm.com

ldebug \testexte.com

install serial

re.append @if (word [cs:cip - 2] == 21CD) then d psp:80 l 28

These scriptlets use NASM, my macro collection, my FAT FS format script written in NASM, dosemu2 (only for running FreeDOS/EDR-DOS sys on the diskette image), qemu, lDebug, the EDR-DOS system build, and the FreeDOS kernel and FreeCOM build. Also included are HimemX and a number of smaller utilities, not needed for this test but used for prior tests. The test programs testnorm.com and testexte.com were created in the debugger so there's no source files corresponding to them.

The qemu serial ports are connected to a socat instance created using this command:

socat -r /tmp/trace.log pty,link=/tmp/vptty-dos,rawer pty,link=/tmp/vptty-linux,rawer & disown

Which in turn is connected to a picocom running like so:

picocom /tmp/vptty-linux

Each test was performed by running the DOS command ldebug \testXXXX.com then a debugger command install serial. On the serial terminal, a "KEEP" confirmation is entered and finally three commands to the debugger (visible in the serial port logs) that go like re.append @if (word [cs:cip - 2] == 21CD) then d psp:80 l 28 then tp FFFF then q. The RE buffer append command sets it up so that after tracing an interrupt 21h call (CD 21 opcode), the debugger will dump the DTA area at address PSP:80h. This allows it to automatically display what result a file search call yielded.

All tests are done by calling FCB FindFirst (function 11h) and FindNext (function 12h) with various states (\TEST directory or root directory, prior search call) as well as a search mask of eleven question marks. This is intended to match any directory entry. The testnorm.com test uses normal FCBs as the search/result structure, while testexte.com uses extended FCBs with the attribute 17h (directory, system, hidden, read-only).

I will not reproduce the full log here, but it is available on the server at https://pushbx.org/ecm/test/20230821/logs.txt

There are several differences:

This is the first result from EDR-DOS running testnorm.com, searching using a normal FCB in the \TEST subdirectory:

AX=0000 BX=0000 CX=0200 DX=0240 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=0109 NV UP EI PL ZR NA PE NC
2FB1:0109 B82611            mov     ax, 1126
AX=1126 BX=0000 CX=0200 DX=0240 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=010C NV UP EI PL ZR NA PE NC
2FB1:010C BA0002            mov     dx, 0200
AX=1126 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=010F NV UP EI PL ZR NA PE NC
2FB1:010F CD21              int     21
AX=1100 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=0111 NV UP EI PL ZR NA PE NC
2FB1:0111 84C0              test    al, al
2FB1:0080  03 41 20 20 20 20 20 20-20 20 20 20 00 00 00 00 .A          ....
2FB1:0090  00 00 00 00 00 01 00 00-00 00 00 04 00 01 00 00 ................
2FB1:00A0  00 00 00 00 00 00 00 00-                        ........

As you can see, the first returned result is one of the (1-byte) test files.

This is the corresponding FreeDOS test run:

AX=3B00 BX=0000 CX=0200 DX=0240 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=0109 NV UP EI PL ZR NA PE NC
3168:0109 B82611            mov     ax, 1126
AX=1126 BX=0000 CX=0200 DX=0240 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=010C NV UP EI PL ZR NA PE NC
3168:010C BA0002            mov     dx, 0200
AX=1126 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=010F NV UP EI PL ZR NA PE NC
3168:010F CD21              int     21
AX=1100 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=0111 NV UP EI PL ZR NA PE NC
3168:0111 84C0              test    al, al
3168:0080  03 2E 20 20 20 20 20 20-20 20 20 20 10 00 00 00 ..          ....
3168:0090  00 00 00 00 00 01 00 00-00 00 00 03 00 00 00 00 ................
3168:00A0  00 00 00 00 00 00 00 00-                        ........

This one returns the dot directory entry of the subdirectory. (Look for the 2Eh text byte at address 0081h in the dump.) Evidently, FreeDOS includes subdirectories in its normal FCB search while EDR-DOS does not. (I checked this assumption by doing some more test runs, not depicted in the log. Repeatedly having EDR-DOS search the root directory without an extended FCB does return ZEROES.BIN, TESTNORM.COM, and TESTEXTE.COM but not the TEST directory.)

The testexte.com test reveals that EDR-DOS is able to reconstruct the search DTA from the FCB passed to it, whereas FreeDOS is unable to do that. This is from the EDR-DOS run:

AX=0000 BX=0000 CX=0200 DX=0250 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=0130 NV UP EI PL ZR NA PE NC
2FB1:0130 B82611            mov     ax, 1126
AX=1126 BX=0000 CX=0200 DX=0250 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=0133 NV UP EI PL ZR NA PE NC
2FB1:0133 BA8002            mov     dx, 0280
AX=1126 BX=0000 CX=0200 DX=0280 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=0136 NV UP EI PL ZR NA PE NC
2FB1:0136 CD21              int     21
AX=1100 BX=0000 CX=0200 DX=0280 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=0138 NV UP EI PL ZR NA PE NC
2FB1:0138 84C0              test    al, al
2FB1:0080  FF 00 00 00 00 00 17 03-5A 45 52 4F 45 53 20 20 ........ZEROES
2FB1:0090  42 49 4E 00 00 00 00 00-00 00 00 00 00 00 00 00 BIN.............
2FB1:00A0  00 00 03 00 00 00 00 02-                        ........
AX=1100 BX=0000 CX=0200 DX=0280 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=013A NV UP EI PL ZR NA PE NC
2FB1:013A 7544              jnz     0180                            not jumping
AX=1100 BX=0000 CX=0200 DX=0280 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=013C NV UP EI PL ZR NA PE NC
2FB1:013C B82612            mov     ax, 1226
AX=1226 BX=0000 CX=0200 DX=0280 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=013F NV UP EI PL ZR NA PE NC
2FB1:013F BA0002            mov     dx, 0200
AX=1226 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=0142 NV UP EI PL ZR NA PE NC
2FB1:0142 CD21              int     21
AX=1200 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=2FB1 ES=2FB1 SS=2FB1 CS=2FB1 IP=0144 NV UP EI PL ZR NA PE NC
2FB1:0144 84C0              test    al, al
2FB1:0080  FF 00 00 00 00 00 17 03-42 20 20 20 20 20 20 20 ........B
2FB1:0090  20 20 20 00 00 00 00 00-00 00 00 00 01 00 00 00    .............
2FB1:00A0  00 00 05 00 01 00 00 00-                        ........

The second, interleaved search starts in the root directory and finds ZEROES.BIN. Then, the prior search (in the subdirectory) is resumed and finds B, the second non-dots directory entry. Thus, EDR-DOS succeeds in our test.

Next, consider the FreeDOS result of the same test:

AX=3B00 BX=0000 CX=0200 DX=0250 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=0130 NV UP EI PL ZR NA PE NC
3168:0130 B82611            mov     ax, 1126
AX=1126 BX=0000 CX=0200 DX=0250 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=0133 NV UP EI PL ZR NA PE NC
3168:0133 BA8002            mov     dx, 0280
AX=1126 BX=0000 CX=0200 DX=0280 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=0136 NV UP EI PL ZR NA PE NC
3168:0136 CD21              int     21
AX=1100 BX=0000 CX=0200 DX=0280 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=0138 NV UP EI PL ZR NA PE NC
3168:0138 84C0              test    al, al
3168:0080  FF 00 00 00 00 00 17 03-5A 45 52 4F 45 53 20 20 ........ZEROES
3168:0090  42 49 4E 00 00 00 00 00-00 00 00 00 00 00 00 00 BIN.............
3168:00A0  00 00 03 00 00 00 00 02-                        ........
AX=1100 BX=0000 CX=0200 DX=0280 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=013A NV UP EI PL ZR NA PE NC
3168:013A 7544              jnz     0180                            not jumping
AX=1100 BX=0000 CX=0200 DX=0280 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=013C NV UP EI PL ZR NA PE NC
3168:013C B82612            mov     ax, 1226
AX=1226 BX=0000 CX=0200 DX=0280 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=013F NV UP EI PL ZR NA PE NC
3168:013F BA0002            mov     dx, 0200
AX=1226 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=0142 NV UP EI PL ZR NA PE NC
3168:0142 CD21              int     21
AX=1200 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=0144 NV UP EI PL ZR NA PE NC
3168:0144 84C0              test    al, al
3168:0080  FF 00 00 00 00 00 17 03-54 45 53 54 20 20 20 20 ........TEST
3168:0090  20 20 20 10 00 00 00 00-00 00 00 00 01 00 00 00    .............
3168:00A0  00 00 03 00 00 00 00 00-                        ........
AX=1200 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=0146 NV UP EI PL ZR NA PE NC
3168:0146 7538              jnz     0180                            not jumping
AX=1200 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=0148 NV UP EI PL ZR NA PE NC
3168:0148 813E88004220      cmp     word [0088], 2042              DS:0088=4554
AX=1200 BX=0000 CX=0200 DX=0200 SP=FFFE BP=0000 SI=0000 DI=0000
DS=3168 ES=3168 SS=3168 CS=3168 IP=014E NV UP EI PL NZ NA PE NC
3168:014E 7530              jnz     0180                                jumping

Trying to resume the subdirectory's search, we instead get the second directory entry of the root directory. The kernel uses its static buffer and is unable to resume a concurrent search, where EDR-DOS (and presumably other 8086 DOS compatibles) is able to do just that.

Also note that this is on a FAT32 file system, so perhaps EDR-DOS is uniquely capable. The interrupt list notes that FCB open only works on FAT32 volume labels rather than files. (Probably they were thinking of FCB create rather than open, actually.) In the Find First FCB description there is no corresponding note, but it is possible that it does not work on a FAT32 FS on MS-DOS 7.10+ either. (I quickly tested it, and it appears that FCB Open succeeds on the FAT32 FS image on EDR-DOS. Did not test actually reading or writing the file.)

Other than the attribute bug (which I assume indicates a bug in the FreeDOS kernel with the EDR-DOS kernel's choice being correct), there appear to be two bugs in the FreeDOS kernel:

There are two UWORD fields in the FCB structure's reserved parts that are called fcb_strtclst and fcb_dirclst: https://github.com/FDOS/kernel/blob/29ccb6e45414a8a18346a97fdb0f35ebd6fb4025/hdr/fcb.h#L95

Searching for "clst" in the file fcbfn.s reveals that these two fields are only used by the FCB Find functions, not any usual FCB file operations it appears. (The normal FCB file accesses depend entirely on the stored SFT handle number to point to the same, still valid FCB SFT entry.) Both fields are written once and read once: https://github.com/FDOS/kernel/blob/29ccb6e45414a8a18346a97fdb0f35ebd6fb4025/kernel/fcbfns.c

Here, in the early part of FcbFindFirstNext:

    Dmatch.dm_entry = lpFcb->fcb_strtclst;
    Dmatch.dm_dircluster = lpFcb->fcb_dirclst;

And here, a bit further down:

  lpFcb->fcb_dirclst = (UWORD) Dmatch.dm_dircluster;
  lpFcb->fcb_strtclst = Dmatch.dm_entry;

There is one problem immediately apparent here, though the third bug actually completely masks this (second) bug: The dm_dircluster field of the DTA (dmatch) structure is a 32-bit dword for a FAT32-enabled kernel build. But it is truncated to, respectively zero-extended from, a 16-bit word in these accesses. I will study EDR-DOS's approach at a later point but cursory tests seem to indicate it stores the complete 32-bit directory cluster number in the search FCB's structure.

The third bug is in this line: https://github.com/FDOS/kernel/blob/29ccb6e45414a8a18346a97fdb0f35ebd6fb4025/kernel/fcbfns.c#L658

It reads if (First). This condition is reversed from what it should be. The (static) dmatch buffer used by the kernel should be updated from the search FCB if this call is a Find Next. The way this condition is written, it will "update" the dmatch fields from the (uninitialised) search FCB only before the DosFindFirst call, which I believe has no effect whatsoever. Before a DosFindNext call, the dmatch structure is never re-initialised from the search FCB.

This did not cause readily apparent bugs because the kernel's static dmatch structure used by the FCB find functions will still hold the prior call's data when an FCB Find Next call happens. However, as in my test with the interleaved concurrent searches, this is not what was intended and actually does constitute a bug whenever more than one search is in progress in an application.

I will prepare patches to these three bugs at a later time. The third bug probably also requires to re-order the wAttr uses a bit.

ecm-pushbx commented 1 year ago

EDR-DOS does not use the fields named fcb_dirclst and fcb_startclst for its search data. In fact, the attempt at restoring the DTA that is prevented from working by the third bug is innately wrong because the DosFindNext call may call down into the redirector and we can't know exactly which of the reserved fields in the search data are used by the redirector.

EDR-DOS simply copies most of the reserved fields of the DTA into the search FCB. This presumably enables concurrent FCB searches even on redirected drives though I did not test this yet.

This is in the function fcb_save_search_state, the entirety of populating the search FCB from the DOS v2 style Find data DTA:

    lea di,MSF_NAME[bx]     ; ES:DI -> FCB name
    mov si,offset fcb_search_buf
    lodsb               ; get 1st byte = drive info
    mov cx,20/WORD      ; copy 20 bytes to FCB
    rep movsw           ; (the rest of the search template)
    stosb               ; drive info byte follow them

(The remainder of this function populates the user DTA with the result data.)

The reverse is in the function fcb_restore_search_state which is called before the equivalent of the DosFindNext call:

    mov di,offset fcb_search_buf+1
                    ; ES:DI -> internal state
    lea si,1[bx]        ; DS:SI -> FCB+1
    mov cx,10
    rep movsw           ; copy info from FCB
    lodsb               ; get "drive" info
    mov es:fcb_search_buf,al    ; it's the 1st byte in the srch state
ecm-pushbx commented 1 year ago

Another bit I picked up from EDR-DOS's fcbs.a86 is that if an open FCB no longer references a valid FCB SFT entry, then the DOS attempts to re-open the FCB, preserving some current record fields in the FCB. This assumes that the relevant default drive (if used) and cwd did not change since the FCB had originally been opened. (The cwd is unlikely to be changed by an application using FCBs. The default drive could be changed though. Tough.)

ecm-pushbx commented 1 year ago

The search attributes appear to be set to zero if not overridden by an extended FCB as the search FCB, in EDR-DOS.