copy / v86

x86 PC emulator and x86-to-wasm JIT, running in the browser
https://copy.sh/v86/
BSD 2-Clause "Simplified" License
19.62k stars, 1.37k forks

Support for other code pages in text terminal? #1098

Open evonox opened 1 month ago

evonox commented 1 month ago

I need to present information in the text terminal in a language other than English. Is there any support in the text terminal for displaying letters with diacritics? If not, I am going to implement it ASAP. I miss this feature a lot.

SuperMaxusa commented 1 month ago

Can you give an example string with these diacritic symbols? Here is a similar issue with xterm.js: https://github.com/copy/v86/issues/927

chschnell commented 1 month ago

I think the issue is that DisplayAdapter is not informed about the code page being used by the OS.

ScreenAdapter.put_char() is called with the character's byte code, so without knowing the active code page it's impossible to know which Unicode character to map to.
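To make the point concrete: in CP437 the byte 0x9B is "¢", while in CP850 the same byte is "ø", so a byte code alone cannot tell the screen which character to show. A minimal sketch, assuming a hypothetical `byteToChar()` helper and a deliberately partial table (only three high-half entries per code page). This is not v86's ScreenAdapter code:

```javascript
// Hypothetical sketch: map an 8-bit character code to a Unicode string using
// a per-code-page table for the upper half (0x80..0xFF).
// Only three entries per table are filled in, for illustration; a real table
// has 128 entries (e.g. generated from the unicode.org mapping files).
const HIGH_HALF = {
    cp437: { 0x9b: 0x00a2, 0x9d: 0x00a5, 0x9e: 0x20a7 }, // ¢ ¥ ₧
    cp850: { 0x9b: 0x00f8, 0x9d: 0x00d8, 0x9e: 0x00d7 }, // ø Ø ×
};

function byteToChar(code, codepage) {
    if (code < 0 || code > 0xff) {
        throw new RangeError("character code must be 0..255");
    }
    if (code >= 0x20 && code < 0x7f) {
        // Printable ASCII is identical in CP437 and CP850
        return String.fromCharCode(code);
    }
    if (!Object.prototype.hasOwnProperty.call(HIGH_HALF, codepage)) {
        throw new Error("unknown code page: " + codepage);
    }
    const codePoint = HIGH_HALF[codepage][code];
    // Fall back to U+FFFD for anything missing from the (partial) table
    return String.fromCodePoint(codePoint === undefined ? 0xfffd : codePoint);
}

console.log(byteToChar(0x41, "cp437"), byteToChar(0x9b, "cp437"), byteToChar(0x9b, "cp850"));
// prints "A ¢ ø"
```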

I have a hacked ScreenAdapter that lets me override the fixed code page CP437 with CP850. It works for me, but it wouldn't work in general. I think some INT 21h function sets the code page; I don't know enough about it, but I figure this would be the place where a bus event should be generated for ScreenAdapter to react to.

EDIT: CP437 is hardcoded into ScreenAdapter here.

Pixelsuft commented 1 month ago

I think it's due to how v86 renders text.

Also, if someone needs, there are json files with charmap_high arrays for each encoding.

chschnell commented 1 month ago

> I think it's due to how v86 renders text.
>
> Also, if someone needs, there are json files with charmap_high arrays for each encoding.

Hey @Pixelsuft, I was planning to look for exactly these arrays, so thanks!

Just two quick questions (I believe you're the author):

First, all code page mappings in cp367.json and cp65001.json map to the replacement character U+FFFD. Are they invalid?

Second, five code page mappings have fewer than 128 items:

  • cp932.json (101 items)
  • cp936.json (71)
  • cp949.json (67)
  • cp950.json (83)
  • cp1361.json (76)

Are the missing items just 1:1 mappings to CP437?

evonox commented 1 month ago

Hello, sorry for the late answer. I was in fact playing around with the hard-coded code pages, substituting them with CP1250 and ISO-8859-2, but I have not been successful. It also depends on the terminal configuration of the embedded OS.

evonox commented 1 month ago

I will provide the Bash script for generating all the code page maps supported by Linux, if requested. But what exactly needs to be configured in the Linux OS to make it work is outside my expertise. My distribution is ArchLinux32. I welcome any ideas or help.

evonox commented 1 month ago

Here is the script to generate the encodings using Bash on Linux.

```
Usage:
  generator.sh [encoding_name] - generate JSON output for given encoding
  generator.sh STDIN           - read encodings to generate from standard input
  generator.sh                 - generate JSON output for all encodings - takes 45-60 min

  generator.sh --help          - show this help
```

The generator file is also included as an attachment.

```bash
#!/bin/bash

###################################
# GENERATE ALL CODE PAGES TO JSON #
###################################

# Exit on error
set -e

# Output the byte with the given code number (0..255)
chr() {
    printf "\\$(printf '%03o' "$1")"
}

# Convert the byte with the given code in the given encoding to a decimal Unicode code point
toUnicode() {
    local CODE
    set -o pipefail

    # The pipeline fails if iconv cannot convert the byte; hexdump prints the
    # UTF-16LE output as 16-bit words (this assumes a little-endian host)
    if CODE=$(chr "$1" | iconv --from-code="$2" --to-code=UTF-16LE 2>/dev/null | hexdump | sed "2d" | awk '{ print "0x" $2 }') ; then
        # Convert HEX code to DEC - JSON does not support numbers in HEX
        printf "%d" "$CODE"
    else
        # Error - the byte is not a valid character in the given encoding
        echo -n "\"N/A\""
    fi

    set +o pipefail
}

# Check if the name of the encoding is known to iconv
isEncodingValid() {
    local ENC
    for ENC in $(iconv -l | sed 's/\/\///') ; do
        if [ "$ENC" = "$1" ] ; then
            # Encoding found, exit with success
            return 0
        fi
    done
    # Invalid encoding - exit with error
    return 1
}

# First check the first argument if any encoding is given
if [ -n "$1" ] ; then
    if [ "$1" = "--help" ] ; then
        echo "Usage:"
        echo "  generator.sh [encoding_name] - generate JSON output for given encoding"
        echo "  generator.sh STDIN           - read encodings to generate from standard input"
        echo "  generator.sh                 - generate JSON output for all encodings - takes 45-60 min"
        echo
        echo "  generator.sh --help          - show this help"
        echo
        exit 0
    fi
    # If the STDIN argument is provided, read the required encodings from standard input
    if [ "$1" = "STDIN" ] ; then
        ENCODINGS=""
        # "|| [ -n ... ]" also handles a last line without a trailing newline
        while read -r ENC || [ -n "$ENC" ] ; do
            if [[ $ENC =~ ^[[:space:]]*$ ]] ; then # Ignore empty lines
                continue
            fi
            # A failing command in an "if" condition does not trigger set -e
            if ! isEncodingValid "$ENC" ; then
                echo "Invalid encoding name $ENC" >&2
                exit 1
            fi
            ENCODINGS="$ENCODINGS $ENC"
        done
    else
        if ! isEncodingValid "$1" ; then
            echo "Invalid encoding name $1" >&2
            exit 1
        fi
        ENCODINGS="$1"
    fi
else
    # Query all encodings supported by iconv
    ENCODINGS=$(iconv -l | sed 's/\/\///')
fi

# Begin JSON object
echo "{"

# Iterate all encodings
FIRST_ITEM=1
for ENCODING in $ENCODINGS ; do
    # If NOT the first item, write comma separator
    if [ $FIRST_ITEM -eq 1 ] ; then
        FIRST_ITEM=0
    else
        echo ","
    fi

    # Write the encoding as the JSON object key and begin the code page array
    echo -n "    \"$ENCODING\": ["

    # Iterate all characters
    FIRST_CODE=1
    for CODE in {0..255} ; do
        if [ $FIRST_CODE -eq 1 ] ; then FIRST_CODE=0 ; else echo -n "," ; fi

        UNICODE_CODE=$(toUnicode "$CODE" "$ENCODING")
        echo -n "$UNICODE_CODE"
    done

    # End code page array
    echo -n "]"
done

# End JSON object
echo
echo "}"
```

Pixelsuft commented 1 month ago

> > I think it's due to how v86 renders text. Also, if someone needs, there are json files with charmap_high arrays for each encoding.
>
> Hey @Pixelsuft, I was planning to look for exactly these arrays, so thanks!
>
> Just two quick questions (I believe you're the author):
>
> First, all code page mappings in cp367.json and cp65001.json map to the replacement character U+FFFD. Are they invalid?
>
> Second, five code page mappings have fewer than 128 items:
>
>   • cp932.json (101 items)
>   • cp936.json (71)
>   • cp949.json (67)
>   • cp950.json (83)
>   • cp1361.json (76)
>
> Are the missing items just 1:1 mappings to CP437?

I generated those JSON files a long time ago with a Python script and only tested some encodings that worked, so IDK

evonox commented 1 month ago

> > > I think it's due to how v86 renders text. Also, if someone needs, there are json files with charmap_high arrays for each encoding.
> >
> > Hey @Pixelsuft, I was planning to look for exactly these arrays, so thanks!
> >
> > Just two quick questions (I believe you're the author):
> >
> > First, all code page mappings in cp367.json and cp65001.json map to the replacement character U+FFFD. Are they invalid?
> >
> > Second, five code page mappings have fewer than 128 items:
> >
> >   • cp932.json (101 items)
> >   • cp936.json (71)
> >   • cp949.json (67)
> >   • cp950.json (83)
> >   • cp1361.json (76)
> >
> > Are the missing items just 1:1 mappings to CP437?
>
> I generated those JSON files a long time ago with a Python script and only tested some encodings that worked, so IDK

Try to validate it against my Bash script. It uses the iconv tool, so it should be valid. I don't claim there are no cases where it fails to convert a character for some reason; for such a byte, the value given is the string "N/A".
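A quick way to act on that suggestion is to load the JSON printed by generator.sh and check each array. The sketch below is a hypothetical helper (`checkCodepage()` is not part of either tool): it flags arrays that don't have 256 entries, counts "N/A" bytes, and warns about tables dominated by U+FFFD, as reported for cp367.json and cp65001.json. Note that the 128-entry charmap_high files would need a length check of 128 instead.

```javascript
// Hypothetical sketch: sanity-check one code page array as produced by
// generator.sh (256 entries, each a decimal code point or the string "N/A").
function checkCodepage(name, table) {
    const problems = [];
    if (!Array.isArray(table) || table.length !== 256) {
        problems.push(`${name}: expected 256 entries`);
        return problems;
    }
    let naCount = 0;
    let fffdCount = 0;
    table.forEach((entry, code) => {
        if (entry === "N/A") {
            naCount++;
        } else if (!Number.isInteger(entry) || entry < 0 || entry > 0x10ffff) {
            problems.push(`${name}[${code}]: invalid entry ${JSON.stringify(entry)}`);
        } else if (entry === 0xfffd) {
            fffdCount++;
        }
    });
    if (naCount > 0) {
        problems.push(`${name}: ${naCount} unmapped byte(s)`);
    }
    // A table that maps (almost) everything to U+FFFD is probably unusable
    if (fffdCount >= 128) {
        problems.push(`${name}: ${fffdCount} entries are U+FFFD`);
    }
    return problems;
}
```

For example, parse the generator output with `JSON.parse()` and run `checkCodepage(key, obj[key])` for every key of the resulting object.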

evonox commented 1 month ago

@SuperMaxusa: Is there any possibility for the CopySH emulator to support full UTF-8, apart from the given code pages?

chschnell commented 1 month ago

Code point definitions for many PC Code Pages can be found at www.unicode.org:

Specifically:

Special care needs to be taken for several graphical characters that represent non-printable symbols, 0x01..0x1f and 0x7f (DEL) amongst them. Their mappings are defined here:

I would recommend using these.
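For CP437, the usual graphical glyphs for those byte values fit in a small table. A sketch assuming the common CP437 glyph assignments; showing 0x00 as a plain space is an assumption, as is the helper name `buildLowHalf()`:

```javascript
// Unicode glyphs commonly used to display CP437 bytes 0x00..0x1F,
// which in plain ASCII are non-printable control codes.
const CP437_CONTROL_GLYPHS = [
    0x0020, 0x263a, 0x263b, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022, // 0x00..0x07 (0x00 as space: assumption)
    0x25d8, 0x25cb, 0x25d9, 0x2642, 0x2640, 0x266a, 0x266b, 0x263c, // 0x08..0x0F
    0x25ba, 0x25c4, 0x2195, 0x203c, 0x00b6, 0x00a7, 0x25ac, 0x21a8, // 0x10..0x17
    0x2191, 0x2193, 0x2192, 0x2190, 0x221f, 0x2194, 0x25b2, 0x25bc, // 0x18..0x1F
];
const CP437_DEL_GLYPH = 0x2302; // "⌂" shown for 0x7F

// Hypothetical helper: build the low half (0x00..0x7F) of a display table,
// combining the control glyphs above with plain ASCII.
function buildLowHalf() {
    const table = new Array(128);
    for (let code = 0; code < 128; code++) {
        if (code < 0x20) table[code] = CP437_CONTROL_GLYPHS[code];
        else if (code === 0x7f) table[code] = CP437_DEL_GLYPH;
        else table[code] = code;
    }
    return table;
}

console.log(String.fromCodePoint(...buildLowHalf().slice(1, 8))); // prints "☺☻♥♦♣♠•"
```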

evonox commented 1 month ago

> Code point definitions for many PC Code Pages can be found at www.unicode.org:
>
> Specifically:
>
> Special care needs to be taken for several graphical characters that represent non-printable symbols, 0x01..0x1f and 0x7f (DEL) amongst them. Their mappings are defined here:
>
> I would recommend using these.

Thank you for the info. I did not know about this. I could "wget" it and parse it into JSON using AWK.

SuperMaxusa commented 1 month ago

> Is there any possibility for the CopySH emulator to support full UTF-8, apart from the given code pages?

If you mean VGA text mode, probably not, because EGA and VGA are limited to CP437 support, as @chschnell noticed:

> EDIT: CP437 is hardcoded into ScreenAdapter here.

IIRC, display drivers for MS-DOS (like DISPLAY.SYS) load a custom font into VGA RAM, but I'm not sure that works on v86: https://github.com/microsoft/MS-DOS/blob/2d04cacc5322951f187bb17e017c12920ac8ebe2/v4.0/src/DEV/DISPLAY/INT10COM.INC#L3-L35, https://wiki.osdev.org/VGA_Fonts#Set_VGA_fonts

> Thank you for the info. I did not know about this. I could "wget" it and parse it into JSON using AWK.

In the dosbox-x repository, you can find a tool to convert these files to Unicode arrays: https://github.com/joncampbell123/dosbox-x/blob/master/contrib/mappings/db2u.pl

Pixelsuft commented 1 month ago

> IIRC, display drivers for MS-DOS (like DISPLAY.SYS) load a custom font into VGA RAM, but I'm not sure that works on v86: https://github.com/microsoft/MS-DOS/blob/2d04cacc5322951f187bb17e017c12920ac8ebe2/v4.0/src/DEV/DISPLAY/INT10COM.INC#L3-L35, https://wiki.osdev.org/VGA_Fonts#Set_VGA_fonts

Unlike other emulators, which draw characters pixel by pixel, v86 just renders text mode using HTML, so it will probably be difficult to automatically detect the encoding from VGA RAM fonts.

evonox commented 1 month ago

@Pixelsuft Yes, by reading the source code of vga.js I have noticed that it is not possible to support UTF-8 directly. I also caused some confusion about the OS I use: I am not using MS-DOS but ArchLinux32. I need to get familiar with how to configure the terminal and console settings and how to remap the ASCII codes to Unicode in the ScreenAdapter. I will share the results as soon as I succeed with this issue. Possibly I will prepare a PR.

chschnell commented 1 month ago

Attached is a ZIP containing the Codepage-to-Codepoint mappings in JavaScript, along with the Python script I wrote to create it from the raw files from www.unicode.org.

Supported codepages:

CP437, CP737, CP775, CP850, CP852, CP855, CP857, CP860,
CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP874,
CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256,
CP1257, CP1258

Would it be possible to generate a bus event for DisplayAdapter whenever the OS changes its system codepage? I'd be willing to implement it, but I'm not sure where to start.

Attachment: codepage_converter.zip
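One way to picture such an event: whoever detects a code page change sends an event, and the screen adapter swaps its lookup table. The sketch below uses a tiny stand-in bus with `register()`/`send()` methods and an invented event name, "screen-set-codepage"; it is not v86's bus implementation, and how the guest OS would signal the change is exactly the open question here.

```javascript
// Hypothetical stand-in for an event bus (not v86's implementation).
class MiniBus {
    constructor() {
        this.listeners = new Map();
    }
    register(name, fn) {
        if (!this.listeners.has(name)) this.listeners.set(name, []);
        this.listeners.get(name).push(fn);
    }
    send(name, data) {
        for (const fn of this.listeners.get(name) || []) fn(data);
    }
}

// Hypothetical screen adapter that switches code pages on the invented
// "screen-set-codepage" event.
class CodepageAwareScreen {
    constructor(bus, tables) {
        this.tables = tables; // e.g. { cp437: [256 code points], cp850: [...] }
        this.active = "cp437";
        bus.register("screen-set-codepage", (name) => {
            // Ignore unknown code page names
            if (Object.prototype.hasOwnProperty.call(this.tables, name)) {
                this.active = name;
            }
        });
    }
    charFor(code) {
        return String.fromCodePoint(this.tables[this.active][code]);
    }
}
```

After `bus.send("screen-set-codepage", "cp850")`, subsequent `charFor()` calls use the CP850 table; characters already on screen would still need to be redrawn.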

copy commented 1 month ago

> Unlike other emulators, which draw characters pixel by pixel, v86 just renders text mode using HTML, so it will probably be difficult to automatically detect the encoding from VGA RAM fonts.

Drawing VGA fonts on a canvas would be fairly simple, but I like the fact that you can copy-paste from the VGA screen.

> Would it be possible to generate a bus event for DisplayAdapter whenever the OS changes its system codepage? I'd be willing to implement it, but I'm not sure where to start.

I'd accept a PR, but I don't know how the OS communicates to the VGA controller which code page to use (and whether it does at all). Alternatively, I'd also accept a PR that (optionally) renders the VGA screen on a canvas, or sets the code page manually.

chschnell commented 1 month ago

Well, I guess this is a good time then to present a little experiment I made which implements text screen on a canvas.

First, here's a test page without a running V86 instance in the background: Demo 1.

On the top left, first click on "Start", play around with the settings, then click on "Demo" and test "Fullscreen". This demo is designed to cause permanent repaints: everything is drawn pixel by pixel in an AnimationFrame loop ~60 times per second. In earlier tests I measured ~1.25ms (average of 100 runs) for a full-screen 80x25 repaint on Firefox, and ~0.7ms on Chrome, though it is currently not easy to measure this properly.

Text rendering is implemented in class TextCanvas, which is designed tightly around V86's ScreenAdapter, so here's Demo 2 with a running V86 instance in the back using a custom TextCanvas-ScreenAdapter.

Click on Machine -> Boot in the menu and wait ~15 seconds for the image to download before it starts to boot. It's a 20M FreeDOS image with Monkey Island for testing. You can also use your own image: stop the machine and select Harddisk -> Import from the menu (the V86 instance has 256M RAM and 16M VGA RAM). Run Monkey Island to see that text and graphics mode share the same DOM canvas without interfering (press CTRL+Q to exit Monkey Island).

However, this is fundamentally misdesigned, though I do think it contains some important building blocks for this task.

Conceptually, this should be moved into the VGA emulator, ScreenAdapter is obviously the wrong place.

I've read a bit into it, but it is still a bit of a mystery to me how it is supposed to work, how the OS and the VGA card interact here, and whether the BIOS is involved. "Code page" is a high-level concept; what I wrote earlier about INT 21h is merely an OS-specific DOS concept and as such misplaced here.

From what I understand so far, the VGA card provides several 8-bit font banks, and the OS may upload fonts into these banks. I think this is where "code pages" happen, and the VGA card never needs to know about the details beyond the bitmaps. There are separate fonts for 25 and for 50 text rows (16 and 8 scanlines high, respectively). The VGA card knows everything required to implement text mode, but I still have a lot of learning to do here; any pointers would be greatly appreciated.
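For reference, the standard VGA layout (as described in the FreeVGA documentation) gives each of the eight character maps 8 KiB in plane 2, with one 32-byte slot per glyph and one byte per scanline, bit 7 being the leftmost pixel. The Sequencer's Character Map Select register (index 0x03) picks map A (used when attribute bit 3 is 1) and map B (used when it is 0). A minimal decoding sketch; the function names are hypothetical:

```javascript
const BYTES_PER_GLYPH = 32; // each glyph slot is 32 scanlines deep

// Offset of character map 0..7 inside plane 2:
// maps 0..3 start at 0x0000/0x4000/0x8000/0xC000, maps 4..7 are 8 KiB further.
function fontBlockOffset(map) {
    return map < 4 ? map * 0x4000 : (map - 4) * 0x4000 + 0x2000;
}

// Decode the Character Map Select register (Sequencer index 0x03).
function decodeCharMapSelect(value) {
    return {
        mapA: ((value >> 3) & 0x04) | ((value >> 2) & 0x03), // bits 5,3,2; attribute bit 3 = 1
        mapB: ((value >> 2) & 0x04) | (value & 0x03),        // bits 4,1,0; attribute bit 3 = 0
    };
}

// Return a glyph as rows of 8 booleans (bit 7 = leftmost pixel).
function readGlyph(plane2, map, charCode, height) {
    const base = fontBlockOffset(map) + charCode * BYTES_PER_GLYPH;
    const rows = [];
    for (let y = 0; y < height; y++) {
        const bits = plane2[base + y];
        const row = [];
        for (let x = 0; x < 8; x++) row.push(((bits >> (7 - x)) & 1) === 1);
        rows.push(row);
    }
    return rows;
}
```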

A few details on the fonts, I've used these two:

Text is drawn onto the screen pixel by pixel without using the canvas's strokeText() or fillText() methods. For that I converted these fonts into bitmaps, with the character set being the union of the Unicode code points of all 8-bit code pages I am using. TextCanvas selects the active subset of 256 glyphs based on the active code page, so I only need a single font bitmap file to cover all code pages.
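The "single font file, 256-glyph subset" idea can be sketched as a lookup from Unicode code point to glyph index in the combined bitmap. This is a hypothetical illustration, not TextCanvas's actual code; `atlasIndex` and `selectGlyphs()` are invented names:

```javascript
// Hypothetical sketch: pick the 256 glyphs of the active code page out of a
// single combined font atlas that is indexed by Unicode code point.
function selectGlyphs(atlasIndex, codepageTable, fallbackIndex) {
    // atlasIndex:    Map from code point to glyph index in the combined bitmap
    // codepageTable: 256 code points, one per byte value 0x00..0xFF
    // fallbackIndex: glyph to use when the atlas lacks a code point
    const selection = new Uint16Array(256);
    for (let code = 0; code < 256; code++) {
        const glyph = atlasIndex.get(codepageTable[code]);
        selection[code] = glyph === undefined ? fallbackIndex : glyph;
    }
    return selection;
}
```

Switching code pages then only means recomputing this 256-entry array; the bitmap itself stays the same.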

Pixelsuft commented 1 month ago

I think it's better to just get VGA RAM fonts working somehow

chschnell commented 1 month ago

I tinkered around with BIOS int 10h, which interfaces the graphics subsystem to switch screen modes, upload fonts etc.

I wrote two little C programs, one uses int 10h to interface the very old VGA subsystem, and the other uses the VESA BIOS extension. It's been 30 years since I wrote software in real mode, so that was quite some fun :)

If you want to follow along, here's the image containing FreeDOS, my sources and the C toolchain (image is configured in German, but that shouldn't matter): FreeDOS-256m-de.zip (87M).

After booting, enter:

```
cd gfxtest
nmake
```

Now you have two executables, VGATEST.EXE and VESATEST.EXE, run

```
VESATEST.EXE -b
```

To get this list of supported graphics modes (that's vgabios.bin answering here):

[ 1] 0x0100: 640x400 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[ 2] 0x0101: 640x480 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[ 4] 0x0103: 800x600 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[ 6] 0x0105: 1024x768 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[ 8] 0x0107: 1280x1024 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[ 9] 0x010d: 320x200 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[10] 0x010e: 320x200 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[11] 0x010f: 320x200 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[12] 0x0110: 640x480 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[13] 0x0111: 640x480 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[14] 0x0112: 640x480 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[15] 0x0113: 800x600 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[16] 0x0114: 800x600 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[17] 0x0115: 800x600 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[18] 0x0116: 1024x768 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[19] 0x0117: 1024x768 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[20] 0x0118: 1024x768 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[21] 0x0119: 1280x1024 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[22] 0x011a: 1280x1024 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[23] 0x011b: 1280x1024 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[24] 0x011c: 1600x1200 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[25] 0x011d: 1600x1200 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[26] 0x011e: 1600x1200 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[27] 0x011f: 1600x1200 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[28] 0x0140: 320x200 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[29] 0x0141: 640x400 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[30] 0x0142: 640x480 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[31] 0x0143: 800x600 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[32] 0x0144: 1024x768 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[33] 0x0145: 1280x1024 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[34] 0x0146: 320x200 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[35] 0x0147: 1600x1200 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[36] 0x0148: 1152x864 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[37] 0x0149: 1152x864 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[38] 0x014a: 1152x864 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[39] 0x014b: 1152x864 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[40] 0x014c: 1152x864 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[41] 0x0175: 1280x768 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[42] 0x0176: 1280x768 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[43] 0x0177: 1280x768 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[44] 0x0178: 1280x800 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[45] 0x0179: 1280x800 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[46] 0x017a: 1280x800 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[47] 0x017b: 1280x960 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[48] 0x017c: 1280x960 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[49] 0x017d: 1280x960 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[50] 0x017e: 1440x900 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[51] 0x017f: 1440x900 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[52] 0x0180: 1440x900 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[53] 0x0181: 1400x1050 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[54] 0x0182: 1400x1050 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[55] 0x0183: 1400x1050 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[56] 0x0184: 1680x1050 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[57] 0x0185: 1680x1050 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[58] 0x0186: 1680x1050 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[59] 0x0187: 1920x1200 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[60] 0x0188: 1920x1200 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[61] 0x0189: 1920x1200 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[62] 0x018a: 2560x1600 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[63] 0x018b: 2560x1600 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[64] 0x018c: 2560x1600 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[65] 0x018d: 1280x720 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[66] 0x018e: 1280x720 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[67] 0x018f: 1280x720 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[68] 0x0190: 1920x1080 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[69] 0x0191: 1920x1080 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[70] 0x0192: 1920x1080 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[71] 0x0193: 1600x900 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[72] 0x0194: 1600x900 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[73] 0x0195: 1600x900 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[74] 0x0196: 2560x1440 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[75] 0x0197: 2560x1440 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[76] 0x0198: 2560x1440 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[91] 0x0013: 320x200 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0

Then, for example, switch to 800x600 32bpp using:

```
VESATEST.EXE 0x0143
```

You've switched into a graphics mode, but you can still see the text console and interact with it. That's a graphical font built into vgabios.bin, I believe (there's a chance that FreeDOS uploaded this font into the graphics card at boot time, I don't know yet).

I guess that is how it's supposed to work, and it looks ~broken~ [EDIT: ... the same as under 86Box emulator, so I guess it works as expected]. You can see that if you enter the HELP command when in any of these graphics modes (exit HELP by pressing ESC twice, it works even if you can't see it; restore 80x25 Text mode with VESATEST.EXE 3). ~So far I have no clue what's broken here.~

I believe the text-fonts are located in the graphics card ROM, whereas the graphics fonts can be replaced - can anyone confirm this?

chschnell commented 1 month ago

The VGA font table is actually used only in text mode; the OS can freely upload sets of 256 glyphs as it wishes.

On the other hand, in VGA text mode plane 0 holds the character codes, plane 1 the attributes, and plane 2 the glyph bitmaps (see "Memory Layout in text modes")!

Now it makes sense.

If in VGA text mode text was drawn to the canvas using the glyph bitmaps from plane 2 then it should all work as expected.
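A minimal sketch of what drawing from plane 2 means for a single cell, assuming the character code and attribute byte have already been fetched from planes 0 and 1, font map 0 is used, and the 9th column and cursor are ignored. `drawCell()` and its parameters are hypothetical; the frame buffer holds 4-bit color indices, not RGB:

```javascript
// Hypothetical sketch: draw one 8-pixel-wide text cell into an indexed-color
// frame buffer, taking the glyph bitmap from plane 2 (font map 0 only).
function drawCell(frame, frameWidth, col, row, charCode, attr, plane2, fontHeight, blink) {
    const fg = attr & 0x0f;
    // With blinking enabled, attribute bit 7 means "blink" instead of a bright background
    const bg = blink.enabled ? (attr >> 4) & 0x07 : attr >> 4;
    // Blinking characters show only their background during the "off" phase
    const hidden = blink.enabled && (attr & 0x80) !== 0 && !blink.phaseOn;
    const glyphBase = charCode * 32;
    for (let y = 0; y < fontHeight; y++) {
        const bits = plane2[glyphBase + y];
        for (let x = 0; x < 8; x++) {
            const on = !hidden && ((bits >> (7 - x)) & 1) === 1;
            frame[(row * fontHeight + y) * frameWidth + col * 8 + x] = on ? fg : bg;
        }
    }
}
```

Because the glyph comes from plane 2 rather than from a fixed font, whatever the OS uploaded (CP850, gnuchcp fonts, custom UI glyphs) is displayed automatically.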

Selecting text and copying it to clipboard cannot work out-of-the box because the (OS-dependent) code page information needed to map the 8-bit character codes to their respective unicode code points is not generally available (I couldn't find anything in this respect).

I will change my demo to simply download the font from plane 2 using int 10h and use that instead of my scanned bitmaps. It should really be implemented in vga.js.

EDIT: Currently, vga_memory_write() in vga.js discards font data that it receives, and VGAScreen.vga_memory_write_text_mode() would need to be replaced entirely. Cursor emulation needs to be considered, too. Since several bus events (like "screen-put-char") would need to be dropped, this is a breaking change. Anything else to consider?

SuperMaxusa commented 1 month ago

> You've switched into a graphics mode, but you can still see the text console and interact with it. That's a graphical font built into vgabios.bin, I believe (there's a chance that FreeDOS uploaded this font into the graphics card at boot time, I don't know yet).

If I am not mistaken, with teletype int 10h, ah=13h you can write text in graphics mode (for example, https://copy.sh/v86?profile=hello-v86 uses this) without using any custom preloaded fonts except the standard VGA fonts.

> On the other hand, in VGA text mode plane 0 holds the character codes, plane 1 the attributes, and plane 2 the glyph bitmaps (see "Memory Layout in text modes")!

I think it is unlikely that the VGA fonts and glyphs stay in the planes when you set a graphics mode. I have read some code in the Bochs VGA BIOS, and the fonts seem to be taken from another place and drawn pixel by pixel into the framebuffer when a character is written:

chschnell commented 1 month ago

Thank you for your input, great links!

You're right about the teletype int 10h, I hadn't gotten around to trying it out yet.

Regarding fonts in plane 2, in biosfn_set_video_mode(mode) they load the ROM's 8x16-font using int 10h when switching into a text mode (not the OS's custom one), see line 1018. Maybe the OS is expected to reupload its font after a mode change?

SuperMaxusa commented 1 month ago

> Maybe the OS is expected to reupload its font after a mode change?

Makes sense. I have found the SCREEN option for FreeDOS, and a few words caught my attention:

> Some newer graphics cards may not have 8x14 fonts in the BIOS. In that case, a driver can be loaded to load a suitable font in RAM, but SCREEN=0x11 should not be used.

I guess it's a driver like the DISPLAY.SYS that I mentioned earlier?

chschnell commented 1 month ago

Thanks again! I agree that DISPLAY.SYS is likely one driver that should do.

Regarding the text fonts: before writing a font to VGA memory, the Sequencer's "Memory Plane Write Enable" field (0x3C5, index 0x02) is set to 0x04 (write to plane 2 only, counting planes from 0), and after finishing it's set back to 0x03 (write to planes 0 and 1). Using this I can log in VGAScreen.port3C5_write() when the OS begins and ends writing to the font bitmaps. I also patched VGAScreen.vga_memory_write() to not discard the font bitmap data while the font plane is being written to, and routed it into the buffer this.plane3[] so that it doesn't trash the text screen (which it otherwise does, so there's data coming in).
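The detection trick can be sketched as a small watcher that tracks the Sequencer index (port 0x3C4) and the Map Mask register (index 0x02 via port 0x3C5), and routes memory writes to whichever planes are enabled. This is a hypothetical illustration, not the actual vga.js patch; it assumes sequential (non-odd/even) addressing during the font upload and ignores 16-bit port writes:

```javascript
// Hypothetical sketch: watch the Sequencer's Map Mask register and report
// when the guest starts or stops writing to plane 2 only (a font upload).
class FontWriteWatcher {
    constructor(onFontWriteChange) {
        this.seqIndex = 0;
        this.mapMask = 0x03; // typical text mode setting: planes 0 and 1
        this.fontWriteActive = false;
        this.onFontWriteChange = onFontWriteChange;
    }

    // 8-bit port writes only; a 16-bit OUT to 0x3C4 (index + data) is not handled
    portWrite(port, value) {
        if (port === 0x3c4) {
            this.seqIndex = value & 0x07; // Sequencer Address Register
        } else if (port === 0x3c5 && this.seqIndex === 0x02) {
            this.mapMask = value & 0x0f; // Memory Plane Write Enable
            const active = this.mapMask === 0x04; // plane 2 only
            if (active !== this.fontWriteActive) {
                this.fontWriteActive = active;
                this.onFontWriteChange(active);
            }
        }
    }

    // Store a byte in every plane enabled by the map mask
    // (assumes sequential addressing, i.e. no odd/even mapping).
    memoryWrite(planes, offset, value) {
        for (let p = 0; p < 4; p++) {
            if (this.mapMask & (1 << p)) planes[p][offset] = value;
        }
    }
}
```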

Next I tested 4 different OSes to see how they modify the VGA text font at boot time and later.

The first write always occurs right at the start of booting (when the BIOS presents its boot menu); this must be the CP437 font from the BIOS. The second one comes a bit later and must be the switch to CP850 in FDAUTO.BAT/AUTOEXEC.BAT. I am not sure yet what the third write access from FreeDOS13/de is, but this looks really promising.

I then launched a game under FreeDOS13/de to switch to graphics mode, exited back to text mode, and indeed, the font buffers are being written to again twice after leaving the game, the first should be CP437 and the second CP850.

chschnell commented 1 month ago

I think adding an alternate text mode to VGAScreen is a way to integrate this new feature as a non-breaking change. Consider a new "graphical text mode" for VGAScreen, if active then VGAScreen...

VGAScreen registers for some new bus event to allow this alternate text mode to be enabled/disabled, and by default it's disabled.

Would this be ok?

SuperMaxusa commented 1 month ago

Thanks for your notes, they are very helpful for me!

I tried to make a small demo that shows the font from plane 2 (first you need to recompile libv86.js with the included patch): https://gist.github.com/SuperMaxusa/7c8329c3f9e41db5114d57046870de03 It updates the canvas every second, but I think a better solution is to register an event like "vga-font-plane-write" and then call displayFont() from it.

Also you can grab freedos13.img for testing here: freedos13.tar.gz


By default, FreeDOS' display driver is not loaded on startup, so we get CP437 8x16 glyphs:

screenshot ![fdos-cp437](https://github.com/user-attachments/assets/ed86c37e-c10b-435a-9279-99417a2f370e)

You can load a CP850 charset manually with these commands:

```
lh A:\FDOS\BIN\DISPLAY.EXE CON=(EGA,850,1)
A:\FDOS\BIN\MODE CON CP PREP=((850) A:\CPI\EGA.CPX)
A:\FDOS\BIN\MODE CON CP SEL=850
A:\FDOS\BIN\MODE CON CP REFRESH
A:\FDOS\BIN\MODE CON CP /STATUS
```

screenshot ![cp850](https://github.com/user-attachments/assets/58da0efa-f492-4c90-8a6f-66452369e1ee)

You can try loading an FNT font using gnuchcp:

```
gnuchcp.exe A:\gnufonts\<name>.fnt
```

(to reset, use gnuchcp.exe -r)

screenshot ![gnuchcp](https://github.com/user-attachments/assets/f38a1e8e-eb51-49b7-9528-e895b355ce82)

When Fontraption (command: A:\FRAPT\FRAPT.COM) starts, it changes some glyphs for its interface (but it is buggy, and the glyphs don't change in the preview):

screenshot + comparing changes ![frapt](https://github.com/user-attachments/assets/c5e5e117-6ae2-41e0-8ce4-76c7b1cf207c) ![dosbox](https://github.com/user-attachments/assets/c64026b7-e488-4e1c-aee3-3cf377bfed8d) ![comp](https://github.com/user-attachments/assets/02d703c8-d385-4cf8-864e-b7b503d343e8)

And some tests of changing VGA modes:

When the Magiduck game (command: cd A:\MAGI and DUCK.EXE) is started, it switches to 40x25 text mode with 8x8 glyphs. On the font canvas it looks somewhat glitched, because the 8x8 font overwrites the previous font, and each character is cut off at the maximum scan line: http://www.osdever.net/FreeVGA/vga/char.txt. For now I have no idea how to get this maximum scan line from the hardware side, the way int 10h, ax=1130h does it.

screenshot ![magiduck](https://github.com/user-attachments/assets/a057b1df-94ad-40fe-8562-0b8f6b20713f)
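On the height question: on VGA, the number of scan lines per character row is held in the CRTC Maximum Scan Line register (index 0x09, ports 0x3D4/0x3D5 in color modes), whose bits 0..4 store the height minus one. A font viewer would then only show that many bytes of each 32-byte glyph slot. A minimal decoding sketch based on the FreeVGA register description; the helper names are hypothetical:

```javascript
// Hypothetical helpers for the CRTC Maximum Scan Line register (index 0x09).
function decodeMaxScanLine(value) {
    return {
        charHeight: (value & 0x1f) + 1,     // bits 0..4: scan lines per character row, minus one
        scanDoubling: (value & 0x80) !== 0, // bit 7: each scan line is displayed twice
    };
}

// Bytes of a glyph slot that are actually displayed; the rest of the
// 32-byte slot (e.g. leftovers of a previous 8x16 font) is ignored.
function visibleGlyphBytes(plane2, fontBlockBase, charCode, maxScanLineValue) {
    const { charHeight } = decodeMaxScanLine(maxScanLineValue);
    const start = fontBlockBase + charCode * 32;
    return plane2.subarray(start, start + charHeight);
}
```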

When I switch to a graphics mode, for example via this asm code:

```asm
mov ah, 0     ; INT 10h function 00h: set video mode
mov al, 13h   ; mode 13h: 320x200, 256 colors
int 10h       ; call the video BIOS
```

...the graphics mode also overwrites the bitplanes, including the font bitplane.

screenshot ![mode13h](https://github.com/user-attachments/assets/1ee47894-9aba-4ccd-95a4-d134fb45b71f)

chschnell commented 1 month ago

You're very welcome! In fact I was a bit worried if my notes were too much. :)

Really impressed by the clever demo you made there, love to see those font dumps as I know what to look for (that surely is a CP850)! I wasn't aware that you are also working on this, great!

I installed your demo, reduced the sleep from 1000 to 50ms, and booted it up with my FreeDOS/de HDA (that image is in German, so FreeDOS sets CP850 in FDAUTO.BAT). It behaves exactly as expected. I also checked out gnuchcp and frapt from your floppy, very useful tools which come in really handy.

Thanks to your demo I think the basic concept is clear now.

More notes:

I've begun a deep dive into the text-mode related VGA registers, and I'm surprised by the variety of possible configurations, which shows just how important text mode was back in the day.

An unsorted list of things that should be supported/considered (in my opinion):

A problem I see is that the state of the "text rendering machine" is scattered over about a dozen different VGA register fields that can each be changed individually at any time; there's nothing like a "transaction" that would tell us when the text rendering state has transitioned from one consistent state to another. But rendering will be clocked by the browser's requestAnimationFrame() loop, so it can hit in the middle of the OS changing VGA registers. This might cause unpredictable flicker during VGA state transitions.

I think rendering could be simplified if the VGA's raw font bitmaps in plane 2 were transformed into a simple, flat array of booleans (with no gaps between glyphs, simply 256 * font_width * font_height entries) whenever the font's size or shape has changed. The implicit 9th column and LGA (both affect the font's shape) could be incorporated into that simple array to keep this stuff out of the rendering loop.
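The flattening step could look roughly like this. A hypothetical sketch (`expandFont()` is an invented name) that uses a Uint8Array of 0/1 values as the boolean array and applies the usual 9-dot rule: when Line Graphics Enable is set, characters 0xC0..0xDF repeat their 8th column in the 9th, otherwise the 9th column is background:

```javascript
// Hypothetical sketch: expand the raw font (32-byte slots in plane 2, map 0)
// into a flat array of 256 * fontWidth * fontHeight pixels (1 = foreground).
function expandFont(plane2, fontHeight, nineDots, lineGraphics) {
    const fontWidth = nineDots ? 9 : 8;
    const pixels = new Uint8Array(256 * fontWidth * fontHeight);
    for (let code = 0; code < 256; code++) {
        for (let y = 0; y < fontHeight; y++) {
            const bits = plane2[code * 32 + y];
            const base = (code * fontHeight + y) * fontWidth;
            for (let x = 0; x < 8; x++) {
                pixels[base + x] = (bits >> (7 - x)) & 1;
            }
            if (nineDots) {
                // Box-drawing range repeats column 8 when line graphics are enabled
                const repeat = lineGraphics && code >= 0xc0 && code <= 0xdf;
                pixels[base + 8] = repeat ? bits & 1 : 0;
            }
        }
    }
    return { fontWidth, fontHeight, pixels };
}
```

The render loop then only indexes `pixels[(code * fontHeight + y) * fontWidth + x]`, and the function is re-run only when the font, the font height or the 8/9-dot setting changes.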

SuperMaxusa commented 1 month ago

> This might cause unpredictable flicker during VGA state transitions.

You mean like the race condition between canvas and frame updates?

By the way, I have noticed that TextCanvas.render() uses performance.now() for the blinking effect. How about using a frame counter[^1] (as done in PCjs) or the requestAnimationFrame() callback's timestamp for this?

[^1]: PCjs "blinks" in text mode every 10 frames (about 170 ms), which is close to real hardware. According to http://www.osdever.net/FreeVGA/vga/textcur.htm#blink, the blink rate for VGA is 16 frames (about 260 ms).
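A frame-counter-based blink could be as simple as the sketch below. `BlinkClock` is a hypothetical name, and the periods are parameters rather than claimed hardware values. One caveat: requestAnimationFrame() follows the display's refresh rate, which may be higher than 60 Hz, so counting browser frames only approximates a fixed VGA frame rate.

```javascript
// Hypothetical sketch: blink phases derived from a frame counter that is
// advanced once per rendered frame, instead of from performance.now().
class BlinkClock {
    constructor(cursorFramesPerPhase, textFramesPerPhase) {
        this.frame = 0;
        this.cursorFrames = cursorFramesPerPhase; // frames per on/off phase of the cursor
        this.textFrames = textFramesPerPhase;     // frames per on/off phase of blinking text
    }
    tick() {
        this.frame++;
    }
    get cursorOn() {
        return Math.floor(this.frame / this.cursorFrames) % 2 === 0;
    }
    get textOn() {
        return Math.floor(this.frame / this.textFrames) % 2 === 0;
    }
}

// Usage in a browser render loop (example values only, not run here):
// const blink = new BlinkClock(8, 16);
// function step() { blink.tick(); /* render with blink.cursorOn / blink.textOn */ requestAnimationFrame(step); }
// requestAnimationFrame(step);
```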

Pixelsuft commented 1 month ago
>   • font width of 8 or 9 pixels: any font height is ok (up to 32), but font width 9 has special rules

It can also be 16 (2x width scale), but not 18. For example, Tetris in the MS-DOS profile.

chschnell commented 1 month ago

I have a first implementation running, see here for a demo. Select Machine -> Boot from the menu, mind the image download time of around 15 seconds before it starts booting.

CP850 works properly: [screenshot: freedos13-cp850]

Fontraption works properly: [screenshot: fontraption]

Gnuchcp works properly, example with font COMPUTER: [screenshot: gnufont-computer]

What's still missing is:

This is a non-breaking change, meaning the new "graphical text mode" needs to be activated explicitly or else everything behaves as before.

It uses a lazy updating approach. Since you cannot see the changes between two consecutive redraws, these changes are delayed and only applied at the next invocation of redraw. This could reduce flicker during VGA state transitions.
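The lazy approach can be sketched with dirty flags: register writes only record the new value and mark derived state as stale, and the next render() rebuilds it once. All names below are hypothetical and only two registers are modeled; this is an illustration of the idea, not the code in the demo.

```javascript
// Register writes are cheap and only mark state dirty; the derived rendering
// state is recomputed at most once per render(), so intermediate register
// states between two frames are never drawn.
class LazyTextState
{
    constructor()
    {
        this.registers = { max_scan_line: 15, clocking_mode: 0x00 };
        this.geometry_dirty = true;
        this.rebuilds = 0;
        this.font_height = 0;
        this.char_width = 0;
    }

    write_register(name, value)
    {
        if(this.registers[name] !== value)
        {
            this.registers[name] = value;
            this.geometry_dirty = true; // defer the actual work to render()
        }
    }

    render()
    {
        if(this.geometry_dirty)
        {
            this.font_height = (this.registers.max_scan_line & 0x1F) + 1;
            this.char_width = (this.registers.clocking_mode & 0x01) ? 8 : 9;
            this.geometry_dirty = false;
            this.rebuilds++;
        }
        return { font_height: this.font_height, char_width: this.char_width };
    }
}
```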

It works with relatively little extra state. For example, there is no extra text buffer involved; it renders directly from the 16-bit words in VGA memory. When appropriate, GraphicalTextScreen.render() reads the text rendering configuration directly from the VGA registers. Housekeeping code in general is quite reduced compared to the current text mode implementation.
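For readers unfamiliar with the layout, a sketch of how one such 16-bit text word decodes, assuming it has already been read with the character code in the low byte and the attribute in the high byte (as seen by the CPU at 0xB8000). The `blink_enabled` flag stands in for the blink bit of the Attribute Mode Control register; the function name is hypothetical.

```javascript
// Decode one text-mode word: low byte = character code, high byte = attribute.
// Attribute bits 0-3 = foreground, bits 4-6 = background, bit 7 = blink when
// blinking is enabled, otherwise the 4th background bit (bright background).
function decode_text_word(word, blink_enabled)
{
    const char_code = word & 0xFF;
    const attr = (word >> 8) & 0xFF;
    const fg = attr & 0x0F;
    let bg = (attr >> 4) & 0x0F;
    let blink = false;
    if(blink_enabled)
    {
        blink = (attr & 0x80) !== 0;
        bg &= 0x07;
    }
    return { char_code, fg, bg, blink };
}
```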

In order to test it yourself, you will need to patch starter.js, cpu.js and screen.js. You will also need to include vga.js and graphical_text_screen.js. Look into my index.html to see how to use it in the V86 constructor. Patches: cpu.patch, starter.patch and screen.patch.

Here's the source code of vga.js and graphical_text_screen.js, all changes in vga.js have a comment starting with "///".

A few words on the structure: graphical_text_screen.js exports class GraphicalTextScreen, which is instantiated in VGAScreen's constructor. GraphicalTextScreen's main purpose is to provide a render() method that returns an ImageData object with the full screen content of the text screen. At a few selected places VGAScreen calls methods of GraphicalTextScreen, and the latter reads registers and memory from the former. The two classes are tightly linked and could be merged, but that would bloat VGAScreen, so maybe it's better to keep them separate.

I will make a proper PR in the next few days if this is acceptable.

Tested with Firefox and Chrome.

@SuperMaxusa: I based the blink rate on a frame counter. I didn't think of that before you pointed it out, and it's much better of course. I didn't like performance.now() anyway, even though it was more precise than the timestamp argument of the requestAnimationFrame() callback.

@Pixelsuft: Which Tetris do you mean, can you provide me with a link?

SuperMaxusa commented 1 month ago

Nice demo :)

anyway yet it was more precise than requestAnimationFrame() callback's timestamp argument.

Indeed, MDN says "The timestamp value is also similar to calling performance.now() at the start of the callback function, but it is never the same value", and its minimum precision is 1 ms. Also, high-resolution timers can be less precise for security reasons: https://w3c.github.io/hr-time/#sec-security

Which Tetris do you mean, can you provide me with a link?

Probably GAMES\TETRIS.COM from https://copy.sh/v86?profile=freedos and https://copy.sh/v86?profile=msdos.

Pixelsuft commented 1 month ago

Probably GAMES\TETRIS.COM from https://copy.sh/v86?profile=freedos and https://copy.sh/v86?profile=msdos.

Currently, Tetris looks like this: [image] But it should look like this: [image]

I hope this will be useful:

```javascript
VGAScreen.prototype.port3C5_write = function(value)
{
    switch(this.sequencer_index)
    {
        case 0x01:
            dbg_log("clocking mode: " + h(value), LOG_VGA);
            var previous_clocking_mode = this.clocking_mode;
            this.clocking_mode = value;
            if((previous_clocking_mode ^ value) & 0x20)
            {
                // Screen disable bit modified
                this.update_layers();
            }
            if((previous_clocking_mode ^ value) & 0x09)
            {
                // 2x scale bit (0x08) or 8/9 px char width bit (0x01) modified:
                // with 2x scale enabled the char width is 8px * 2 = 16px, otherwise 9px or 8px
                const new_char_width = (value & 0x08) ? 16 : (9 ^ (value & 0x01));
                console.log("TODO: new char width", new_char_width);
            }
            break;
        case 0x02:
            dbg_log("plane write mask: " + h(value), LOG_VGA);
            this.plane_write_bm = value;
            break;
        case 0x04:
            dbg_log("sequencer memory mode: " + h(value), LOG_VGA);
            this.sequencer_memory_mode = value;
            break;
        default:
            dbg_log("3C5 / sequencer write " + h(this.sequencer_index) + ": " + h(value), LOG_VGA);
    }
};
```

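The width rule from the port3C5_write snippet can be isolated into a small pure function, which makes it easy to see how the column count and the clocking mode together determine the screen width in pixels. Both function names are hypothetical; the rule itself mirrors the snippet (bit 3 gives 16 px cells, otherwise bit 0 selects 8 px when set and 9 px when clear).

```javascript
// Character cell width implied by the Sequencer Clocking Mode register (index 0x01).
function char_width_from_clocking_mode(value)
{
    if(value & 0x08)
    {
        return 16; // 2x width scale, 8px glyphs doubled
    }
    return (value & 0x01) ? 8 : 9;
}

// Horizontal resolution of the text screen for a given column count.
function screen_width_px(columns, clocking_mode)
{
    return columns * char_width_from_clocking_mode(clocking_mode);
}
```

For example, 80 columns of 9 px cells give 720 px, while 40 columns at 2x scale give 640 px.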
chschnell commented 1 month ago

It's beginning to take shape, integration with VGAScreen and configuration (through the user-facing V86 constructor) looks complete.

Current patchset (small) for starter.js, cpu.js, vga.js and screen.js: v86-graphical-text.patch, plus vga.js and vga_txt.js (renamed from graphical_text_screen.js).

@Pixelsuft: Thank you very much for this helpful "bit" (0x8) of information! I've integrated double-sized width in my demo, and Tetris now looks as it should (as do the lower VGA video modes 0, 1, 4, 5 and I guess 6).

chschnell commented 1 month ago

The new class is now compliant with the V86 build system, and so I was able to put it all together into a single patch file. Apply the all-in-one patch v86-graphical-text.patch to a fresh clone of the V86 master before running make all. Patched files:

Usage: The V86 constructor supports a new option screen_options that is passed to the constructors of VGAScreen and ScreenAdapter, if defined. Currently there are 2 screen-related settings, both are optional:

Example configuration that enables graphical text mode and disables auto-scaling:

```javascript
window.emulator = new V86({
    wasm_path: "v86.wasm",
    memory_size: 16 * 1024 * 1024,
    vga_memory_size: 2 * 1024 * 1024,
    screen_container: document.getElementById("screen_container"),
    screen_options: {
        use_graphical_text: true,
        disable_autoscale: true,
    },
    bios: { url: "seabios.bin" },
    vga_bios: { url: "vgabios.bin" },
    fda: { url: "freedos13.img" },
    autostart: true,
});
```

The auto-scaling in ScreenAdapter can compound and cause glitches. It can be replaced with CSS, for example with min-width: 640px; height: auto; on the canvas.

chschnell commented 1 month ago

To improve testing I've created a clone of copy's web site, but with my graphical text patch enabled (I only added the screen_options to main.js' call to the V86 constructor), see here.

If you look closely, you'll notice that all systems are always reported to be in graphics mode, even when they are actually in text mode. That's a side effect of my patch.

I've cloned all images offered in non-debug mode. They all boot up and look OK except for the two BSDs, FreeBSD and OpenBSD, where no text is displayed, only some background colors. So far I have no idea why that is.

In the web site's Debug mode I get an error in the browser console that my new class GraphicalText (in the new module src/vga_text.js) cannot be found. Apparently I need to make my class visible to Debug mode when building, and I don't know how. Does anybody know?

SuperMaxusa commented 1 month ago

Apparently I need to make my class visible for Debug mode when building, and I don't know how, does anybody know?

Have you tried adding your module here?

https://github.com/copy/v86/blob/750fedf5be4081195d57b8e7efc09c25ec53681c/debug.html#L16

chschnell commented 1 month ago

That was it, though it's CORE_FILES instead of BROWSER_FILES in my case, but that doesn't matter of course.

Debug mode works now at my cloned site.

Thanks again @SuperMaxusa!

chschnell commented 1 month ago

I found the bug that caused text mode 80x50 in FreeDOS to fail.

The command in MS-DOS and FreeDOS to switch to 80x50 is mode con lines=50, it works fine in MS-DOS but not under FreeDOS.

If you execute the mode command under FreeDOS, it switches only halfway: you'll have 50 lines at your disposal, but you only get to see the lower 25 of them. If you then switch back to 80x25 it's reversed: you'll get to see 50 rows, but only the upper 25 are actually used.

This looked suspicious, a bit like an off-by-one error, so I looked into it.

Simplifying a bit, text mode with 80 columns and 50 rows depends on the values at two indices of one VGA port:

Width and height of the text screen are calculated in VGAScreen.update_vga_size(), and the height is basically just vertical_display_enable_end divided by max_scan_line (meaning: scan lines / font height). So whenever either of these registers' values changes, this function must be called.

This was already done for vertical_display_enable_end, but the call was missing for max_scan_line, so I added it, and now it works.
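A simplified sketch of the calculation and of the fix, using hypothetical names. It assumes the vertical display end value is the full 10-bit CRTC field holding the last displayed scan line (hence the +1) and that bits 0 to 4 of the Maximum Scan Line register (CRTC index 0x09) hold the font height minus one; double scanning and the overflow register are ignored.

```javascript
// Text rows = displayed scan lines / font height.
function text_rows(vertical_display_enable_end, max_scan_line)
{
    const scan_lines = vertical_display_enable_end + 1;
    const font_height = (max_scan_line & 0x1F) + 1;
    return Math.floor(scan_lines / font_height);
}

// The fix in sketch form: a write to EITHER register triggers the resize.
class CrtcSketch
{
    constructor(on_resize)
    {
        this.vertical_display_enable_end = 399; // 400 scan lines
        this.max_scan_line = 15;                // 16-line font
        this.on_resize = on_resize;
    }

    write(index, value)
    {
        switch(index)
        {
            case 0x09: // Maximum Scan Line
                this.max_scan_line = value;
                break;
            case 0x12: // Vertical Display End, low 8 bits
                this.vertical_display_enable_end = (this.vertical_display_enable_end & 0x300) | value;
                break;
            default:
                return; // other registers do not affect the row count here
        }
        this.on_resize(text_rows(this.vertical_display_enable_end, this.max_scan_line));
    }
}
```

With 400 scan lines, an 8-line font yields 50 rows and a 16-line font yields 25, which matches the two modes switched by mode con lines=50.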

I've added a patch to my PR and patched my demo.

chschnell commented 3 weeks ago

You may have noticed the garbage in Fontraption's left tab in my earlier post:

[Screenshot: ftrap-broken]

I found and fixed the cause of that: Fontraption was handed data from plane 0 instead of plane 2 when it attempted to read font data from plane 2. Now it looks like this:
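The post doesn't say exactly where the wrong plane came from, but for context, this is roughly how a CPU read is routed to a plane in VGA read mode 0: every read loads all four latches, and the Graphics Controller's Read Map Select register (index 0x04, bits 0 to 1) decides which plane's byte is returned. The names below are hypothetical; odd/even addressing, which is active in standard text mode and which font tools typically switch off before accessing plane 2, is not modeled.

```javascript
// Hypothetical planar model: four 64 KB planes plus four latches.
function make_vga_planes()
{
    return {
        planes: [0, 1, 2, 3].map(() => new Uint8Array(0x10000)),
        latches: new Uint8Array(4),
        read_map_select: 0, // Graphics Controller index 0x04, bits 0-1
    };
}

// Read mode 0: load all latches, return the byte of the selected plane.
function vga_read_mode0(vga, offset)
{
    for(let p = 0; p < 4; p++)
    {
        vga.latches[p] = vga.planes[p][offset];
    }
    return vga.latches[vga.read_map_select & 0x03];
}
```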

[Screenshot: ftrap-main]

Another nice example is the Norton Commander installer (this is text mode):

[Screenshot: nc5-setup]

Notice the mouse cursor: it's not my browser's, but a graphical mouse cursor emulated by the installer in text mode (which needs read and write access to font plane 2)! It moves pixel by pixel, even in text mode, and it was completely broken before. Another detail is the close symbol at the top left of the window, which is not a standard character.

For comparison, here's what it looks like in V86's classical (HTML DIV-based) text mode (the 4 characters in the lower left corner are the mouse cursor, the installer draws the pixel-exact cursor in these 4 chars):

[Screenshot: nc5-setup-div-text]

Norton Commander uses quite a few custom characters (window frames, radio buttons and checkboxes):

[Screenshot: nc5-options]

What now also works is monochrome text mode, here an example:

[Screenshot: mono-fd-edit]

chschnell commented 2 weeks ago

Here's an example of 512-character mode (2 active fonts):

[Screenshot: ftrap-512-chars]

Press Ctrl+G in Fontraption and load the 8x8 font from VGA ROM if you want to see for yourself. Here it's even more obvious (Ctrl+I to import, then select file IMPORT/eddie1.bmp):

[Screenshot: ftrap-eddie1]

I was still looking for an example to test this mode and just realized that Fontraption requires it in order to work, so that's also tested now (I also checked the corresponding VGA register to make sure).
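For reference, a sketch of how 512-character mode is selected, based on the FreeVGA description of the Sequencer's Character Map Select register (index 0x03): map A (bits 3-2 plus bit 5) is used when attribute bit 3 is set, map B (bits 1-0 plus bit 4) when it is clear, and the mode is effectively active when the two differ. The function names are hypothetical and the 32-byte glyph stride matches the plane 2 layout assumed earlier.

```javascript
// Decode the Character Map Select register into map numbers 0-7.
function decode_char_map_select(value)
{
    const map_a = ((value >> 2) & 0x03) | ((value >> 3) & 0x04);
    const map_b = (value & 0x03) | ((value >> 2) & 0x04);
    return { map_a, map_b, mode_512: map_a !== map_b };
}

// Byte offset of a font block in plane 2:
// maps 0-3 at 0x0000/0x4000/0x8000/0xC000, maps 4-7 at 0x2000/0x6000/0xA000/0xE000.
function font_block_offset(map)
{
    return (map & 0x03) * 0x4000 + (map >> 2) * 0x2000;
}

// Plane-2 offset of a character's glyph, with attribute bit 3 choosing the font.
function glyph_offset(char_map_select, attr, char_code)
{
    const { map_a, map_b } = decode_char_map_select(char_map_select);
    const map = (attr & 0x08) ? map_a : map_b;
    return font_block_offset(map) + char_code * 32;
}
```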

Fontraption is the perfect tool to test all this. For example, pressing F8 in Fontraption toggles between font widths of 8 and 9 px, which also toggles the screen resolution between 640 and 720 px.