boxgaming / qbjs

QBasic for the web
https://qbjs.org
MIT License
51 stars, 9 forks

Support for international characters and UTF8 #114

Open: mobluse opened this issue 1 month ago

mobluse commented 1 month ago

I cannot get Swedish characters, åäöÅÄÖ, to work even though that worked in QBasic for DOS. As soon as I type them they are converted to Greek letters.

boxgaming commented 1 month ago

Thank you for reporting. We’ll look at addressing this in a future release. In the meantime, if you want to print UTF-8 characters to the screen, they will display correctly if you use a font other than the default DOS font:

Dim fnt As Long
fnt = LoadFont("Arial", 18)
Font fnt
Print "Hej världen!"

mobluse commented 1 month ago

I could get your example to work, but "ä" (&auml; in HTML) looks like a Greek capital letter sigma in the editor because the editor still uses the default font. Courier works as a fixed-width font, but it doesn't exactly fit with my graphics that use Line and Locate, whether I use size 13 or 14. I believe the problem might be Locate. Are there more fixed-width fonts that work?

Screen 12
Dim fnt As Long
fnt = LoadFont("Courier", 13)
Font fnt

boxgaming commented 1 month ago

There are a lot of options for loading fonts. You can reference any standard web font, load a font from an online source (e.g. Google Fonts), or load a font from a font file in the virtual file system. Here’s a post on one of the QB64 forums with more details: https://qb64.boards.net/post/700.

Also, you might try changing the UI theme in settings. The VSCode Dark theme and the Windows Classic theme both use fonts that should display the characters correctly in the code editor.

mobluse commented 1 month ago

I did find in the forum post https://qb64.boards.net/post/700 under "3. From a URL" how to load a font.

I found a TTF font that looks like the default font and covers the CP437 character set, but uses Unicode code points instead. When I load that font, it has the wrong line spacing (the rows are more dense), and Locate doesn't position the text at the same place relative to graphics drawn with Line. It has the right width when the size is 16.

Import Dom From "lib/web/dom.bas"

Dim fnt
fnt = LoadFont("https://webdraft.hu/fonts/classic-console/fonts/clacon2.woff2", 16)
Font fnt

The international characters work in the editor with e.g. VSCode Dark theme.

Example program: nonie.bas (Just uncomment the line Font fnt to try the new font. Just press Enter several times to see the graphics and text together. I still have aa=å, ae=ä, and oe=ö, but I will change to Swedish characters when they work.)

boxgaming commented 1 month ago

After researching further, I found that the character issue was introduced when we converted the original .ttf font file to .woff2 format. The missing characters have been restored, and the editor has been updated so that those characters can be displayed. This fix will be included in the next release. In the meantime, if you would like to preview the changes, you can check them out in the development build using the GitHub Pages URL: https://boxgaming.github.io/qbjs.

As a test case you can try out an updated version of the example program that you shared there: nonie-updated.bas

Additionally, to support backwards compatibility with the Code Page 437 character codes used in QBasic and QB64 programs, you will notice that these characters have a different character code than their modern Unicode equivalents.
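To make the difference concrete, here is a minimal sketch in Python (not QBJS) that lists the CP437 code next to the Unicode code point for the Swedish letters from this issue. It relies only on Python's built-in `cp437` codec; the helper name `code_pairs` is made up for illustration and is not part of QBJS.

```python
# CP437 code vs. Unicode code point for the Swedish letters in this issue.
# Assumption: Python's built-in "cp437" codec stands in for the DOS code page.
LETTERS = "åäöÅÄÖ"

def code_pairs(text):
    """Return (character, CP437 code, Unicode code point) for each character."""
    return [(ch, ch.encode("cp437")[0], ord(ch)) for ch in text]

for ch, cp437_code, unicode_code in code_pairs(LETTERS):
    print(f"{ch}: CP437 {cp437_code:3d}  Unicode U+{unicode_code:04X}")
```

For example, Ä is 142 in CP437 but 196 (U+00C4) in Unicode.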

Here is a handy little program to print a character code reference in QBJS:

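Screen NewImage(520, 520, 32)

' Print a 16-column grid: each row of characters is followed by a row of their codes
Const COLS = 16
Dim As Integer i, r, c
For r = 0 To 15
    ' Character row
    Color 15
    For i = 1 To COLS
        c = i + COLS*r
        ' Skip line feed (10), carriage return (13), and codes above 255
        If c = 10 Or c = 13 Or c > 255 Then
            Print "    ";
        Else
            Print "  "; Chr$(c); " ";
        End If
    Next i
    Print

    ' Character code row, right-aligned to three digits
    Color 8
    For i = 1 To COLS
        Print Right$("  " + Str$(i + COLS*r), 3); " ";
    Next i
    If r < 15 Then Print
Next r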

(If you run this in the current production version you will see the missing character codes.)

Also, here is an example of how you can use the Console library to print out a character by its character code to the QBJS console so that you can copy and paste it into your source without having to use the Chr$ method:

Import Console From "lib/web/console.bas"

Console.Echo Chr$(142) + Chr$(143) + Chr$(132) + Chr$(134) + Chr$(153) + Chr$(148)

boxgaming commented 1 month ago

Update: The fix described above works in Chromium-based browsers; however, Firefox and Safari do not display the characters in that range. This is being researched further.

mobluse commented 1 month ago

Great! But one still cannot type ÅÄÖåäö in QBJS. I use the English UK/GB keyboard layout on Linux, where you type Alt+Shift+[ followed by Shift+A for Å, Alt+[ followed by Shift+A for Ä, Alt+[ followed by Shift+O for Ö, Alt+Shift+[ followed by A for å, Alt+[ followed by A for ä, and Alt+[ followed by O for ö.

boxgaming commented 1 month ago

I decided to take a different approach altogether. I’ve updated the default font to use standard UTF character mapping. For backwards compatibility I also added a 437-to-UTF character crosswalk so that Asc and Chr$ methods will still return expected values. There is also a utility now in the settings dialog that can convert the source from 437 code page to UTF in case code was loaded or pasted from an original QBasic or QB64 source. I think this should allow for more consistent behavior now regardless of the font used.

The changes can be previewed in the development build: https://boxgaming.github.io/qbjs.
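The idea of a 437-to-Unicode crosswalk can be sketched as a pair of lookup tables. This is an illustration in Python, not the QBJS implementation: the names `chr_437` and `asc_437` are hypothetical stand-ins for Chr$ and Asc, and the tables are built from Python's `cp437` codec. Codes 0 to 31 are left out because that codec maps them to control characters rather than the DOS glyphs.

```python
# Hypothetical crosswalk between CP437 codes and Unicode characters.
# Assumption: built from Python's "cp437" codec; QBJS may do this differently.
CP437_TO_UNICODE = {code: bytes([code]).decode("cp437") for code in range(32, 256)}
UNICODE_TO_CP437 = {ch: code for code, ch in CP437_TO_UNICODE.items()}

def chr_437(code):
    """Chr$-like lookup: CP437 code -> Unicode character."""
    return CP437_TO_UNICODE[code]

def asc_437(ch):
    """Asc-like lookup: Unicode character -> CP437 code."""
    return UNICODE_TO_CP437[ch]

print(chr_437(142), asc_437("ö"))
```

With tables like these, source text can be stored as Unicode while programs that rely on the old numeric codes keep working.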

mobluse commented 1 month ago

Yes, I think this way is better. Typing Print "ÅÄÖÜÉåäöüé" works now, but ü looks like it is in another font. I tried the utility by loading a file in CP437, but when I clicked the button Code Page 437 to UTF nothing happened. I then converted the file using iconv -f 437 -t UTF8 Nonieega.bas -o Nonieega-utf8.bas and then loaded Nonieega-utf8.bas and that looked good.
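Where iconv is not available, the same conversion can be approximated with a few lines of Python. This is a sketch under the assumption that Python's `cp437` codec matches what `iconv -f 437` does for the characters in the file; the function name `cp437_to_utf8` is made up for illustration.

```python
from pathlib import Path

def cp437_to_utf8(src, dst):
    """Re-encode a CP437 text file as UTF-8, similar to the iconv call above."""
    # Work on raw bytes so that line endings are passed through unchanged.
    text = Path(src).read_bytes().decode("cp437")
    Path(dst).write_bytes(text.encode("utf-8"))

# Example call, using the file names from the comment above:
# cp437_to_utf8("Nonieega.bas", "Nonieega-utf8.bas")
```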

mobluse commented 1 month ago

ü is in the wrong font. There are also some errors in the last 32 of the first 256 characters. I discovered this using the following program:

cls
for i=0 to 255
if i<>0 and i<>10 then print chr$(i); else print " ";
if i mod 32=31 then print
next

See the screenshot 2024-09-26-025208_814x332_scrot and compare it to https://en.wikipedia.org/wiki/Code_page_437 (image: 250px-Codepage-437). The 3rd character in the last line should be Gamma, but is pi. The 4th character should be pi, but is something else. The 7th from the end should be a bullet, but is a centered dot. The 6th character from the end is correctly a centered dot.
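The expected characters at those positions can be cross-checked without the screenshot. This Python sketch decodes codes 224 to 255 (the last line when 32 characters are printed per row) with Python's `cp437` codec and prints the Unicode name at each position mentioned above. The helper `describe` is made up for illustration.

```python
import unicodedata

# Codes 224-255: the last line when 32 characters are printed per row.
LAST_ROW = bytes(range(224, 256)).decode("cp437")

def describe(position):
    """Return (character, Unicode name) at a 1-based position.

    Negative positions count from the end of the row.
    """
    index = position - 1 if position > 0 else position
    ch = LAST_ROW[index]
    return ch, unicodedata.name(ch)

for pos in (3, 4, -7, -6):
    print(pos, *describe(pos))
```

In this codec, position 3 is the Greek capital Gamma, position 4 the small pi, the 7th from the end the bullet operator (U+2219), and the 6th from the end the middle dot (U+00B7), which matches the Wikipedia table.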

boxgaming commented 1 month ago

Good catch. I have fixed the character mapping issues. The changes can be previewed in the development build: https://boxgaming.github.io/qbjs. Let me know if you see any other issues.