9999years / Unicode-PETSCII


Now Unicode does support PETSCII characters #2

Open jumpjack opened 3 weeks ago

jumpjack commented 3 weeks ago

Starting with version 13.0 (2020), Unicode began adding characters in the "Symbols for Legacy Computing" block, which is now mature in Unicode 16.0: https://www.unicode.org/charts/PDF/Unicode-16.0/U160-1FB00.pdf

Code points from U+1FB00 onward, together with some existing characters at lower code points, complete PETSCII compatibility.
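To check whether a character falls in that block, here is a minimal Python sketch. It assumes only the block range U+1FB00..U+1FBFF from the chart linked above. Character names come from the Unicode database bundled with your Python build, which may be older than Unicode 16.0, so recently added characters can show up as unassigned. The helper names are hypothetical.

```python
import unicodedata

# Range of the "Symbols for Legacy Computing" block (U+1FB00..U+1FBFF).
LEGACY_START, LEGACY_END = 0x1FB00, 0x1FBFF


def in_legacy_computing_block(ch: str) -> bool:
    """Return True if the single character ch lies in the block."""
    return LEGACY_START <= ord(ch) <= LEGACY_END


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using Python's bundled Unicode data,
    which may predate Unicode 16.0 (newer characters then have no name)."""
    name = unicodedata.name(ch, "<not in this Python's Unicode database>")
    return f"U+{ord(ch):04X} {name}"


if __name__ == "__main__":
    for cp in (0x1FB00, 0x1FB95, 0x2592):
        ch = chr(cp)
        print(describe(ch), in_legacy_computing_block(ch))
```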

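But:

  • I have yet to find a full and up-to-date PETSCII/UNICODE mapping.
  • My browsers in 2024 can't properly display most of 0x1fxx codes.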

9999years commented 3 weeks ago

If you look at the relevant proposals, you'll see my name on them! This repository was subsumed by the efforts of Rebecca Bettencourt :)

9999years commented 3 weeks ago

Oh, you had questions I didn't answer. Here you go:

> I have yet to find a full and up-to-date PETSCII/UNICODE mapping

The mappings are attached to the proposal PDFs directly. If you click here, you can hopefully extract or view the relevant files: https://www.unicode.org/L2/L2017/17435r-terminals-prop.pdf

Your PDF viewer may make this non-obvious, so you might have to dig in some menus. This confused the Unicode Consortium members as well, who were reviewing the proposal as a printed hardcopy and thought no mappings had been submitted...

From the (also attached) ReadMe.txt:

Commodore PET (PETSCII)

CPETVPRI.TXT: Commodore PET primary character set as mapped in memory.
CPETVALT.TXT: Commodore PET alternate character set as mapped in memory.
CPETIPRI.TXT: Commodore PET primary character set as mapped by CHR$().
CPETIALT.TXT: Commodore PET alternate character set as mapped by CHR$().
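Here is a sketch of loading one of these tables into a dictionary. The exact layout of the attached .TXT files is an assumption: the parser expects the common Unicode mapping-file style of a hex byte, whitespace, a hex code point (or several joined by "+"), and an optional "#" comment. Check it against the real files before relying on it. `parse_mapping` is a hypothetical name.

```python
from typing import Dict, Tuple


def parse_mapping(text: str) -> Dict[int, Tuple[int, ...]]:
    """Parse lines of the ASSUMED form '0x41<TAB>0x0041<TAB># NAME' into
    {byte: (code point, ...)}. A field like '0x005C+0x0338' becomes a sequence."""
    table: Dict[int, Tuple[int, ...]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # a byte listed with no Unicode equivalent
        byte = int(fields[0], 16)  # int() accepts the '0x' prefix with base 16
        table[byte] = tuple(int(part, 16) for part in fields[1].split("+"))
    return table
```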

The PET has REVERSE SOLIDUS where the VIC-20, C64, and C128 have POUND SIGN. The PET and VIC-20 have REVERSE FOUR-BY-FOUR CHECKER BOARD where the C64 and C128 have FOUR-BY-FOUR CHECKER BOARD, and vice-versa.

The primary character set has uppercase letters where the alternate character set has lowercase letters. The primary character set has semigraphics characters where the alternate character set has uppercase letters.
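Here is a hypothetical helper that shows what this means for the in-memory (video) codes. The README doesn't give code ranges, so the ones used here are assumptions taken from the usual Commodore screen-code layout: $01..$1A for the first letter run, $41..$5A for the uppercase run in the alternate set, and bit 7 for reverse video. Verify them against CPETVPRI.TXT and CPETVALT.TXT.

```python
from typing import Optional


def screen_code_letter(code: int, alternate: bool) -> Optional[str]:
    """Return the ASCII letter displayed for a video code, or None if the code
    is not a letter in the chosen set (ranges are ASSUMED, see above)."""
    code &= 0x7F  # bit 7 selects reverse video; the glyph is otherwise the same
    if 0x01 <= code <= 0x1A:
        # Primary set: uppercase here; alternate set: lowercase here.
        base = "a" if alternate else "A"
        return chr(ord(base) + code - 0x01)
    if alternate and 0x41 <= code <= 0x5A:
        # Alternate set puts uppercase where the primary set has semigraphics.
        return chr(ord("A") + code - 0x41)
    return None  # digits, punctuation, or semigraphics
```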

The CHR$() function mapping (or "interchange" mapping) maps to the in-memory mapping (or "video" mapping) as follows:

Interchange => Video
$00 - $1F => (control characters)
$20 - $3F => $20 - $3F
$40 - $5F => $00 - $1F
$60 - $7F => $40 - $5F
$80 - $9F => (control characters)
$A0 - $BF => $60 - $7F
$C0 - $DF => $40 - $5F
$E0 - $FF => $60 - $7F
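The interchange-to-video table above can be written as a lookup in Python. This is a minimal sketch of the table exactly as the README states it. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

# (first, last, video_base) per 32-code range; None marks control characters.
_RANGES: List[Tuple[int, int, Optional[int]]] = [
    (0x00, 0x1F, None),
    (0x20, 0x3F, 0x20),
    (0x40, 0x5F, 0x00),
    (0x60, 0x7F, 0x40),
    (0x80, 0x9F, None),
    (0xA0, 0xBF, 0x60),
    (0xC0, 0xDF, 0x40),
    (0xE0, 0xFF, 0x60),
]


def interchange_to_video(code: int) -> Optional[int]:
    """Map a CHR$() (interchange) code to its in-memory (video) code,
    or return None for control characters."""
    if not 0x00 <= code <= 0xFF:
        raise ValueError(f"not a byte: {code}")
    for first, last, base in _RANGES:
        if first <= code <= last:
            return None if base is None else base + (code - first)
    raise AssertionError("unreachable: the ranges cover 0x00..0xFF")
```

Every range is exactly 32 codes wide and starts on a multiple of $20, so the loop could be replaced by indexing `_RANGES[code >> 5]`. The explicit search just keeps the table easy to compare with the README.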

Okay, second question.

> My browsers in 2024 can't properly display most of 0x1fxx codes

This is what we call a "vendor implementation" issue, unfortunately. Unicode doesn't supply fonts, so each vendor has to draw up fonts for each new glyph manually, and there's not a lot of demand for SYMBOLS FOR LEGACY COMPUTING, to put it lightly.

My beloved PragmataPro by @fabrizioschiavi has supported these characters pretty much since day 1 though. Go check it out!

https://www.fileformat.info/info/unicode/block/symbols_for_legacy_computing/fontsupport.htm
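To see which of these characters a particular font actually covers, here is a small sketch. It compares a font's set of mapped code points against the characters in U+1FB00..U+1FBFF that are assigned in Python's bundled Unicode database, which may lag behind Unicode 16.0. How you obtain the font's code points is up to you. `missing_glyphs` is a hypothetical helper.

```python
import unicodedata
from typing import Iterable, List


def missing_glyphs(font_codepoints: Iterable[int],
                   start: int = 0x1FB00, end: int = 0x1FBFF) -> List[int]:
    """List code points in [start, end] that are assigned in this Python's
    Unicode database but absent from the font's character map."""
    have = set(font_codepoints)
    return [cp for cp in range(start, end + 1)
            # 'Cn' = unassigned, so gaps in the block are not counted as missing
            if unicodedata.category(chr(cp)) != "Cn" and cp not in have]
```

With fontTools installed, `missing_glyphs(TTFont("SomeFont.ttf").getBestCmap())` (after `from fontTools.ttLib import TTFont`) should work, since `getBestCmap()` returns a dict keyed by code point. The font filename is only a placeholder.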

jumpjack commented 2 weeks ago

Thanks, these are very useful files. I didn't notice there were files attached to the PDF.

> Unicode doesn't supply fonts

OK, I had actually read this before, but I don't understand: the PDFs contain graphical representations of the characters but no binary files describing their individual pixels, so everybody has to draw them by themselves?

9999years commented 2 weeks ago

> the PDFs contain graphical representations of the characters but no binary files describing their individual pixels, so everybody has to draw them by themselves?

Yes, I've always thought this was a big failing of the way the Unicode standard is distributed. Proposers need to provide reference drawings of proposed glyphs, but those drawings aren't redistributed in a usable manner. Frustrating!

I think GNU Unifont has bitmaps for these codepoints: https://unifoundry.com/unifont/

Rebecca Bettencourt contributed:

  • U+1CC00..U+1CCF9 Symbols for Legacy Computing Supplement
  • Supplemental Arrows-C glyphs U+1F8B2..U+1F8BB, U+1F8C0, and U+1F8C1
  • Symbols for Legacy Computing glyphs U+1FBCB..U+1FBEF.

See them at the bottom of the Plane 1 screenshot: https://unifoundry.com/pub/unifont/unifont-16.0.01/unifont_plane1-16.0.01.bmp
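Unifont glyphs are also distributed as .hex files. Each line holds one glyph: a hexadecimal code point, a colon, and 32 or 64 hex digits of bitmap (8x16 or 16x16 pixels, one row per 2 or 4 digits). Here is a minimal Python sketch that decodes one such line into text rows. The function name is hypothetical, and it handles only those two widths.

```python
from typing import List, Tuple


def parse_hex_line(line: str) -> Tuple[int, List[str]]:
    """Decode one Unifont .hex line ('CODEPOINT:BITMAP') into 16 rows of '#'/'.'."""
    cp_text, bitmap = line.strip().split(":", 1)
    if len(bitmap) not in (32, 64):
        raise ValueError(f"unexpected bitmap length {len(bitmap)}")
    width = len(bitmap) // 16 * 4   # 32 digits -> 8 px wide, 64 digits -> 16 px wide
    digits_per_row = width // 4     # each hex digit encodes 4 pixels
    rows = []
    for r in range(16):
        chunk = bitmap[r * digits_per_row:(r + 1) * digits_per_row]
        bits = format(int(chunk, 16), f"0{width}b")  # zero-padded binary string
        rows.append(bits.replace("1", "#").replace("0", "."))
    return int(cp_text, 16), rows
```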

They're not bitmaps, but check out: https://github.com/dokutan/legacy_computing-font

Also: Kreative Square by the lovely Rebecca Bettencourt (who wrote the proposal): https://www.kreativekorp.com/software/fonts/ksquare/

jumpjack commented 2 weeks ago

Thanks, these are amazing resources! I already knew about https://github.com/dokutan/legacy_computing-font and forked it to build my own "binary matrix extractor" for a generic character grid:

https://github.com/jumpjack/legacy_computing-font-javascript/blob/master/README.md
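jumpjack's extractor is written in JavaScript and isn't reproduced here. The following is only a hypothetical Python sketch of the general idea: cutting a glyph sheet into per-character bit matrices. It assumes the sheet is already a monochrome 0/1 pixel array with uniformly sized cells and no gutters between them.

```python
from typing import List

Matrix = List[List[int]]


def split_grid(pixels: Matrix, cell_w: int, cell_h: int) -> List[Matrix]:
    """Cut a sheet (pixels[y][x] is 1 for ink, 0 for background) into
    per-cell bit matrices, ordered left to right, top to bottom."""
    if cell_w <= 0 or cell_h <= 0:
        raise ValueError("cell size must be positive")
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    if height % cell_h or width % cell_w:
        raise ValueError("sheet size is not a multiple of the cell size")
    cells: List[Matrix] = []
    for top in range(0, height, cell_h):        # rows of cells
        for left in range(0, width, cell_w):    # cells within a row
            cells.append([row[left:left + cell_w]
                          for row in pixels[top:top + cell_h]])
    return cells
```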