RasppleII / a2server

AppleTalk server for Apple II computers

Info: character sets #39

Closed IvanExpert closed 8 years ago

IvanExpert commented 8 years ago

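UTF-16 is 2 or 4 bytes per character (not relevant, but just saying).
UTF-8 is one byte for 0-127, which is ASCII compatible, and 2-4 bytes for everything else (the original design allowed up to 6, but RFC 3629 limits it to 4).
These multibyte sequences screw up Apple II term programs for non-ASCII chars (e.g. hyphen, smart quote).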
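To make the byte-count problem concrete, here is a minimal Python sketch. The helper name `utf8_len` is only for illustration and is not part of A2SERVER or A2CLOUD. It prints how many bytes a term program would receive for an ASCII letter, a Latin-1 letter, and a smart quote when the Pi sends UTF-8.

```python
def utf8_len(ch):
    """Return how many bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# ASCII letter, Latin-1 letter (e acute), right single smart quote.
for ch in ["A", "\u00e9", "\u2019"]:
    raw = ch.encode("utf-8")
    # Every byte of a multibyte UTF-8 sequence is >= 128, so a terminal
    # that assumes one byte per character shows several wrong characters.
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s): {raw.hex(' ')}")
```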

ISO-8859-* is one byte per character, 0-255, with 128-255 varying by "part" (1-16).
ISO-8859-1 is "Latin-1"; its revision is ISO-8859-15; the others are language-specific.
Apple II text comm programs are going to display 0-127 anyway, since on the Apple II 128-255 are redundant or MouseText.
"ANSI" in a comm program means pseudo VT-100, and may also mean "DOS Code Page 437" (the IBM PC character set), as is the case with Spectrum ANSI emulation.
So it doesn't matter which ISO-8859 part is used, since the comm programs aren't going to use any of them. The main thing is that it's one byte per character, unlike UTF-8.
TERM=vt100 on the Pi makes Linux programs mostly display in B&W, and makes ctrl-chars display on Spectrum ANSI.
TERM=pcansi on the Pi makes Linux programs do color for Spectrum ANSI (TERM=ansi just breaks everything).
LANG=en_US (as opposed to en_US.UTF-8) gets you ISO-8859-1, which is better for Spectrum ANSI, but the en_US ISO-8859-1 locale has to be available (from raspi-config).
See A2CLOUD setup for how to generate locales from the Linux prompt.
ProTERM VT-100 just repeats 128-255; ANSI BBS uses ASCII and MouseText to approximate DOS Code Page 437.
Spectrum VT-100 is sort of arbitrary in 128-255.
TERM=vt100 doesn't work with "ANSI" emulation because it outputs ctrl-O around text styling, which is a displayed character in CP437.
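As a rough sketch of the TERM/LANG combination described above, the following Python builds an environment for a child process aimed at Spectrum ANSI. The function name `spectrum_ansi_env` is hypothetical, and dropping `LC_ALL` and `LC_CTYPE` is my own assumption: both take precedence over `LANG`, so leaving them set to a UTF-8 locale would defeat the change. It also assumes the en_US ISO-8859-1 locale has already been generated.

```python
import os


def spectrum_ansi_env(base=None):
    """Return a copy of the environment set up for Spectrum ANSI emulation."""
    env = dict(os.environ if base is None else base)
    env["TERM"] = "pcansi"   # color output that Spectrum ANSI can display
    env["LANG"] = "en_US"    # ISO-8859-1, i.e. one byte per character
    # Assumption: remove variables that would override LANG.
    env.pop("LC_ALL", None)
    env.pop("LC_CTYPE", None)
    return env
```

The result can be passed as the `env=` argument of `subprocess.run` to launch a single program with these settings without changing the login shell.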

single-byte:
ASCII is single byte, 0-127 (0-31 are "C0" control codes, plus 127 is DEL).
ISO-8859-* (1-16) is ASCII for 0-127; 128-159 are "C1" control codes; 160-255 are regional characters.
ISO-8859-1 is standard "Latin-1"; ISO-8859-15 is updated for the Euro and other chars.
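The ranges above can be written down as a small lookup. This is only a sketch; the function name and the label strings are made up for illustration.

```python
def classify_iso8859_byte(b):
    """Name the ISO-8859-* range that a byte value (0-255) falls into."""
    if not 0 <= b <= 255:
        raise ValueError("not a single byte: %r" % (b,))
    if b <= 31 or b == 127:
        return "C0 control / DEL"
    if b <= 126:
        return "ASCII printable"
    if b <= 159:
        return "C1 control"
    return "regional character"


for value in (0x0F, 0x41, 0x7F, 0x92, 0xE9):
    print(hex(value), classify_iso8859_byte(value))
```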

Microsoft has its own "codepage" numbers for character sets.
Codepage 437 (aka "ANSI BBS") is the DOS character set: ASCII from 32-126, plus printable chars at 1-31 and 127-255 (all chars are also represented in UTF-8).
The "Linedraw" font for Windows provides characters 128+ for codepage 437: ftp://ftp.microsoft.com/Softlib/MSLFILES/GC0651.EXE (use 64.4.17.176 if it doesn't resolve).
The "Terminal" font in XP also provides most of it; Courier New is a Unicode font with most of the same characters.
Windows-1252 (codepage 1252) is ISO-8859-1 with additional chars from 128-159 instead of C1, including all chars in ISO-8859-15.
The Mac has the "macintosh" or "MacRoman" encoding, which is ASCII for 0-127 and its own characters for 128-255.
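Python ships codecs for all of these, so the differences are easy to check. A small sketch follows (the helper `can_encode` and the constant name are illustrative). It shows where the Euro sign lands in each encoding and confirms that Windows-1252 covers the characters ISO-8859-15 added.

```python
# The eight characters ISO-8859-15 has that ISO-8859-1 lacks.
ISO_8859_15_ADDITIONS = "\u20ac\u0160\u0161\u017d\u017e\u0152\u0153\u0178"


def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for enc in ("latin-1", "iso8859_15", "cp1252", "mac_roman", "cp437"):
    # Same character (e acute), different byte value in each 8-bit set.
    e_acute = "\u00e9".encode(enc).hex()
    print(f"{enc:11} e-acute=0x{e_acute} euro={can_encode(chr(0x20AC), enc)}")

print("cp1252 covers ISO-8859-15 additions:",
      can_encode(ISO_8859_15_ADDITIONS, "cp1252"))
```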

UTF-8 characters 0-127 are the same as ASCII.
UTF-8 characters 128+ are between two and four bytes, and can represent any Unicode character.
UTF-16 characters are either two or four bytes, and are endian-sensitive.
UTF-32 characters are always four bytes, and are endian-sensitive.
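A quick way to see those sizes, and what "endian-sensitive" means in practice, is to encode a few characters with Python's built-in codecs. The helper name `encoded_sizes` is illustrative only.

```python
def encoded_sizes(ch):
    """Byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# ASCII letter, Latin-1 letter, Euro sign, and a character outside the
# Basic Multilingual Plane (needs a surrogate pair in UTF-16).
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# Endianness: the same character, with the two bytes in opposite order.
print("A".encode("utf-16-le"), "A".encode("utf-16-be"))
```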

knghtbrd commented 8 years ago

The real solution for terminals would be to define the appropriate terminal definitions for ProTERM, Spectrum, etc. Character sets and locales are more of an issue since these tend to be offered as iso8859-* or utf-8 or sometimes multibyte character sets that don't relate to the Apple // at all. We should be able to get cp437 working for Spectrum. It's possible that we could also get MouseText working for limited boxdraw support in things like dialog.
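For the cp437 part, a minimal sketch of the byte-level mapping might look like the following. This is not the terminal-definition fix itself, just an illustration using Python's built-in `cp437` codec. The function name `to_cp437` and the choice of ASCII stand-ins for smart punctuation are assumptions.

```python
# Assumed ASCII stand-ins for punctuation that cp437 does not contain.
ASCII_FALLBACKS = str.maketrans({
    "\u2018": "'", "\u2019": "'",                 # single smart quotes
    "\u201c": '"', "\u201d": '"',                 # double smart quotes
    "\u2010": "-", "\u2013": "-", "\u2014": "-",  # hyphen and dashes
})


def to_cp437(text):
    """Convert Unicode text to cp437 bytes, one byte per character.

    Characters with no cp437 equivalent come out as "?".
    """
    return text.translate(ASCII_FALLBACKS).encode("cp437", errors="replace")


# Box-drawing characters map directly onto cp437 bytes 128+.
print(to_cp437("\u250c\u2500\u2510").hex(" "))
print(to_cp437("it\u2019s"))
```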

Not sure how to tag this one. It's a bug certainly, but a bug in what exactly, aside from A2CLOUD in general? I'll move this there, but the fix is going to be complicated.

knghtbrd commented 8 years ago

This issue was moved to RasppleII/a2cloud#5