commanderx16 / x16-rom

NuPET ASCII #28

Open mist64 opened 4 years ago

mist64 commented 4 years ago

[Tom Wilson] https://www.facebook.com/groups/CommanderX16/permalink/519788175438948/?comment_id=521425405275225

I have a suggestion.... and this will seem radical at first.

Change the PETSCII character order so that it's the same as ISO/ASCII mode. Get rid of upper case/graphics mode, and use the upper 128 character slots in PETSCII to hold the graphics characters.

After removing the redundant characters, there's a ton of unused space in the PETSCII character set, and with per-character background colors, we don't need most of the reversed characters. The only ones that should be retained are the line drawing characters.

This character set unifies ASCII and PETSCII, includes all of the PETSCII graphics, adds some missing characters (right and down arrow, the rest of the half-shaded box graphics, and some other tidbits) and SIMPLIFIES the entire system by unifying the PETSCII, screen code, and ASCII character order.

Doing this means ISO mode is also easier to implement, because there's no need for separate text handling for ISO and PETSCII mode. The only thing you'd need to handle separately are the characters with diacritical marks.

Since our keyboards have Alt and logo keys, we could still use keyboards to enter PETSCII symbols. The left glyph with the logo key and the right glyph with the ALT key. The extra glyphs not part of the current PETSCII system would be entered with alt+logo, or maybe an OSD.

https://docs.google.com/document/d/1iM6CZ1iWbCIN6-MT4zBlplvSJhmm_rKxE6b-7XCvzds/edit?fbclid=IwAR1uEEWdehmOYNAyTJ3umKudThMl3_QuA8gkne_R9917NI9h95u-3L81sAM#heading=h.d3ddoz790a0h

Kroc commented 4 years ago

NB: PETSCII symbols have been added to Unicode, meaning that a program that could dump C64 BASIC programs could write these Unicode PETSCII code-points, and the X16 emulator can translate those to its own equivalent PETSCII screen-codes; ergo, re-arranging the PETSCII symbols on the X16 need not actually break compatibility in practice.

To add to the keyboard use, two possibilities spring to mind:

mobluse commented 4 years ago

The right Alt key (AKA AltGr) is needed for some characters on non-US keyboards, e.g. AltGr+4 types € on a UK keyboard, and that is part of ISO8859-15. Sometimes AltGr+Shift+key is also used.

Kroc commented 4 years ago

Since the X16 can't use both PETSCII & the ISO8859 font at the same time, the use of AltGr for PETSCII typing may not be an issue

mobluse commented 4 years ago

On e.g. German and Nordic keyboards you press AltGr+8 and AltGr+9 to type [ and ], and these are part of PETSCII.

Ferk commented 4 years ago

Is the Commander X16 gonna support multiple keyboard layouts? (in the default ROM) If not, then perhaps it wouldn't be an issue. If it does, wouldn't the PETSCII characters need to be rearranged in the new layout as well anyway?

mist64 commented 4 years ago

> Is the Commander X16 gonna support multiple keyboard layouts? (in the default ROM)

It already does.

0cjs commented 4 years ago

NOTE: Perhaps someone can pass a link to this on to the original author, Tom Wilson (I think), since the discussion about this appears to be going on in a group that you need a Facebook account to access.

I think that defining a new encoding (or "code page" in IBM PC terminology) that encodes all PETSCII characters but is more ASCII-compatible is a great idea. And adding additional graphics characters is good, too.

That said, I have some thoughts on your NuPET ASCII proposal.

Dropping the Second Pair of Control Code Sticks

PETSCII allocated $80-$9F to a second group of control codes, covering the same two sticks (16-character "rows") as the first group except with the high bit set. This was used from the start for the "shifted" version of certain control codes, such as HOME/CLR ($13/$93) and CURSOR DOWN/UP ($11/$91). (This was related to the original PET keyboard design: the SHIFT key simply added 128 to the code for the key it was paired with.)
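To make the high-bit relationship concrete, here is a minimal Python sketch. The helper names `shifted` and `is_control` are made up for illustration; only the code values ($11, $13 and their $80-offset partners) come from the description above.

```python
# PETSCII control codes named in the text above.
HOME = 0x13
CURSOR_DOWN = 0x11


def shifted(code: int) -> int:
    """Return the 'shifted' partner of a code by setting the high bit (+128)."""
    return code | 0x80


def is_control(code: int) -> bool:
    """True for codes in either pair of control sticks: $00-$1F or $80-$9F."""
    return (code & 0x7F) < 0x20


print(hex(shifted(HOME)))         # 0x93, i.e. CLR
print(hex(shifted(CURSOR_DOWN)))  # 0x91, i.e. CURSOR UP
print(is_control(0x93), is_control(0xA0))  # True False
```

Keeping both stick pairs means a translator can classify a byte as "control" with a single mask, which is part of why preserving $80-$9F makes PETSCII-to-NuPET translation simpler.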

There's plenty of space to keep this if only the original PETSCII graphics characters are encoded in the character set, and this would allow us both to take the PET approach of being able to encode all PETSCII screen control codes as single control characters and maintain better compatibility with PET and C64 BASIC programs. (Your current design loses this; for example the HOME control sequence can now only be sent using ANSI-like escape codes and what was the HOME control code now clears the screen.) Even for those codes that would need to be translated from PETSCII to NuPET, translation would be significantly easier if all PETSCII control codes mapped to single character control codes in NuPET.

This does not prevent adding further glyphs available only as screen codes, since these can be in the "control" areas of the charset ROM and poked in directly or via an escape code as you suggest with $13/^M.

It would also allow considerable simplification of the control codes, since (if we wanted to) we could reduce everything to single-character control codes as used on CBM systems plus a few much simpler non-ANSI-like escape codes.

(That said, I'm not against supporting ANSI, but it's a fair amount of work and I'm not sure how worthwhile it really is. I personally feel no need for it at all, but others might find it useful to be able to simply cat or type old BBS ANSI text files, or use programs that hardcoded ANSI sequences.)

ASCII-1963 vs. ASCII-1965

You should clearly document that, consistent with the change from ASCII-1963 to ASCII-1965, you're changing the glyphs for $5E and $5F from ↑ and ← to ^ and _, but the meanings of those codes stay the same. (At least, I assume that's your intent.) Thus, in BASIC you'd now type and see 2^3 for exponentiation instead of 2↑3.

Backspace vs. Rubout

^H (BS) in ASCII does not conventionally rubout the previous character. Most terminals don't do this, nor does the C64.

To demonstrate the conventional behaviour, run Bash in an xterm or rxvt and echo -e 'ab\bBcd\b\b'. You'll see that the b is overwritten with B, showing that \b moved the cursor, but the last two \bs followed by a newline leave the cd on the screen; they were not rubbed out. (The tty driver typically processes erase (usually set to BS or DEL) by sending a \b \b sequence to rubout the previous char.)
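For anyone without a terminal handy, the difference can be sketched with a tiny one-line screen model in Python. This is only an illustration of the two behaviours, not code from any ROM or terminal; the function name `render` and its `rubout` flag are invented for the example.

```python
def render(data: str, rubout: bool = False) -> str:
    """Simulate one screen line.

    '\b' moves the cursor one cell left. If rubout is True it also blanks
    that cell (the behaviour being argued against for ^H).
    """
    cells = []
    cursor = 0
    for ch in data:
        if ch == "\b":
            if cursor > 0:
                cursor -= 1
                if rubout:
                    cells[cursor] = " "
        else:
            if cursor < len(cells):
                cells[cursor] = ch   # overwrite the cell under the cursor
            else:
                cells.append(ch)     # extend the line
            cursor += 1
    return "".join(cells)


print(repr(render("ab\bBcd\b\b")))        # 'aBcd'  (conventional BS)
print(repr(render("ab\bBcd\b\b", True)))  # 'aB  '  (BS that rubs out)
print(repr(render("ab\b \b")))            # 'a '    (BS, space, BS erases b)
```

The last line shows why a separate rubout code is not strictly needed: the backspace-space-backspace sequence gives the same visible result using plain cursor-moving BS.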

Some old (non-CBM) BASIC programs would use this behaviour for a limited "glass TTY" screen control by printing CHR$(8); this would give you compatibility with that. I don't see anything you'd be incompatible with by making ^H not rubout. (On the PET and C64, print chr$(8) does nothing as far as I can tell. In PETSCII it seems to be marked "shift disable," but I don't know where that ever had meaning.)

If you still feel you need a separate code for "move back one and rubout," you could still assign that to another character, such as ^A, but I'm not really sure that I see the need for it, since the fairly simple ^H-space-^H sequence already lets you do that quite easily.

Non-breaking Space

Not an unreasonable idea, but you've assigned this to code 256, which doesn't exist in an 8-bit 0-255 encoding. (Perhaps you meant $FF/255?) Regardless, according to Wikipedia this already seems to exist at position $A0 (space with the high bit set) in PETSCII; it would make more sense to leave it there, would it not? You've put a full block in that position, but that doesn't seem necessary since it's just a reversed space: $12 $20 $92 or {RVS ON} {RVS OFF}.
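As a rough sketch of why no dedicated full-block code is needed, here is how a screen driver might track reverse mode in Python. The function `render_reverse` and its output format are hypothetical; only the three byte values come from the sequence above.

```python
RVS_ON, RVS_OFF, SPACE = 0x12, 0x92, 0x20


def render_reverse(data: bytes) -> list:
    """Return (code, reversed?) pairs for each printable byte in data."""
    cells = []
    reverse = False
    for code in data:
        if code == RVS_ON:
            reverse = True
        elif code == RVS_OFF:
            reverse = False
        else:
            cells.append((code, reverse))
    return cells


# A reversed space is displayed as a full block.
print(render_reverse(bytes([RVS_ON, SPACE, RVS_OFF])))  # [(32, True)]
```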

Summary

Aside from some ASCII-related documentation and behaviour suggestions, basically everything here I'm suggesting falls into the category of "be more like a PET/C64 and less like an ANSI terminal" (or IBM PC with ANSI.SYS loaded). While I don't mind adding escape sequences (which the PET/C64 never used, as far as I'm aware), I think it would be better to preserve as much as possible the simpler behaviour of the PET and C64 when it comes to screen handling, both for immediate compatibility (codes not needing to be translated at all) and for easier translation when codes do need to be translated, as is already the case with, e.g., lower-case letters. A key part of this would be to restore the upper two control code sticks ($80-$9F), the original screen control codes in the control areas (e.g., $11/$91/$1D/$9D for cursor down/up/right/left) and then consider devising simpler, more PET/C64-like escape sequences for control and glyphs that can't fit into the existing single-character framework.

While this doesn't prevent adding ANSI escape code support, it doesn't require it, as your proposal does, and perhaps that support might anyway be better added as an optional terminal driver, the way it was on the IBM PC with ANSI.SYS.

mobluse commented 4 years ago

One could compact chargen by removing all reversed characters since they can be generated from the non-reversed characters, and one could remove all repeated characters, since the repetitions follow certain patterns. One could recreate the entire chargen in VERA RAM using a small chargen, or one could recreate a smaller chargen if one has rules for reversing and taking care of the repetitions.

chargen is 4 KB and has room for 512 8x8 characters.
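A minimal sketch of the reversal rule in Python, assuming one byte per pixel row and a layout in which the reversed glyphs simply follow the normal ones (that layout and the function names are assumptions for illustration, not the actual X16 ROM format):

```python
GLYPH_BYTES = 8  # an 8x8 glyph is 8 rows of 1 byte each


def reverse_glyph(glyph: bytes) -> bytes:
    """Invert every pixel of one glyph."""
    return bytes(row ^ 0xFF for row in glyph)


def expand(compact: bytes) -> bytes:
    """Return the compact set followed by a reversed copy of each glyph."""
    glyphs = [compact[i:i + GLYPH_BYTES]
              for i in range(0, len(compact), GLYPH_BYTES)]
    return compact + b"".join(reverse_glyph(g) for g in glyphs)


# Size check for the figures quoted above: 512 glyphs * 8 bytes = 4 KB.
print(512 * GLYPH_BYTES)           # 4096

# 128 stored glyphs expand to 256 glyphs (2048 bytes) at run time.
blank_set = bytes(128 * GLYPH_BYTES)
print(len(expand(blank_set)))      # 2048
```

Under these assumptions, dropping the reversed halves would roughly halve the stored chargen, with the full set rebuilt into VERA RAM at startup.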

I have code to convert from PETSCII-UC & PETSCII-LC to UTF8 and that is similar to how expansion of a small chargen could be made. https://github.com/mobluse/x16-petscii2utf8/blob/master/petscii2utf8.c

I think one should not remove any PETSCII unique characters or control codes from X16 since it should be a C64-like computer and it should be easy to port programs to it from C64 or PET. If one wants to change the character set one should at least have Kernal routines to convert back and forth, and perhaps also extend the BASIC with ASCPET() and CHRPET$() for ASC() and CHR$().

I'm not certain that the current suggestion contains all PETSCII characters from both UC and LC mode.

I also think the current control codes should remain, but one could use Escape sequences, but then one should follow xterm and not ANSI, since Microsoft does that for the DOSBox in Windows 10: https://docs.microsoft.com/en-us/windows/console/console-virtual-terminal-sequences

One would probably implement only a subset, and that subset could be chosen so that it works with ncurses.

I would like to be able to connect a VT100 or xterm to Commander X16 (or the emulator), and also use the X16 as a terminal to other computers.

Also PETSCII and other retro computers' character sets will be part of Unicode soon: https://retro-hardware.com/2019/01/17/unicode-and-retro-computer/ https://www.unicode.org/L2/L2019/19025-terminals-prop.pdf

In the free ROM space that is created by compacting chargen one could implement characters from other retro computers, e.g. I would like the block sextant characters that are not already part of PETSCII or could be created by reversing (they were used by e.g. Videotex, Teletext, TRS80, and ABC80).

0cjs commented 4 years ago

@mobluse I agree with you on maintaining as much compatibility as reasonably possible. And removing the repeated characters sounds like a good idea to me, if a good compromise can be worked out with regard to compatibility. I believe it would be possible, given appropriate design, to handle porting with a fully automatic translator. Perhaps I am overly optimistic, but I think that should at least be a design goal, even if not entirely achieved.

For working with curses or its variants you don't need VT-100/Xterm/ANSI control codes or anything like that at all; just make a termcap or terminfo definition with the CBM terminal control codes. The original curses was designed to be able to support the ADM-3A, and even the original PET control code set is about as capable as that terminal. It's certainly worth considering adding some new codes to, e.g., clear to EOL and end of screen, and insert and delete lines, in order to make updates more efficient, though.

For use of a VT-100/Xterm/ANSI-style terminal with the X16, it might be better merely to put a conversion layer between the X16 and the terminal and let the programs continue to use CBM control codes. To work the other way around, the terminal program could understand ANSI codes and update the screen appropriately, just as terminal programs on the IBM PC usually did.