Interlisp / medley

The main repo for the Medley Interlisp project. Wiki, Issues are here. Other repositories include maiko (the VM implementation) and Interlisp.github.io (web site sources)
https://Interlisp.org
MIT License

Make back arrow character "←" different from underscore character "_" #1854

Open MattHeffron opened 4 days ago

MattHeffron commented 4 days ago

The representation of character code 0x5F as "←" instead of "_" dates back to the 1963 ASCII standard, and Interlisp/Medley preserved that interpretation for backward compatibility with earlier systems that had Lisp implementations (e.g. DEC-10). (This does not seem to match the XCCS encoding, which, according to the Wikipedia XCCS article, has "_" at character code 0x005F and "←" at character code 0x00AC. Standard Medley fonts seem to have an encoding for character set 0 that differs from XCCS.)

Medley is inconsistent in how it treats character code 0x5F. In general, it is a character that can be used to construct LITATOMs. However, in CLISP constructs in the Interlisp world it may also be interpreted as an operator character. Simply changing the glyph for that character code to "_" would render the older CLISP code a bit less readable. (This might be acceptable!) Leaving it as "←" makes Common Lisp code look odd, and it can frustrate new users because that glyph isn't on modern keyboards. (This would keep all Interlisp/Medley documentation and publications correct.)

Modernizing Medley for modern keyboards, and supporting Common Lisp code that uses the Unicode glyph encoding, suggests splitting these into two independent first-class characters. There seem to be a few strategies for this:

  1. Change Medley over to be fully Unicode-based.
  2. Change character code 0x005F to "_" (matches Unicode, XCCS, current ASCII), set "←" at character code 0x00AC (XCCS, not Unicode, but it leaves "←" in character set 0).
  3. Change character code 0x005F to "_" (matches Unicode, XCCS, current ASCII), set "←" at character code 0x2190 (Unicode, not XCCS, and it moves "←" to character set 0x21 = 041).

The above 3 strategies (and any others that I didn't think of) would require:

#1 clearly would be a huge effort, but it would also be the most desirable (for the full Unicode support). #2 is comparatively the simplest of the three (but only because it leaves "←" in character set 0). As mentioned above, simply changing the glyph for that character code to "_" might be acceptable, and would be the simplest option overall. All that would be required is updating the fonts and PostScript/PDF printing. (Interpress and Press could be modified for completeness, but seem to be less useful, IMHO.)
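Strategies 2 and 3 above differ only in where "←" ends up. A minimal Python sketch of the two remappings follows; it is illustrative only and is not Medley code. The function and constant names are hypothetical, and the only code points used are the ones stated in this thread.

```python
# Hypothetical sketch: names are illustrative, not part of Medley.
LEGACY_ARROW = 0x5F          # code that Medley currently displays as "←"
XCCS_LEFT_ARROW = 0x00AC     # strategy 2 target (stays in character set 0)
UNICODE_LEFT_ARROW = 0x2190  # strategy 3 target (character set 0x21)


def remap_arrow(codes, target):
    """Return a new list with every legacy 0x5F code replaced by target."""
    return [target if c == LEGACY_ARROW else c for c in codes]


def charset_of(code):
    """High byte of a 16-bit character code, i.e. its character set number."""
    return (code >> 8) & 0xFF


# "X_Y" stands in for a CLISP assignment as it is stored today.
source = [ord(ch) for ch in "X_Y"]
strategy2 = remap_arrow(source, XCCS_LEFT_ARROW)
strategy3 = remap_arrow(source, UNICODE_LEFT_ARROW)
```

Note that this sketch rewrites every 0x5F it sees. It cannot tell an assignment arrow from an underscore inside an atom name, which is the ambiguity that any real conversion would have to resolve.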

nbriggs commented 4 days ago

As far as I know, the XCCS (NS) fonts all have the "_" glyph where the Alto/Press fonts have "←", so it depends on which font set you're using:

[Image: Screenshot 2024-10-14 at 2 09 14 PM]
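The font dependence described in this comment can be pictured as a per-font-family glyph choice for the same character code. The Python sketch below is an illustration under that assumption; the family labels and the table are hypothetical and do not correspond to any Medley data structure.

```python
# Illustrative only: family labels and table layout are assumptions.
GLYPH_AT_5F = {
    "alto_press": "\u2190",  # Alto/Press fonts draw a left arrow at 0x5F
    "xccs_ns": "_",          # XCCS (NS) fonts draw an underscore at 0x5F
}


def render(text, family):
    """Show how text looks when 0x5F is drawn with the given font family."""
    glyph = GLYPH_AT_5F[family]
    return "".join(glyph if ord(ch) == 0x5F else ch for ch in text)
```

With this table, the same stored text `X_Y` displays as an arrow expression under the `alto_press` entry and as an underscore name under the `xccs_ns` entry, without the stored character code changing.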

masinter commented 4 days ago

I like the idea of changing what we mean by "XCCS" in an external format, to define the code rewrites so that "_" is left arrow and "^" is up-arrow. This is nominally an incompatible change but I think it would be better. We'd have to change the NS fonts to swap the glyphs.

masinter commented 4 days ago

In particular

> Adding support for a "←" character in keyboard mapping that is different from "_".

Not needed. There already is support. When you type "_", the keyboard gives you the old TTY character, which prints as a left arrow in Medley. There are other XNS characters for underscore and circumflex that don't ordinarily have keyboard assignments.

> This should follow, or be part of, the work on https://github.com/Interlisp/medley/issues/58
>
> File conversion utilities that somehow leave an indication that the file has been "converted".

Not really needed. Pretty much all Medley sources can be used without conversion.

> These utilities likely would need to be interactive, as many cases would be ambiguous as to the intended character. Tedit and Sketch (and other) files likely would use different heuristics for conversion, vs. for code files.

Not needed.

> Revising all CLISP/DWIM code that interprets character code 0x5F as "←", to instead use the new character code for "←" functionality. (This may be a bit of chicken-and-egg issue with implementing the conversion utilities.)

Not needed, though not a bad idea.

> Modifying all character set 0 font files to add the "_" glyph and update the WIDTHS, OFFSETS, and IMAGEWIDTHS information.

Just modify the NS fonts; leave Alto fonts like GACHA and HELVETICA etc. alone.

> If there are fonts that have "_" but not "←" then the corresponding modification would be required.

Not sure there are any.

> Revise PostScript/PDF printing to handle the changed character codes appropriately.

This is a matter of undoing the patch to substitute

MattHeffron commented 1 day ago

To clarify: my preference would be that character code 0x005f be the "_" glyph, and some other character code be the "←" glyph. This is what new users would expect going forward; especially Common Lisp users. This is why file conversion utilities, and changing the CLISP/DWIM code seem to be necessary. This is also why

> Adding support for a "←" character in keyboard mapping that is different from "_".

would be necessary; so that there is still some way to type the "←" character. (I suspect that keyboards with a "←" key are rare, or non-existent.) Once the character code is chosen, then it may be simply putting that into the KEYACTION table(s) on some key. But that may need the keyboard consistency changes to enable recognition of (at least one of the) "common" keys that currently are ignored (e.g., F10).

Re: changes to fonts, I looked only at HELVETICA's font character bitmap. So, from @nbriggs's comment:

> As far as I know, the XCCS (NS) fonts all have the "_" glyph where the Alto/Press fonts have "←"

it appears that NS fonts already show 0x005f as the "_" glyph. (Do they have the "←" glyph?) I didn't remember that there were ≥ two different font sets. In my head everything was XCCS, as that's the internal character encoding.

larsbrinkhoff commented 1 day ago

I just wanted to mention in passing that MIT's ITS and Stanford's WAITS operating systems have the same ambivalence about the "_" or "←" character (and also "^"/"↑"). They both got started on PDP-6 computers (the immediate precursor to the PDP-10 and DEC-10, DEC-20) in the mid-60s, before ASCII finalized on _ and ^.