IBM docs present 3270 terminal screens in IBM Plex Mono, but IBM Plex Mono lacks some APL characters used in 3270 screens

GrahamHannington commented 2 years ago

Hi @BoldMonday,

In issue #93, you commented:

So far we had no conversations about support for 3270 terminals.

I gather this means that, when IBM briefed you about requirements and use cases for IBM Plex Mono, they didn’t even mention 3270 terminals.

And yet, IBM are using IBM Plex Mono to present 3270 terminal screen captures in IBM product docs! Apparently, without considering whether the font is fit for purpose; that is, whether it supports all of the required characters.

It doesn't. So, for some 3270 screens, documentation writers must resort to using bitmapped screen captures.

You've recently added box-drawing characters (issue #93). That's great, thanks!

However, as I commented in issue #93, box-drawing characters are only a subset of the APL characters that 3270 terminal screens can display. See EBCDIC code page 310 in the Wikipedia article "Digital encoding of APL symbols".

For more details, I recommend that you contact IBM. They are the 3270 experts, and they made the decision to use IBM Plex Mono to present 3270 screen captures in their product docs.

BoldMonday commented 2 years ago

@GrahamHannington thank you for the continued feedback on this issue.

The addition of APL symbols is on the list of future updates but there are still other priorities at the moment. As external contractors we are dependent on the budgets that are allocated to us by IBM. I’m sure you understand.

GrahamHannington commented 2 years ago

we are dependent on the budgets that are allocated to us by IBM

Sure, understood. I like what you've done with that budget; I like IBM Plex!

I feel bad using Noto Sans Mono for some 3270 screens when I'd prefer to be using IBM Plex Mono.

Re:

The addition of APL symbols

Just so that we're on the same, er, code page 😉: the term "APL symbols" might mean different things to different people.

Here, I'm specifically requesting the characters in EBCDIC code page 310 (see the link in my original post).

In Unicode terms, some of the characters in that code page are characterized as APL functional symbols, some are not; some are in the "Miscellaneous Technical" Unicode block, some are not.

(I've just noticed issue #176, opened in 2018.)

GrahamHannington commented 2 years ago

To highlight the missing characters, I copied the HTML for that table of EBCDIC code page 310 from Wikipedia, and tweaked the CSS for the sample characters to font-family: "IBM Plex Mono", "Adobe NotDef", so that any characters not present in IBM Plex Mono would fall back to the Adobe NotDef glyph.

[Image: ebcdic-code-page-310-ibm-plex-mono]

That's a whole lotta tofu. 😉

ebcdic-code-page-310.html.zip

I attached a .zip of the HTML because the tooltips show the Unicode character names and code points.

GrahamHannington commented 2 years ago

Mousing over anonymous tofu gets tired real quick, so I tweaked the CSS some more, including this to expose the tooltips:

td::after {
  font-size: x-small;
  content: attr(title) " ";
}

ebcdic-code-page-310-ibm-plex-mono-unicode-characters.pdf

ebcdic-code-page-310-with-unicode-char-names.html.zip
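
For anyone who wants to repeat this check for a different set of characters, here is a minimal Python sketch of the same approach: one table cell per code point, the `"IBM Plex Mono", "Adobe NotDef"` font stack, a `title` tooltip with the code point and Unicode name, and the `td::after` rule to expose that tooltip. The function name `build_test_page` and the page layout are assumptions for illustration; this is not the markup in the attached files.

```python
import html
import unicodedata

# Font stack described above: characters missing from IBM Plex Mono
# fall back to the Adobe NotDef glyph ("tofu").
FONT_STACK = '"IBM Plex Mono", "Adobe NotDef"'


def build_test_page(code_points):
    """Return an HTML page with one table cell per code point."""
    cells = []
    for cp in code_points:
        # PUA and unassigned code points have no Unicode name.
        name = unicodedata.name(chr(cp), "(no Unicode name)")
        title = html.escape(f"U+{cp:04X} {name}", quote=True)
        cells.append(f'<td title="{title}">&#x{cp:X};</td>')
    css = (
        f"td {{ font-family: {FONT_STACK}; }}\n"
        'td::after { font-size: x-small; content: attr(title) " "; }'
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">\n'
        f"<style>\n{css}\n</style></head>\n"
        f"<body><table><tr>{''.join(cells)}</tr></table></body></html>\n"
    )


# Two box-drawing characters as sample input.
page = build_test_page([0x2551, 0x2550])
```

Write `page` to a file and open it in a browser that has both fonts installed; any cell that shows the NotDef glyph is a character that IBM Plex Mono lacks.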

GrahamHannington commented 2 years ago

EBCDIC code page 310 coverage: IBM Plex Mono versus Noto Sans Mono

2-page PDF, attached.

Hi @BoldMonday,

I'm not trying to rub it in (the difference in coverage); I actually thought you might find this useful.

Flipping between the two pages highlights the difference between the tofu in IBM Plex Mono versus the glyphs in Noto Sans Mono.

ebcdic-code-page-310-ibm-plex-mono-vs-noto-sans-mono.pdf

GrahamHannington commented 2 years ago

I've given you bad information

Hi @BoldMonday ,

I'm sorry.

I've previously referred you to a table in the Wikipedia article "Digital encoding of APL symbols".

That table maps 3270 characters (specifically, characters in EBCDIC code page 310) to Unicode characters.

⚠️ I've just discovered that some of those mappings are incorrect. At the very least, incorrect in the context of IBM 3270 terminal displays.

Unfortunately, I don't have a direct, correct replacement table to offer you: that is, a table that shows the correct glyphs and corresponding Unicode code points.

Frankly, I'm still digesting this news myself.

Earlier this week, I saw a 3270 screen that contains the character with EBCDIC code page 310 byte value X'81', which IBM characterizes as "Double Vertical, Bar Graphic", GCGID SF630000.

That on-screen glyph is significantly different to the glyph shown in the table in Wikipedia.

The lines in the on-screen glyph are as far apart as possible, whereas the lines in the glyph in the table in Wikipedia are closely spaced.

This prompted me to add a section to the "Talk" page of that Wikipedia article, "Incorrect mapping of EBCDIC code page 310 (APL) to Unicode characters?"

Another Wikipedia user replied:

IBM actually maps SF630000 (the 0x81, double vertical one) to U+F892 in their corporate Private Use Area scheme, and SF620000 (the 0x82, double horizontal one) to U+F893, also in the Private Use Area (as seen in unicode.nam, included here). In terms of more recent additions to Unicode that the cited sources did not have the benefit of, ... (U+1FB80) in the Symbols for Legacy Computing block is a much closer match to the double horizontal one, but there is still no particularly good match to the double vertical one

That prompted me to do more research.

Better information

While I don't have a direct replacement for that table in Wikipedia, I can offer you:

A table that maps EBCDIC code page 310 byte values to glyphs that more accurately represent the 3270 characters, and the corresponding GCGIDs
A table that maps Unicode PUA code points to GCGIDs

For example, you can see from these tables that EBCDIC code page 310 byte value X'81' ("Double Vertical, Bar Graphic", GCGID SF630000) maps not to U+2551, but to the PUA code point U+F892, and that the lines in the glyph are spaced as far apart as possible, which is significantly different to U+2551.

Which 3270 characters are only in the IBM PUA?

I don't know.

Certainly, these two EBCDIC code page 310 byte values:

X'81' maps to the PUA code point U+F892, as already discussed
X'82' (GCGID SF620000, "Double Horizontal, Bar Graphic") maps to the PUA code point U+F893

Do other characters in the character set (GCSGID 00963) for EBCDIC code page 310 also map to IBM PUA characters? I don't know. Given the available information, I think it's possible to answer this question, but I acknowledge that I'm currently not thinking clearly enough to work out an efficient method to do that. I need more coffee, or more sleep. 🙂

What does this mean for you, and for IBM Plex Mono?

To properly support 3270 screens, IBM Plex Mono will need to include characters in the IBM PUA.

Why bother? Why not just map to standard characters?

Mapping to "standard" vs "PUA" characters can significantly affect the appearance, even usability, of a 3270 screen.

Example: EBCDIC code page 310 byte value X'81':

The glyph for the standard Unicode character BOX DRAWINGS DOUBLE VERTICAL (U+2551) has white space on either side of its lines
The glyph for the IBM PUA code point U+F892 does not: its lines abut adjacent characters

When used as a table column separator, to distinguish, say, non-scrollable columns from scrollable columns, U+2551 gives characters in adjoining table cells some breathing space; U+F892 does not. Arguably, then, U+2551 is usable in this context, but not U+F892.
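
To make the choice concrete, here is a small Python sketch that converts byte value X'81' using either mapping and uses the result as a column separator. The names `STANDARD_MAPPING`, `IBM_PUA_MAPPING`, `separator`, and `render_row` are hypothetical, and the code only selects the code point; how the separator actually looks depends entirely on the font that renders it.

```python
# Candidate Unicode mappings for EBCDIC code page 310 byte X'81'
# ("Double Vertical, Bar Graphic", GCGID SF630000), as discussed above.
STANDARD_MAPPING = {0x81: 0x2551}  # BOX DRAWINGS DOUBLE VERTICAL
IBM_PUA_MAPPING = {0x81: 0xF892}   # IBM PUA code point


def separator(byte_value, use_pua):
    """Return the Unicode character chosen for a code page 310 byte."""
    mapping = IBM_PUA_MAPPING if use_pua else STANDARD_MAPPING
    return chr(mapping[byte_value])


def render_row(fixed_cell, scrollable_cell, use_pua):
    """Join a non-scrollable and a scrollable cell with the X'81' separator."""
    return fixed_cell + separator(0x81, use_pua) + scrollable_cell


standard_row = render_row("NAME", "VALUE", use_pua=False)
pua_row = render_row("NAME", "VALUE", use_pua=True)
```

Both rows have the same length and differ only in the separator's code point, so the difference between them is purely a matter of which glyph the font supplies.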

GrahamHannington commented 2 years ago

IBM PUA characters in EBCDIC code page 310

This is my current best effort at identifying the IBM PUA characters in EBCDIC code page 310:

EBCDIC code page 310 byte value (hex)	GCGID	GCGID name	IBM PUA Unicode code point (U+)
55	LN480000	N Line Below Capital/N Underscore (APL)	F8D7
56	LO480000	O Line Below Capital/O Underscore (APL)	F8D5
57	LP480000	P Line Below Capital/P Underscore (APL)	F8D3
58	LQ480000	Q Line Below Capital/Q Underscore (APL)	F8D1
59	LR480000	R Line Below Capital/R Underscore (APL)	F8CF
62	LS480000	S Line Below Capital/S Underscore (APL)	F8CD
63	LT480000	T Line Below Capital/T Underscore (APL)	F8CB
64	LU480000	U Line Below Capital/U Underscore (APL)	F8C9
65	LV480000	V Line Below Capital/V Underscore (APL)	F8C7
66	LW480000	W Line Below Capital/W Underscore (APL)	F8C5
67	LX480000	X Line Below Capital/X Underscore (APL)	F8C3
68	LY480000	Y Line Below Capital/Y Underscore (APL)	F8C1
69	LZ480000	Z Line Below Capital/Z Underscore (APL)	F8BF
80	SL460000	Tilde (APL)	F88F
81	SF630000	Double Vertical, Bar Graphic	F892
82	SF620000	Double Horizontal, Bar Graphic	F893
85	SF660000	Center Vertical, Bar Graphic	F891
8A	SL610000	Up Arrow (APL)	F88B
8B	SL620000	Down Arrow (APL)	F88A
8F	SL600000	Right Arrow (APL)	F88C
9D	SL080000	Circle (APL)	F890
9F	SL590000	Left Arrow (APL)	F88D
A4	LN012000	n Small Subscript	F8D8
B7	SL640000	Slope (APL)	F889
DB	SL580000	Quote Dot (APL)	F88E

To create this table, I used Excel to correlate unicode.nam with CP00310.txt (ftp://public.dhe.ibm.com/software/globalization/gcoc/attachments/CP00310.txt).
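
The same correlation can be sketched in Python as a join on GCGID. The sample rows below are copied from the table above; the two input shapes (lists of tuples) are assumptions for illustration and do not reflect the actual layouts of CP00310.txt or unicode.nam, which would need their own parsing.

```python
# Sample rows copied from the table above. The tuple layouts are
# assumptions for illustration, not the actual file formats.
CP310_ROWS = [  # (byte value, GCGID), the information CP00310.txt supplies
    (0x81, "SF630000"),
    (0x82, "SF620000"),
    (0x85, "SF660000"),
]
PUA_ROWS = [  # (PUA code point, GCGID), the information unicode.nam supplies
    (0xF891, "SF660000"),
    (0xF892, "SF630000"),
    (0xF893, "SF620000"),
]


def correlate(cp310_rows, pua_rows):
    """Map each byte value to (GCGID, PUA code point), joining on GCGID."""
    pua_by_gcgid = {gcgid: cp for cp, gcgid in pua_rows}
    return {
        byte: (gcgid, pua_by_gcgid[gcgid])
        for byte, gcgid in cp310_rows
        if gcgid in pua_by_gcgid  # bytes with no PUA match are dropped
    }


table = correlate(CP310_ROWS, PUA_ROWS)
for byte, (gcgid, cp) in sorted(table.items()):
    print(f"X'{byte:02X}'  {gcgid}  U+{cp:04X}")
```

Because this is an inner join, a byte value whose GCGID has no PUA entry simply does not appear in the result, which is how the table above ends up listing only the PUA characters.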

I've yet to see any "Line Below Capital" characters on a 3270 screen. Then again, I've never programmed in APL.

I'm curious to know how we got here: Unicode contains characters for Ancient Greek Musical Notation, but not comprehensive support for all 3270 characters. I can imagine some reasons, but I'd be interested to know the real story.

GrahamHannington commented 2 years ago

Example 3270 screen containing all IBM PUA characters

I wrote:

I've yet to see any "Line Below Capital" characters on a 3270 screen.

This irked me.

Today, based on code provided to me by a vastly more experienced colleague, I wrote a z/OS REXX exec that dynamically generates a 3270 screen (specifically, an ISPF panel) that shows all of the PUA characters listed in my previous comment.

Here's an image of the screen:

[Image: ibm-pua-characters-3270-screen]

The glyphs in white, in the first column, are provided by a proprietary (non-Unicode) font that is supplied with the terminal emulator. Everything else is set in IBM Plex Mono. The proprietary font doesn't necessarily have the same font metrics (e.g. glyph widths) as IBM Plex Mono, but the method that the emulator uses to position the characters means that this doesn't matter; it doesn't affect the alignment of the screen contents.

I think that some, perhaps even most, of these characters could be mapped to existing equivalent characters in the Unicode standard. I'd like to know why IBM chose to map such characters to PUA code points instead of existing standard code points. Perhaps the answer is in the qualifier "existing"; perhaps IBM made that decision before such characters were in the standard. I'm just guessing. I'd really like to understand the history here. If you have that conversation with IBM, I'd be grateful if you share what you can.

I'm unaware of any Unicode font that includes all of these characters (at these PUA code points).
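
Checking a candidate font against this list is straightforward once you have the set of code points that the font supports. Here is a minimal Python sketch, using the code points from the table in the previous comment. Extracting the supported code points from a font file needs a font library and is out of scope here, so the `font_code_points` argument is a hypothetical input; the function names are also assumptions.

```python
# IBM PUA code points from the table in the previous comment,
# keyed by EBCDIC code page 310 byte value.
CP310_TO_IBM_PUA = {
    0x55: 0xF8D7, 0x56: 0xF8D5, 0x57: 0xF8D3, 0x58: 0xF8D1, 0x59: 0xF8CF,
    0x62: 0xF8CD, 0x63: 0xF8CB, 0x64: 0xF8C9, 0x65: 0xF8C7, 0x66: 0xF8C5,
    0x67: 0xF8C3, 0x68: 0xF8C1, 0x69: 0xF8BF,
    0x80: 0xF88F, 0x81: 0xF892, 0x82: 0xF893, 0x85: 0xF891,
    0x8A: 0xF88B, 0x8B: 0xF88A, 0x8F: 0xF88C,
    0x9D: 0xF890, 0x9F: 0xF88D, 0xA4: 0xF8D8, 0xB7: 0xF889, 0xDB: 0xF88E,
}


def in_bmp_pua(code_point):
    """True if the code point is in the BMP Private Use Area (U+E000..U+F8FF)."""
    return 0xE000 <= code_point <= 0xF8FF


def missing_pua_code_points(font_code_points):
    """Return the listed IBM PUA code points that the font does not cover."""
    required = set(CP310_TO_IBM_PUA.values())
    return sorted(required - set(font_code_points))


# Hypothetical font that covers only the Box Drawing block (U+2500..U+257F).
box_drawing_only = range(0x2500, 0x2580)
missing = missing_pua_code_points(box_drawing_only)
print(", ".join(f"U+{cp:04X}" for cp in missing))
```

A font that covers every standard box-drawing character still reports all 25 of these code points as missing, because none of them is in the Box Drawing block.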

GrahamHannington commented 2 years ago

I wonder about:

1. Adding IBM PUA characters to an open-source font.

   Then again: if not in IBM Plex, then where? Why use the "IBM" qualifier in the name if you're not going to include IBM PUA characters?

2. Proposing to Google Fonts that they include these IBM PUA characters in a variant of the Noto Sans Mono font; perhaps, "Noto Sans Mono IBM" (or "...3270", to avoid trademark issues), to use as a fall-back font for presenting 3270 screens, when Noto Sans Mono lacks the IBM PUA characters. However, from the Google Fonts "Contribute to Noto fonts" topic:

   If you're proposing design for new codepoints, those need to already exist in the Unicode Standard. Google Fonts does not accept proposals for scripts that are not part of Unicode.

   I'm not really proposing a "script" as such; although, yeah, these code points definitely aren't in the standard, in the sense that they're in the PUA.

3. Just how many of these PUA characters truly warrant a PUA code point?

   That table I cited previously in Wikipedia does a pretty good job of mapping to standard Unicode characters. Just X'81' (U+F892) and X'82' (U+F893)? I'll admit, I haven't (yet!) diligently explored the Unicode standard for matching characters for all of these.

4. Can I synthesize U+F892 and U+F893 without having those specific glyphs available in a font; say, via CSS border properties, or by superimposing existing characters?

(Just because I have an idea, doesn't mean I like it. 😉)
