IBM / plex

The package of IBM’s typeface, IBM Plex.
SIL Open Font License 1.1

IBM docs present 3270 terminal screens in IBM Plex Mono, but IBM Plex Mono lacks some APL characters used in 3270 screens #433

Status: Open. GrahamHannington opened this issue 2 years ago.

GrahamHannington commented 2 years ago

Hi @BoldMonday,

In issue #93, you commented:

So far we had no conversations about support for 3270 terminals.

I gather this means that, when IBM briefed you about requirements and use cases for IBM Plex Mono, they didn’t even mention 3270 terminals.

And yet, IBM are using IBM Plex Mono to present 3270 terminal screen captures in IBM product docs! Apparently they did so without considering whether the font is fit for purpose; that is, whether it supports all of the required characters.

It doesn't. So, for some 3270 screens, documentation writers must resort to using bitmapped screen captures.

You've recently added box-drawing characters (issue #93). That's great, thanks!

However, as I commented in issue #93, box-drawing characters are only a subset of the APL characters that 3270 terminal screens can display. See EBCDIC code page 310 in the Wikipedia article "Digital encoding of APL symbols".

For more details, I recommend that you contact IBM. They are the 3270 experts, and they made the decision to use IBM Plex Mono to present 3270 screen captures in their product docs.

BoldMonday commented 2 years ago

@GrahamHannington thank you for the continued feedback on this issue.

The addition of APL symbols is on the list of future updates but there are still other priorities at the moment. As external contractors we are dependent on the budgets that are allocated to us by IBM. I’m sure you understand.

GrahamHannington commented 2 years ago

we are dependent on the budgets that are allocated to us by IBM

Sure, understood. I like what you've done with that budget; I like IBM Plex!

I feel bad using Noto Sans Mono for some 3270 screens when I'd prefer to be using IBM Plex Mono.

Re:

The addition of APL symbols

Just so that we're on the same, er, code page 😉: the term "APL symbols" might mean different things to different people.

Here, I'm specifically requesting the characters in EBCDIC code page 310 (see the link in my original post).

In Unicode terms, some of the characters in that code page are characterized as APL functional symbols, some are not; some are in the "Miscellaneous Technical" Unicode block, some are not.
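To make the point about Unicode blocks concrete, here is a minimal Python sketch that reports which block a character falls in. Python's standard `unicodedata` module doesn't expose block names, so the sketch hard-codes a small subset of the block ranges from the Unicode Character Database (Blocks.txt). The `BLOCKS` table, the `block_of` helper, and the sample characters are illustrative choices, not a list taken from code page 310.

```python
# A small subset of Unicode block ranges (from the UCD's Blocks.txt)
# that are relevant to APL and 3270 characters. Not the full list.
BLOCKS = [
    (0x2190, 0x21FF, "Arrows"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x2300, 0x23FF, "Miscellaneous Technical"),
    (0x2500, 0x257F, "Box Drawing"),
    (0x2580, 0x259F, "Block Elements"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0x1FB00, 0x1FBFF, "Symbols for Legacy Computing"),
]


def block_of(ch: str) -> str:
    """Return the block containing ch, or 'other' if it is outside this subset."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "other"


# APL quad, leftwards arrow, nabla, double vertical box line, IBM PUA U+F892
for ch in ["\u2395", "\u2190", "\u2207", "\u2551", "\uf892"]:
    print(f"U+{ord(ch):04X}  {block_of(ch)}")
```

The APL quad (U+2395) lands in Miscellaneous Technical, while the arrow and nabla that APL also uses do not, which is why "APL symbols" is an ambiguous request.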

(I've just noticed issue #176, opened in 2018.)

GrahamHannington commented 2 years ago

To highlight the missing characters, I copied the HTML for that table of EBCDIC code page 310 from Wikipedia, and tweaked the CSS for the sample characters to font-family: "IBM Plex Mono", "Adobe NotDef", so that any characters not present in IBM Plex Mono would fall back to the Adobe NotDef glyph.

[Image: ebcdic-code-page-310-ibm-plex-mono (the code page 310 table set in IBM Plex Mono, with Adobe NotDef fallback)]

That's a whole lotta tofu. 😉

ebcdic-code-page-310.html.zip

I attached a .zip of the HTML because the tooltips show the Unicode character names and code points.
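The CSS fallback trick can also be done programmatically. A minimal Python sketch, assuming you already have the set of code points the font maps (for example, with the third-party fontTools package, `set(TTFont(path).getBestCmap())`). The `missing_chars` helper is hypothetical, and the coverage set in the usage lines is made up for illustration; it is not IBM Plex Mono's real coverage.

```python
def missing_chars(required: str, supported: set[int]) -> list[str]:
    """Return the distinct characters in `required` whose code points are
    not in `supported`, in first-seen order (these would render as tofu)."""
    return [ch for ch in dict.fromkeys(required) if ord(ch) not in supported]


# Hypothetical coverage: printable ASCII plus U+2551 only.
supported = set(range(0x20, 0x7F)) | {0x2551}
required = "A\u2551\u2395\uf892"
for ch in missing_chars(required, supported):
    print(f"missing U+{ord(ch):04X}")
```

Running the same check against two fonts' coverage sets gives the kind of side-by-side comparison shown in the PDFs in this thread.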

GrahamHannington commented 2 years ago

Mousing over anonymous tofu gets tired real quick, so I tweaked the CSS some more, including this to expose the tooltips:

/* Show each cell's tooltip text (its title attribute) after the sample glyph */
td::after {
  font-size: x-small;
  content: attr(title) " ";
}

ebcdic-code-page-310-ibm-plex-mono-unicode-characters.pdf

ebcdic-code-page-310-with-unicode-char-names.html.zip
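One wrinkle with labelling glyphs by Unicode character name: private-use code points have no standard name, so a name-based label has nothing to show for them. A minimal Python sketch using the standard `unicodedata` module; the `label` helper is hypothetical.

```python
import unicodedata


def label(ch: str) -> str:
    """Return 'U+XXXX NAME' for ch.

    Private-use characters (general category 'Co') have no standard
    Unicode name, so they get a placeholder instead.
    """
    cp = f"U+{ord(ch):04X}"
    if unicodedata.category(ch) == "Co":
        return f"{cp} <private use>"
    return f"{cp} {unicodedata.name(ch, '<unnamed>')}"


print(label("\u2551"))  # U+2551 BOX DRAWINGS DOUBLE VERTICAL
print(label("\uf892"))  # U+F892 <private use>
```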

GrahamHannington commented 2 years ago

EBCDIC code page 310 coverage: IBM Plex Mono versus Noto Sans Mono

2-page PDF, attached.

Hi @BoldMonday,

I'm not trying to rub it in (the difference in coverage); I actually thought you might find this useful.

Flipping between the two pages highlights the difference between the tofu in IBM Plex Mono versus the glyphs in Noto Sans Mono.

ebcdic-code-page-310-ibm-plex-mono-vs-noto-sans-mono.pdf

GrahamHannington commented 2 years ago

I've given you bad information

Hi @BoldMonday,

I'm sorry.

I've previously referred you to a table in the Wikipedia article "Digital encoding of APL symbols".

That table maps 3270 characters (specifically, characters in EBCDIC code page 310) to Unicode characters.

⚠️ I've just discovered that some of those mappings are incorrect. At the very least, they are incorrect in the context of IBM 3270 terminal displays.

Unfortunately, I don't have a direct, correct replacement table to offer you: that is, a table that shows the correct glyphs and corresponding Unicode code points.

Frankly, I'm still digesting this news myself.

Earlier this week, I saw a 3270 screen that contained the character with EBCDIC code page 310 byte value X'81', which IBM characterizes as "Double Vertical, Bar Graphic", GCGID SF630000.

That on-screen glyph is significantly different to the glyph shown in the table in Wikipedia.

The lines in the on-screen glyph are as far apart as possible, whereas the lines in the glyph in the table in Wikipedia are closely spaced.

This prompted me to add a section to the "Talk" page of that Wikipedia article, "Incorrect mapping of EBCDIC code page 310 (APL) to Unicode characters?"

Another Wikipedia user replied:

IBM actually maps SF630000 (the 0x81, double vertical one) to U+F892 in their corporate Private Use Area scheme, and SF620000 (the 0x82, double horizontal one) to U+F893, also in the Private Use Area (as seen in unicode.nam, included here). In terms of more recent additions to Unicode that the cited sources did not have the benefit of, ... (U+1FB80) in the Symbols for Legacy Computing block is a much closer match to the double horizontal one, but there is still no particularly good match to the double vertical one

That prompted me to do more research.

Better information

While I don't have a direct replacement for that table in Wikipedia, I can offer you:

For example, you can see from these tables that EBCDIC code page 310 byte value X'81' ("Double Vertical, Bar Graphic", GCGID SF630000) maps not to U+2551, but to the PUA code point U+F892, and that the lines in the glyph are spaced as far apart as possible, which is significantly different to U+2551.

Which 3270 characters are only in the IBM PUA?

I don't know.

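Certainly, these two EBCDIC code page 310 byte values:

- X'81' ("Double Vertical, Bar Graphic", GCGID SF630000), which IBM maps to U+F892
- X'82' ("Double Horizontal, Bar Graphic", GCGID SF620000), which IBM maps to U+F893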

Do other characters in the character set (GCSGID 00963) for EBCDIC code page 310 also map to IBM PUA characters? I don't know. Given the available information, I think it's possible to answer this question, but I acknowledge that I'm currently not thinking clearly enough to work out an efficient method to do that. I need more coffee, or more sleep. 🙂

What does this mean for you, and for IBM Plex Mono?

To properly support 3270 screens, IBM Plex Mono will need to include characters in the IBM PUA.

Why bother? Why not just map to standard characters?

Mapping to "standard" vs "PUA" characters can significantly affect the appearance, even usability, of a 3270 screen.

Example: EBCDIC code page 310 byte value X'81':

When used as a table column separator, to distinguish, say, non-scrollable columns from scrollable columns, U+2551 gives characters in adjoining table cells some breathing space; U+F892 does not. Arguably, then, U+2551 is usable in this context, but not U+F892.
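The trade-off above can be sketched as a small lookup with two candidate targets per byte. This is a minimal Python sketch, not an IBM mapping table: the PUA values come from the unicode.nam mapping quoted earlier, U+2551 is the Wikipedia table's choice for X'81', and U+1FB80 is the closer standard match suggested for X'82'. The `CANDIDATES` table and `choose` helper are hypothetical; the `prefer_pua` flag exists because, as argued above, the PUA glyph isn't always the more usable one.

```python
# Two candidate targets per byte: IBM's PUA code point and a standard
# Unicode approximation. Only X'81' and X'82' are included here.
CANDIDATES = {
    0x81: {"gcgid": "SF630000", "pua": 0xF892, "standard": 0x2551},   # double vertical
    0x82: {"gcgid": "SF620000", "pua": 0xF893, "standard": 0x1FB80},  # double horizontal
}


def choose(byte: int, font_supports: set[int], prefer_pua: bool = True) -> str:
    """Return the first candidate character the font can display.

    Tries the PUA code point first (or the standard one, if prefer_pua is
    False) and falls back to the other; raises LookupError if the font
    supports neither.
    """
    entry = CANDIDATES[byte]
    order = ("pua", "standard") if prefer_pua else ("standard", "pua")
    for key in order:
        if entry[key] in font_supports:
            return chr(entry[key])
    raise LookupError(f"no glyph for X'{byte:02X}' (GCGID {entry['gcgid']})")


# A font with the standard character but no IBM PUA glyphs:
print(f"U+{ord(choose(0x81, {0x2551})):04X}")  # U+2551
```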

GrahamHannington commented 2 years ago

IBM PUA characters in EBCDIC code page 310

This is my current best effort at identifying the IBM PUA characters in EBCDIC code page 310:

| EBCDIC code page 310 byte value (hex) | GCGID | GCGID name | IBM PUA Unicode code point (U+) |
| --- | --- | --- | --- |
| 55 | LN480000 | N Line Below Capital/N Underscore (APL) | F8D7 |
| 56 | LO480000 | O Line Below Capital/O Underscore (APL) | F8D5 |
| 57 | LP480000 | P Line Below Capital/P Underscore (APL) | F8D3 |
| 58 | LQ480000 | Q Line Below Capital/Q Underscore (APL) | F8D1 |
| 59 | LR480000 | R Line Below Capital/R Underscore (APL) | F8CF |
| 62 | LS480000 | S Line Below Capital/S Underscore (APL) | F8CD |
| 63 | LT480000 | T Line Below Capital/T Underscore (APL) | F8CB |
| 64 | LU480000 | U Line Below Capital/U Underscore (APL) | F8C9 |
| 65 | LV480000 | V Line Below Capital/V Underscore (APL) | F8C7 |
| 66 | LW480000 | W Line Below Capital/W Underscore (APL) | F8C5 |
| 67 | LX480000 | X Line Below Capital/X Underscore (APL) | F8C3 |
| 68 | LY480000 | Y Line Below Capital/Y Underscore (APL) | F8C1 |
| 69 | LZ480000 | Z Line Below Capital/Z Underscore (APL) | F8BF |
| 80 | SL460000 | Tilde (APL) | F88F |
| 81 | SF630000 | Double Vertical, Bar Graphic | F892 |
| 82 | SF620000 | Double Horizontal, Bar Graphic | F893 |
| 85 | SF660000 | Center Vertical, Bar Graphic | F891 |
| 8A | SL610000 | Up Arrow (APL) | F88B |
| 8B | SL620000 | Down Arrow (APL) | F88A |
| 8F | SL600000 | Right Arrow (APL) | F88C |
| 9D | SL080000 | Circle (APL) | F890 |
| 9F | SL590000 | Left Arrow (APL) | F88D |
| A4 | LN012000 | n Small Subscript | F8D8 |
| B7 | SL640000 | Slope (APL) | F889 |
| DB | SL580000 | Quote Dot (APL) | F88E |

To create this table, I used Excel to correlate unicode.nam with CP00310.txt (ftp://public.dhe.ibm.com/software/globalization/gcoc/attachments/CP00310.txt).
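The Excel correlation is essentially a join on the GCGID. A minimal Python sketch, assuming both files have already been parsed into dictionaries (their exact layouts aren't described here, so parsing is not shown): `pua_mappings` keeps the bytes whose GCGID maps into the BMP Private Use Area (U+E000 to U+F8FF), and `decode` applies the result. Only four rows from the table above are included, and both helper names are hypothetical.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area


def pua_mappings(cp_table: dict[int, str], names: dict[str, int]) -> dict[int, tuple[str, int]]:
    """Join byte -> GCGID (as in CP00310.txt) with GCGID -> code point
    (as in unicode.nam), keeping only bytes whose target is in the PUA."""
    result = {}
    for byte, gcgid in sorted(cp_table.items()):
        cp = names.get(gcgid)
        if cp is not None and PUA_START <= cp <= PUA_END:
            result[byte] = (gcgid, cp)
    return result


def decode(data: bytes, mapping: dict[int, tuple[str, int]]) -> str:
    """Decode with the PUA mapping only. Bytes outside this partial mapping
    become U+FFFD as a placeholder; this is not a full CP310 decoder."""
    return "".join(chr(mapping[b][1]) if b in mapping else "\ufffd" for b in data)


# A few rows from the table above, already parsed.
cp310 = {0x55: "LN480000", 0x81: "SF630000", 0x82: "SF620000", 0x9D: "SL080000"}
ibm_names = {"LN480000": 0xF8D7, "SF630000": 0xF892, "SF620000": 0xF893, "SL080000": 0xF890}

mapping = pua_mappings(cp310, ibm_names)
for byte, (gcgid, cp) in mapping.items():
    print(f"{byte:02X} {gcgid} U+{cp:04X}")
print([f"U+{ord(c):04X}" for c in decode(b"\x81\x82\x00", mapping)])
```

Running the join over the full files, rather than a handful of rows, is one way to answer the earlier question of which code page 310 characters map only to IBM PUA code points.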

I've yet to see any "Line Below Capital" characters on a 3270 screen. Then again, I've never programmed in APL.

I'm curious to know how we got here: Unicode contains characters for Ancient Greek Musical Notation, but not comprehensive support for all 3270 characters. I can imagine some reasons, but I'd be interested to know the real story.

GrahamHannington commented 2 years ago

Example 3270 screen containing all IBM PUA characters

I wrote:

I've yet to see any "Line Below Capital" characters on a 3270 screen.

This irked me.

Today, based on code provided to me by a vastly more experienced colleague, I wrote a z/OS REXX exec that dynamically generates a 3270 screen (specifically, an ISPF panel) that shows all of the PUA characters listed in my previous comment.

Here's an image of the screen:

[Image: ibm-pua-characters-3270-screen]

The glyphs in white, in the first column, are provided by a proprietary (non-Unicode) font that is supplied with the terminal emulator. Everything else is set in IBM Plex Mono. The proprietary font doesn't necessarily have the same font metrics (e.g. glyph widths) as IBM Plex Mono, but the method that the emulator uses to position the characters means that this doesn't matter: it doesn't affect the alignment of the screen contents.
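The emulator's positioning method isn't described here, but a common approach in terminal rendering is to place every character in a fixed-width cell. A minimal Python sketch, assuming that approach, contrasts cell-grid placement with ordinary text flow; the helper names are hypothetical and the advance widths are made-up font units, not measured from either font.

```python
def grid_positions(advances: list[float], cell_width: float) -> list[float]:
    """Cell-grid layout: character i starts at i * cell_width.
    The glyphs' own advance widths are deliberately ignored."""
    return [col * cell_width for col in range(len(advances))]


def flow_positions(advances: list[float]) -> list[float]:
    """Text flow: each character starts where the previous one's advance ended."""
    positions, x = [], 0.0
    for adv in advances:
        positions.append(x)
        x += adv
    return positions


# Hypothetical advances: one narrower glyph from a different font in column 1.
advances = [600, 550, 600, 600]
print(grid_positions(advances, 600))  # [0, 600, 1200, 1800]
print(flow_positions(advances))       # [0.0, 600.0, 1150.0, 1750.0]
```

With grid placement, the narrower glyph in column 1 leaves every later column where it would be anyway; with text flow, every later character shifts left by 50 units.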

I think that some, perhaps even most, of these characters could be mapped to existing equivalent characters in the Unicode standard. I'd like to know why IBM chose to map such characters to PUA code points instead of existing standard code points. Perhaps the answer is in the qualifier "existing"; perhaps IBM made that decision before such characters were in the standard. I'm just guessing. I'd really like to understand the history here. If you have that conversation with IBM, I'd be grateful if you would share what you can.

I'm unaware of any Unicode font that includes all of these characters (at these PUA code points).

GrahamHannington commented 2 years ago

I wonder about: