adafruit / Adafruit_SSD1306

Arduino library for SSD1306 monochrome 128x64 and 128x32 OLEDs
http://www.adafruit.com/category/63_98

Allow print a string with some umlauts characters. #190

Open matinicolosi opened 3 years ago

matinicolosi commented 3 years ago

I added a function that allows printing a string containing some umlauted characters.

mzero commented 3 years ago

This code does not achieve the desired result in a reasonable way, and it makes the API confusing.

Background

The Adafruit_GFX library (not this library) is responsible for mapping characters in strings to glyphs. String does not carry information about the encoding (the interpretation of bytes into Unicode code points). None of the other common C++ string types do either. This means that code must agree on the encoding by some other means.

Adafruit_GFX uses Code Page 437 as the encoding. As such, it correctly maps bytes that encode umlauted characters to the proper glyph, and displays correctly. It handles all the umlauted characters mentioned in this patch.

Adafruit_GFX's API requires that strings passed to it are encoded in CP437[*].

The Proposed Code

The proposed code is a change to Adafruit_SSD1306. But this library is just the display driver for a particular set of displays. The code for character interpretation is in Adafruit_GFX, and any change for character encodings belongs there.

As you can see from the proposed code, its aim is to convert umlauted characters into their Code Page 437 encoded equivalents. For example:

  result.replace("ä", "\204");

The string "\204" does indeed encode an umlauted a in CP 437. But what character set is "ä" compiled to? In C++, this is the "execution encoding", which for GCC is the ABI default, which for ARM is not well defined. The ARM compiler uses ISO 8859-1 (Latin-1), so we'll assume that. But note: compilation options can override it, so it isn't always fixed. It could easily be UTF-8 in some environments.
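To make the ambiguity concrete, here is a minimal sketch that classifies the bytes a toolchain actually produced for "ä". The name classifyAUmlaut and the Encoding enum are hypothetical, introduced only for illustration. The byte values are written as explicit escapes, so the helper itself does not depend on the execution encoding.

```cpp
#include <cstdint>

// Hypothetical names, for illustration only.
enum class Encoding { Utf8, Latin1, Cp437, Unknown };

// Reports which encoding of "ä" (U+00E4) a NUL-terminated string holds.
// The checks short-circuit, so no byte past the terminator is read.
Encoding classifyAUmlaut(const char* s) {
  const unsigned char* b = reinterpret_cast<const unsigned char*>(s);
  if (b[0] == 0xC3 && b[1] == 0xA4 && b[2] == 0) return Encoding::Utf8;
  if (b[0] == 0xE4 && b[1] == 0) return Encoding::Latin1;
  if (b[0] == 0x84 && b[1] == 0) return Encoding::Cp437;  // 0x84 is octal 204
  return Encoding::Unknown;
}
```

Passing the literal "ä" from a sketch to this function would show what a given build produced. Only the Latin-1 result matches the assumption the proposed replace() call depends on.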

Thus, the proposed code amounts to a partial re-encoding of a String from ISO 8859-1 to CP 437. This also means that strings passed to printTextWithUmlauts are expected to be in a different encoding than strings passed to all the other functions used with Adafruit_GFX.

Suggestion

This code doesn't belong in this library (which deals only with a specific display driver), nor in Adafruit_GFX. These libraries are clear in that they deal with strings in CP 437.

In general, there is no presumption that arbitrary String values (or other C++ string values) are going to be in ISO 8859-1. Even though it is a common encoding on the internet, it is by no means the only one. Programs using Adafruit_GFX (as well as this library) must be aware of the encoding of the data they have, and convert to Adafruit_GFX's CP 437 expectation if needed.

The proposed code is useful for someone who handles ISO 8859-1 encoded data, but it should be placed in a library of its own, whose only function is converting from ISO 8859-1 to CP 437. Done this way, it is more generally useful, doesn't muddy the API of these libraries, and could do a more complete encoding conversion.
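As a rough sketch of what such a standalone converter could look like: the function names below are hypothetical, std::string stands in for Arduino's String so the example is self-contained, and only the German umlauts and ß are mapped. Unmapped high bytes become '?' here, which is one possible policy, not a recommendation. A complete converter would cover the whole 0x80-0xFF range.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: map one ISO 8859-1 byte to its CP 437 byte.
uint8_t latin1ByteToCp437(uint8_t c) {
  if (c < 0x80) return c;    // the 7-bit range is passed through unchanged
  switch (c) {
    case 0xC4: return 0x8E;  // Ä
    case 0xD6: return 0x99;  // Ö
    case 0xDC: return 0x9A;  // Ü
    case 0xDF: return 0xE1;  // ß
    case 0xE4: return 0x84;  // ä
    case 0xF6: return 0x94;  // ö
    case 0xFC: return 0x81;  // ü
    default:   return '?';   // not covered by this partial table
  }
}

// Hypothetical helper: convert a whole ISO 8859-1 string to CP 437.
std::string latin1ToCp437(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in) {
    out.push_back(static_cast<char>(latin1ByteToCp437(c)));
  }
  return out;
}
```

The caller converts once, at the point where ISO 8859-1 data enters the program, and everything handed to Adafruit_GFX afterwards is in CP 437 like every other string it receives.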


[*] There is a well-known bug in Adafruit_GFX: the original character tables made the encoding wrong above 0xB0. Too much code already exists in the world relying on the broken encoding, so the default behavior was left as it is. Calling the function display.cp437() switches to the correct CP 437 implementation.
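The effect of that switch can be modeled roughly as follows. This is an illustrative model, not the library's source: it assumes the legacy behavior amounts to a one-glyph shift for codes at and above 0xB0, and the name glyphIndex is hypothetical.

```cpp
#include <cstdint>

// Model (an assumption) of which font-table slot a byte selects.
// With cp437Enabled == false, codes from 0xB0 upward select the next slot,
// mimicking the legacy table; 0xFF wraps around to 0.
uint8_t glyphIndex(uint8_t c, bool cp437Enabled) {
  if (!cp437Enabled && c >= 0xB0) {
    return static_cast<uint8_t>(c + 1);
  }
  return c;
}
```

Under this model the umlauted vowels (0x81, 0x84, 0x8E, 0x94, 0x99, 0x9A) come out the same either way because they sit below 0xB0, while a code such as 0xE1 (ß) is only correct after display.cp437() has been called.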