ryanbugden closed this issue 10 months ago.
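The Unicode data for LATIN SMALL LETTER SHARP S does not list LATIN CAPITAL LETTER SHARP S as its uppercase. Instead it gives the two-letter sequence 0053 0053 (SS), while LATIN CAPITAL LETTER SHARP S does list 00DF as its lowercase. You can check this in Lib/glyphNameFormatter/data/flatUnicode.txt: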
# Unicode 12 data. flatUnicode.txt fields are tab-separated:
# number, uppercase, lowercase, category, math flag (assumed empty here), name
lowercase = "00DF\t0053 0053\t\tLl\t\tLATIN SMALL LETTER SHARP S"
uppercase = "1E9E\t\t00DF\tLu\t\tLATIN CAPITAL LETTER SHARP S"
lines = [lowercase, uppercase]
for line in lines:
    uniNumber, uniUppercase, uniLowercase, uniCategory, mathFlag, uniName = line.split("\t")
    print(f"uniNumber:{uniNumber}, uniUppercase:{uniUppercase}, uniLowercase:{uniLowercase}, uniName:{uniName}")
Output:
uniNumber:00DF, uniUppercase:0053 0053, uniLowercase:, uniName:LATIN SMALL LETTER SHARP S
uniNumber:1E9E, uniUppercase:, uniLowercase:00DF, uniName:LATIN CAPITAL LETTER SHARP S
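The key detail is that the uppercase field of 00DF holds a two-code-point sequence, so there is no single character (and therefore no single glyph) it could map to. A minimal sketch of reading those fields, assuming the tab-separated layout above with an empty math flag; `parse_line` and `single_uppercase` are hypothetical helpers, not part of glyphNameFormatter:

```python
# Hypothetical helpers, assuming six tab-separated fields per line:
# number, uppercase, lowercase, category, math flag, name.
# The case fields may hold a space-separated sequence of code points.


def parse_line(line):
    number, upper, lower, category, _math_flag, name = line.split("\t")
    return {
        "number": number,
        "upper": upper.split(),  # "0053 0053" -> ["0053", "0053"]
        "lower": lower.split(),  # "" -> []
        "category": category,
        "name": name,
    }


def single_uppercase(record):
    """Return the uppercase code point if it is exactly one character, else None."""
    return record["upper"][0] if len(record["upper"]) == 1 else None


sharp_s = parse_line("00DF\t0053 0053\t\tLl\t\tLATIN SMALL LETTER SHARP S")
capital_sharp_s = parse_line("1E9E\t\t00DF\tLu\t\tLATIN CAPITAL LETTER SHARP S")

print(sharp_s["upper"])           # ['0053', '0053']
print(single_uppercase(sharp_s))  # None: no single-character uppercase
print(capital_sharp_s["lower"])   # ['00DF']
```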
This release is built with Unicode 12. I will have a look and see whether the casing is different in a newer version.
Unicode 15 has the same data, unless there is a better source for capitalisation:
00DF 0053 0053 Ll LATIN SMALL LETTER SHARP S
1E9E 00DF Lu LATIN CAPITAL LETTER SHARP S
Python follows the Unicode rules:
Germandbls_unicode = 0x1E9E  # 7838, LATIN CAPITAL LETTER SHARP S
germandbls_unicode = 0x00DF  # 223, LATIN SMALL LETTER SHARP S
print(chr(germandbls_unicode), chr(germandbls_unicode).upper())
print(chr(Germandbls_unicode), chr(Germandbls_unicode).lower())
ß SS
ẞ ß
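Python's str methods apply the full Unicode case mappings, which is why ß uppercases to two letters rather than to ẞ. A small sketch with a few related results, using only built-in str methods (the variable names are illustrative):

```python
small = "\u00DF"    # ß, LATIN SMALL LETTER SHARP S (glyph: germandbls)
capital = "\u1E9E"  # ẞ, LATIN CAPITAL LETTER SHARP S (glyph: Germandbls)

# Full uppercase mapping: one character becomes two.
print(small.upper(), len(small.upper()))  # SS 2
# Titlecase also expands, to "Ss".
print(small.title())                      # Ss
# Lowercasing ẞ gives the single character ß.
print(capital.lower())                    # ß
# ẞ is already uppercase, so upper() leaves it alone.
print(capital.upper() == capital)         # True
# Case folding maps both forms to "ss", so they compare equal caselessly.
print(small.casefold(), capital.casefold())  # ss ss
```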
Checked against the "big list" (the Unicode Character Database in XML form):
<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
<description>Unicode 15.0.0</description>
...
<char cp="00DF"
na="LATIN SMALL LETTER SHARP S"
uc="0053 0053"
...
<char cp="1E9E"
na="LATIN CAPITAL LETTER SHARP S"
lc="00DF"
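The same uc/lc attributes can be read from the XML file with the standard library. A sketch, assuming a trimmed, hypothetical excerpt in the layout quoted above (the real file carries many more attributes per char) and assuming, per UAX #42, that "#" stands for the code point itself; `case_mappings` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed excerpt in the UCD XML layout.
UCD_EXCERPT = """<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <description>Unicode 15.0.0</description>
  <repertoire>
    <char cp="00DF" na="LATIN SMALL LETTER SHARP S" uc="0053 0053" lc="#"/>
    <char cp="1E9E" na="LATIN CAPITAL LETTER SHARP S" uc="#" lc="00DF"/>
  </repertoire>
</ucd>"""

NS = {"ucd": "http://www.unicode.org/ns/2003/ucd/1.0"}


def _resolve(cp, value):
    """Expand "#" (the code point itself) and split sequences into lists."""
    return [cp] if value == "#" else (value or "").split()


def case_mappings(xml_text):
    """Map each code point to its (uppercase, lowercase) code point lists."""
    root = ET.fromstring(xml_text)
    result = {}
    for char in root.iterfind(".//ucd:char", NS):
        cp = char.get("cp")
        result[cp] = (_resolve(cp, char.get("uc")), _resolve(cp, char.get("lc")))
    return result


for cp, (upper, lower) in case_mappings(UCD_EXCERPT).items():
    print(cp, "uc:", upper, "lc:", lower)
```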
This is confirmed by https://www.unicode.org/charts/PDF/U0080.pdf
LATIN SMALL LETTER SHARP S
• German
• not used in Swiss High German
• uppercase is “SS” (standard case mapping), alternatively 1E9E ẞ
So according to Unicode, 00DF is the lowercase of 1E9E, but the uppercase of 00DF is still 0053 0053 (SS). I don't know why Python is so confident about its casing.
n2N("germandbls")
> germandbls
N2n("Germandbls")
> germandbls
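One way to see why both calls end up at germandbls: if a name-level conversion uppercases the character and then looks for a single glyph name for the result, "SS" has no match and the name comes back unchanged. This is a sketch with a hypothetical four-entry name table and hypothetical `upper_name`/`lower_name` functions; I have not checked that n2N and N2n actually work this way:

```python
# Hypothetical glyph-name table; a real tool would derive this from its
# Unicode data rather than hard-code it.
NAME_TO_CHAR = {
    "S": "S",
    "s": "s",
    "germandbls": "\u00DF",
    "Germandbls": "\u1E9E",
}
CHAR_TO_NAME = {char: name for name, char in NAME_TO_CHAR.items()}


def upper_name(name):
    """Return the uppercase glyph name, or the name unchanged if there is none."""
    upper = NAME_TO_CHAR[name].upper()
    # "ß".upper() is "SS": two characters, so no single glyph name matches.
    return CHAR_TO_NAME.get(upper, name)


def lower_name(name):
    """Return the lowercase glyph name, or the name unchanged if there is none."""
    lower = NAME_TO_CHAR[name].lower()
    return CHAR_TO_NAME.get(lower, name)


print(upper_name("germandbls"))  # germandbls
print(lower_name("Germandbls"))  # germandbls
print(upper_name("s"))           # S
```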
Thanks for looking into this. Do you think this is something I should petition for at the Unicode level, or have they already taken a firm stance in favour of SS?
As I understand it, this is not an oversight. Unicode reflects current usage: ß is not expected to capitalise automatically to ẞ, whereas lowercasing ẞ to ß is unproblematic.
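So a tool that does want ß to capitalise to ẞ has to opt in on top of the Unicode data. A minimal sketch, assuming a hypothetical override table `PREFERRED_UPPER` and helper `to_upper`, neither of which is part of glyphNameFormatter or Python:

```python
# Hypothetical opt-in override: the default uppercase of ß is "SS",
# so mapping ß -> ẞ has to be requested explicitly.
PREFERRED_UPPER = {"\u00DF": "\u1E9E"}  # ß -> ẞ


def to_upper(text, overrides=PREFERRED_UPPER):
    """Uppercase text, using an override where one exists, else str.upper()."""
    return "".join(overrides.get(ch, ch.upper()) for ch in text)


print("straße".upper())    # STRASSE (Unicode default)
print(to_upper("straße"))  # STRAẞE (with the override)
```

Keeping the override as an explicit table makes the departure from the default mapping visible. The lowercase direction needs no override, since ẞ already lowercases to ß.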
When converting between upper and lowercase, germandbls gets a bit tripped up. Here's a test script: