ascherer closed this issue 2 years ago
As a first step, it might be possible to skip #c3 and use ecma94 to transliterate some umlauts.
Actually, the second bytes of UTF-8 characters (after #c3) do not correspond to ecma94, so a different transliteration table would be necessary. For the “usual suspects” of the German language, the following modification works:
@x
else C_printf("%s",translit[(unsigned char)(*j)-0200]);
@y
else {
  if (flags['u']) C_putc(*j);
  else {
    if (0303==(unsigned char)(*j)) ++j;
    C_printf("%s",translit[(unsigned char)(*j)-0200]);
  }
}
@z
(skip #c3 and transliterate the next byte) and the resulting uctangle processes the input file
@l 84 Ae
@l 96 Oe
@l 9c Ue
@l 9f ss
@l a4 ae
@l b6 oe
@l bc ue
@* Igor.
@c
int main(void)
{
int fröhlicheWeihnacht = 42;
int ätscheBÄH = 100;
int ÄÖÜßäöü = 666;
return 0;
}
@* Index.
into the expected output
/*1:*/
#line 11 "utest.w"
int main(void)
{
int froehlicheWeihnacht= 42;
int aetscheBAeH= 100;
int AeOeUessaeoeue= 666;
return 0;
}
/*:1*/
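For anyone who wants to try the idea without rebuilding ctangle, the behaviour can be approximated in a standalone C sketch. The names `second_byte_translit` and `translit_utf8` are made up for illustration, and the hard-coded switch is only a stand-in for ctangle's `translit` array as filled by the `@l` lines above; unlike the patch, this sketch rejects bytes it has no mapping for.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ctangle's translit table: maps the second byte
   of a two-byte UTF-8 sequence starting with 0xC3 to an ASCII replacement,
   mirroring the @l lines of the test file. Returns NULL for unlisted bytes. */
static const char *second_byte_translit(unsigned char b)
{
    switch (b) {
    case 0x84: return "Ae";
    case 0x96: return "Oe";
    case 0x9c: return "Ue";
    case 0x9f: return "ss";
    case 0xa4: return "ae";
    case 0xb6: return "oe";
    case 0xbc: return "ue";
    default:   return NULL;
    }
}

/* Copy `in` to `out` (capacity `cap`, including the terminating NUL),
   skipping each 0xC3 lead byte and transliterating the byte after it.
   Returns 0 on success, -1 if `out` is too small or a byte is unmapped. */
int translit_utf8(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    for (const unsigned char *j = (const unsigned char *)in; *j; ++j) {
        if (*j < 0x80) {                    /* plain ASCII passes through */
            if (n + 1 >= cap) return -1;
            out[n++] = (char)*j;
            continue;
        }
        if (*j == 0xC3 && j[1] != 0) ++j;   /* skip the lead byte */
        const char *r = second_byte_translit(*j);
        if (r == NULL) return -1;
        size_t len = strlen(r);
        if (n + len >= cap) return -1;      /* keep room for the NUL */
        memcpy(out + n, r, len);
        n += len;
    }
    if (n >= cap) return -1;
    out[n] = '\0';
    return 0;
}
```

Feeding the three identifiers from the test file through `translit_utf8` should give the same names as in the tangled output above.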
I have absolutely no idea if the above change breaks any legal CWEB input. A quick glance at the ISO-8859-1 table and some cross-calculation shows that the magic number #c2 might also come into play.
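The cross-calculation can be checked with a few lines of C. This is a sketch of standard two-byte UTF-8 decoding, not part of the patch, and the function name `utf8_pair_to_latin1` is made up for illustration. Code points U+0080 to U+00BF are encoded with lead byte #c2 and keep their ISO-8859-1 value in the second byte; U+00C0 to U+00FF get lead byte #c3 and a second byte that is 0x40 lower than the ISO-8859-1 value. That offset is why the table in the test file uses 84 for Ä instead of c4.

```c
/* Decode a two-byte UTF-8 sequence whose lead byte is 0xC2 or 0xC3 into
   its code point, which is also its ISO-8859-1 byte value (0x80..0xFF).
   Returns -1 for any other lead byte or an invalid continuation byte. */
int utf8_pair_to_latin1(unsigned char lead, unsigned char second)
{
    if ((lead != 0xC2 && lead != 0xC3) || (second & 0xC0) != 0x80)
        return -1;
    /* 110xxxxx 10yyyyyy -> xxxxxyyyyyy */
    return ((lead & 0x1F) << 6) | (second & 0x3F);
}
```

For example, the pair #c3 #84 decodes to 0xC4 (Ä in ISO-8859-1), while #c2 #a7 decodes to 0xA7 (§), so a complete solution would have to treat both lead bytes.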
This issue is related to issue #8.
I completely forgot about my own ideas expressed above. Only after watching both videos of @dylanbeattie's talk on “Plain Text” (NDC Oslo 2021, NDC Copenhagen 2022) did it click with me. As the (partial) fix for issue #8 makes use of the +u option in a different manner, I am closing this issue. CWEB is far too conservative to emit UTF-8. On my Linux box, gcc 9 can't grok UTF-8 identifiers anyway. (I have seen the future on my Mac Mini with Clang 13, though.)
Feature request and patch by @igor-liferenko.
Although I can apply the patch to ctangle.w and get the desired effect with the +u option, gcc 9.3.0 on (K)Ubuntu 20.04 LTS can't cope with identifiers with UTF-8 characters. It appears that UTF-8 support comes with gcc 10. Plus, there's a significant amount of spit and polish to be applied in order to integrate that small patch in the code base (test, doc, etc.).