Maximus5 / ConEmu

Customizable Windows terminal with tabs, splits, quake-style, hotkeys and more
https://conemu.github.io/
BSD 3-Clause "New" or "Revised" License

"Unicode ranges" is truncated to 1024 characters. #1252

Closed. tomonic-x closed this issue 7 years ago

tomonic-x commented 7 years ago

Versions

ConEmu build: 170807 x64
OS version: Windows 10 x64 1703 15063.608

Problem description

I want to treat characters whose Unicode East_Asian_Width property is "Ambiguous" as "Fullwidth", for example Ω and Д.

So I generated a "Unicode ranges" value that includes the "Fullwidth", "Wide", "Ambiguous" and "Halfwidth" characters.

The generated "Unicode ranges" value is as follows.

CJK with Ambiguous: a1;a4;a7-a8;aa;ad-ae;b0-b4;b6-ba;bc-bf;c6;d0;d7-d8;de-e1;e6;e8-ea;ec-ed;f0;f2-f3;f7-fa;fc;fe;101;111;113;11b;126-127;12b;131-133;138;13f-142;144;148-14b;14d;152-153;166-167;16b;1ce;1d0;1d2;1d4;1d6;1d8;1da;1dc;251;261;2c4;2c7;2c9-2cb;2cd;2d0;2d8-2db;2dd;2df;300-36f;391-3a1;3a3-3a9;3b1-3c1;3c3-3c9;401;410-44f;451;1100-115f;2010;2013-2016;2018-2019;201c-201d;2020-2022;2024-2027;2030;2032-2033;2035;203b;203e;2074;207f;2081-2084;20a9;20ac;2103;2105;2109;2113;2116;2121-2122;2126;212b;2153-2154;215b-215e;2160-216b;2170-2179;2189;2190-2199;21b8-21b9;21d2;21d4;21e7;2200;2202-2203;2207-2208;220b;220f;2211;2215;221a;221d-2220;2223;2225;2227-222c;222e;2234-2237;223c-223d;2248;224c;2252;2260-2261;2264-2267;226a-226b;226e-226f;2282-2283;2286-2287;2295;2299;22a5;22bf;2312;2329-232a;2460-24e9;24eb-254b;2550-2573;2580-258f;2592-2595;25a0-25a1;25a3-25a9;25b2-25b3;25b6-25b7;25bc-25bd;25c0-25c1;25c6-25c8;25cb;25ce-25d1;25e2-25e5;25ef;2605-2606;2609;260e-260f;2614-2615;261c;261e;2640;2642;2660-2661;2663-2665;2667-266a;266c-266d;266f;269e-269f;26be-26bf;26c4-26cd;26cf-26e1;26e3;26e8-26ff;273d;2757;2776-277f;2b55-2b59;2e80-2e99;2e9b-2ef3;2f00-2fd5;2ff0-2ffb;3000-303e;3041-3096;3099-30ff;3105-312d;3131-318e;3190-31ba;31c0-31e3;31f0-321e;3220-32fe;3300-4dbf;4e00-a48c;a490-a4c6;a960-a97c;ac00-d7a3;e000-faff;fe00-fe19;fe30-fe52;fe54-fe66;fe68-fe6b;ff01-ffbe;ffc2-ffc7;ffca-ffcf;ffd2-ffd7;ffda-ffdc;ffe0-ffe6;ffe8-ffee;fffd;1b000-1b001;1f100-1f10a;1f110-1f12d;1f130-1f169;1f170-1f19a;1f200-1f202;1f210-1f23a;1f240-1f248;1f250-1f251;20000-2fffd;30000-3fffd;e0100-e01ef;f0000-ffffd;100000-10fffd;
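A list like the one above can be produced from the East_Asian_Width property with a short script. The following is only a sketch, not the script actually used for this report: the function names `to_ranges` and `east_asian_ranges` are made up for illustration, and the result depends on the Unicode version bundled with the Python interpreter, so it may not match the list above exactly.

```python
import unicodedata

# East_Asian_Width classes to select: Fullwidth, Wide, Ambiguous, Halfwidth.
WIDE_CLASSES = {"F", "W", "A", "H"}


def to_ranges(code_points):
    """Collapse sorted code points into 'a1;a4;a7-a8;' style text."""
    spans = []
    start = prev = None
    for cp in code_points:
        if start is None:
            start = prev = cp
        elif cp == prev + 1:
            prev = cp  # extend the current run
        else:
            spans.append((start, prev))
            start = prev = cp
    if start is not None:
        spans.append((start, prev))
    return "".join(
        f"{a:x};" if a == b else f"{a:x}-{b:x};" for a, b in spans
    )


def east_asian_ranges(limit=0x110000):
    """Range text for every code point below `limit` in WIDE_CLASSES."""
    selected = [
        cp for cp in range(limit)
        if unicodedata.east_asian_width(chr(cp)) in WIDE_CLASSES
    ]
    return to_ranges(selected)


if __name__ == "__main__":
    text = east_asian_ranges()
    print(len(text), "characters")
    print(text)
```

Printing the length makes it easy to see whether the generated value is likely to hit a size limit before pasting it into the settings dialog.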

Steps to reproduce

  1. Enter the value above into [Settings] - [Main] - [Alternative font] - [Unicode ranges].
  2. Click [Apply].
  3. Click [Save settings].

Actual results

After reopening the settings dialog, "Unicode ranges" has been truncated to 1024 characters. "FarBordersRanges" in ConEmu.xml has been truncated as well.

The truncated value is as follows.

00A1;00A4;00A7-00A8;00AA;00AD-00AE;00B0-00B4;00B6-00BA;00BC-00BF;00C6;00D0;00D7-00D8;00DE-00E1;00E6;00E8-00EA;00EC-00ED;00F0;00F2-00F3;00F7-00FA;00FC;00FE;0101;0111;0113;011B;0126-0127;012B;0131-0133;0138;013F-0142;0144;0148-014B;014D;0152-0153;0166-0167;016B;01CE;01D0;01D2;01D4;01D6;01D8;01DA;01DC;0251;0261;02C4;02C7;02C9-02CB;02CD;02D0;02D8-02DB;02DD;02DF;0300-036F;0391-03A1;03A3-03A9;03B1-03C1;03C3-03C9;0401;0410-044F;0451;1100-115F;2010;2013-2016;2018-2019;201C-201D;2020-2022;2024-2027;2030;2032-2033;2035;203B;203E;2074;207F;2081-2084;20A9;20AC;2103;2105;2109;2113;2116;2121-2122;2126;212B;2153-2154;215B-215E;2160-216B;2170-2179;2189;2190-2199;21B8-21B9;21D2;21D4;21E7;2200;2202-2203;2207-2208;220B;220F;2211;2215;221A;221D-2220;2223;2225;2227-222C;222E;2234-2237;223C-223D;2248;224C;2252;2260-2261;2264-2267;226A-226B;226E-226F;2282-2283;2286-2287;2295;2299;22A5;22BF;2312;2329-232A;2460-24E9;24EB-254B;2550-2573;2580-258F;2592-2595;25A0-25A1;25A3-25A9;25B2-25B3;25B6-25B7;25BC-25BD;25C0-25C1;25C6-25C8;

Expected results

A "Unicode ranges" value that covers the valid range (0 to 0xFFFF ?) should not be truncated.

The cause

I think the cause is size_t nMax = 1024; at line 4209 of Options.cpp. https://github.com/Maximus5/ConEmu/blob/3789d175287f2547418476f45a985ce34e0dfb34/src/ConEmu/Options.cpp#L4206-L4230

About twice the length is necessary for my "Unicode ranges".
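To check in advance whether a value will survive saving, its stored length can be compared against the limit. This is a rough sketch: it assumes the value is stored in the upper-case, at-least-4-digit form seen in the truncated output above, and that one slot of the 1024-element buffer is reserved for a terminator. Neither assumption has been verified against the ConEmu source, and `normalize` and `fits` are hypothetical helper names.

```python
LIMIT = 1024  # taken from `size_t nMax = 1024;` cited above


def normalize(ranges):
    """Rewrite 'a1;a7-a8;' as '00A1;00A7-00A8;' (upper-case, 4+ hex digits)."""
    out = []
    for item in ranges.split(";"):
        if not item:
            continue  # skip the empty piece after the final ';'
        bounds = [f"{int(bound, 16):04X}" for bound in item.split("-")]
        out.append("-".join(bounds) + ";")
    return "".join(out)


def fits(ranges, limit=LIMIT):
    """True if the normalized text plus one terminator slot fits the limit.

    Reserving one slot for a terminator is an assumption.
    """
    return len(normalize(ranges)) + 1 <= limit


if __name__ == "__main__":
    sample = "a1;a4;a7-a8;aa;ad-ae;"
    print(normalize(sample), fits(sample))
```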

tomonic-x commented 7 years ago

It turned out that the actual glyph widths of the fonts in use differ considerably from the Unicode East_Asian_Width property, so I changed the method and now reference the glyph metrics of the actual font.

As a result, I was able to generate Unicode ranges that fit in 1024 characters.
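The metric-based selection can be pictured roughly as follows. This is a guess at the general idea, not the procedure actually used here: it assumes the advance widths of the alternative font have already been extracted into a dictionary by some font tool, and it treats a glyph as wide when its advance exceeds the cell width of the main font. `wide_code_points`, `advances` and `cell_width` are hypothetical names.

```python
def wide_code_points(advances, cell_width):
    """Return sorted code points whose glyph is wider than one cell.

    advances:   dict mapping code point -> advance width in the alternative
                font (assumed to be in the same units as cell_width).
    cell_width: advance width of a single cell in the main font.
    """
    return sorted(cp for cp, adv in advances.items() if adv > cell_width)


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the input.
    sample = {0x0041: 500, 0x03A9: 1000, 0x3042: 1000}
    print([f"{cp:04X}" for cp in wide_code_points(sample, 500)])
```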

Example

The following is an example of a combination of a specific font and a character range for each language.

For U+2500-257F (Box Drawing), full-width characters on the alternative font side are not included so that they do not shift with ncurses, pstree or the like.

For U+10000 and above, I decided without reference to metrics: CJK Extension B - F, Variation Selectors Supplement, etc. are converted into surrogate pairs and included.
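The surrogate conversion follows the standard UTF-16 rule: subtract 0x10000, then the upper 10 bits select the high surrogate and the lower 10 bits select the low surrogate. A minimal sketch (the function names are made up for illustration) that reproduces the high-surrogate entries used in the lists below:

```python
def surrogate_pair(cp):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def high_surrogate_range(first, last):
    """High surrogates covering the code point range first..last."""
    return surrogate_pair(first)[0], surrogate_pair(last)[0]


def format_range(lo, hi):
    """Format as 'D840-D87F;', or 'DB40;' when both ends are equal."""
    return f"{lo:04X};" if lo == hi else f"{lo:04X}-{hi:04X};"


if __name__ == "__main__":
    # Plane 2 (CJK extensions), Kana Supplement, Variation Selectors Supplement
    for first, last in [(0x20000, 0x2FFFD), (0x1B000, 0x1B001), (0xE0100, 0xE01EF)]:
        print(format_range(*high_surrogate_range(first, last)))
```

Because a single high surrogate is shared by 1024 code points, the low-surrogate half cannot be narrowed per range, which is why the lists below simply include the whole DC00-DFFF block.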

for Simplified Chinese (Consolas + Microsoft YaHei) 340 characters

007F-009F;1E3F;2018-2019;201C-201D;2025;2027-202E;2035-2038;203B;2103-2104;2109-2112;2121;215F-2190;2192;2196-21A7;2208-220E;221D;221F;2223-2224;2227;222A;222E-2233;2235;2237-223C;224C-2251;2266;226E;2295-22BE;2312-231F;23FF-24FF;2500-257F;2580-2FDF;2FF0-3098;309A-A4CF;D81B-D822;D82C;D840-D87F;DB40;DC00-DFFF;F900-FAFF;FE00-FE1F;FE30-FE6F;

for Traditional Chinese (Consolas + Microsoft JhengHei) 330 characters

02C7;02CA-02CB;02D9;2018-2019;201C-201D;2025;2027-202E;2035-2038;203B;2103-2104;2109-2112;2116;2121;215F-2193;2196-21A7;21B8-21E6;2216-2217;221F;2223-2224;2229-222A;222E-2233;2235;2252-225F;2263;2266;2295-22BE;2307-230F;2383-24FF;2500-257F;2580-2FDF;3000-A4CF;D81B-D822;D82C;D840-D87F;DB40;DC00-DFFF;F900-FAFF;FE00-FE1F;FE30-FE6F;

for Japanese (Consolas + Yu Gothic) 580 characters

00A7-00A8;00B0-00B1;00B4;00B6;00D7;00F7;0336;0386;0388-03D1;03D5;03DB;0401-044F;0451-045C;045E-045F;09F2;17DB-1CFF;2003;2010;2014-2016;2020-2021;2025-202E;2030-2038;203B-203C;203F-2042;2047-205D;20DD-2116;2121;2127-212D;2135-214C;2150-2152;2156-215A;2160-2182;2189-2194;2196-21A7;21C4-2205;2207-2208;220A-220E;2211-2214;221A-2224;2226-223B;223D-2244;2252-2261;2266-2275;2277-2283;2285-2289;228B-229D;22A0-22D9;22DB-2301;2305;2307-230F;2312-2317;2329-23FE;23FF-24FF;2500-257F;2580-3098;309A-4DBF;4E00-9FFF;D82C;D840-D87F;DB40;DC00-DFFF;F900-FB00;FB03;FE00-FE1F;FE30-FE6F;FF00-FFEF;

for Korean (Consolas + Malgun Gothic) 330 characters

1100-11FF;2015;2025;203B;2047;2049-205D;2103-2104;2109-2112;2121;212B-212D;2160;2162;2164;2166;2168;2170;2172;2174;2176;2178;2196;2198;21D2-21D3;2200-2201;2203-2205;2207;220B-220E;221D;2220-2224;2227;222A;222C-222D;2234;223C;2252-225F;226A;2282;2286;2299-22A4;2312-231F;23FF-24FF;2500-257F;2580-2EFF;3000-3007;3009;300B;300D;300F;3011-3013;3015-302D;302F-4DBF;4E00-9FFF;A960-A97F;AC00-D7FF;D840-D87F;DB40;DC00-DFFF;F900-FAFF;FE00-FE1F;FE30-FE6F;FF00-FFEF;

Thank you !