Closed: vimperatorluo closed this issue 6 months ago
What is your Tcl version?
What do you see here?
$ echo 'set s "中国"; puts [string length $s]\t[llength [split $s ""]]' | tclsh
2 2
$ echo 'set s "\u4E2D\u56FD"; puts [string length $s]\t[llength [split $s ""]]' | tclsh
2 2
(just to make sure it is not affected by an encoding issue or by a bug like https://core.tcl-lang.org/tcl/info/debd088e48)
I am using vanillatclsh64, a byproduct of AndroWish. I found this issue while using tdbcsh and reported it to AndroWish (https://www.androwish.org/home/info/ddf8ad3d090b4107). chw suggested that I report it here.
This is the tdbcsh output:
$ tdbcsh sqlite3 bidsman.db
Please type SQL commands, or 'exit' or 'quit' to leave.
SQL> select from item
┌──────────┬──────┬────────┬─────────┬─────┬────────────────┬───────┬─────┬──────┬────────────────────────────────┬───────────────────────┐
│ name │better│refvalue│ unit │error│ score │penalty│zeyou│ enum │ hierarchy │ dscrp │
├──────────┼──────┼────────┼─────────┼─────┼────────────────┼───────┼─────┼──────┼────────────────────────────────┼───────────────────────┤
│ 重量 │ <= │ 2.5 │ kg │ │ 0.5 │ 0 │ 否 │ │ │ │
├──────────┼──────┼────────┼─────────┼─────┼────────────────┼───────┼─────┼──────┼────────────────────────────────┼───────────────────────┤
│ 齐套 │ == │ 齐套 │ │ │ 0.5 │ 0 │ 否 │齐套 不齐套│ │ │
├──────────┼──────┼────────┼─────────┼─────┼────────────────┼───────┼─────┼──────┼────────────────────────────────┼───────────────────────┤
│ 硬件自主可控 │ >= │ 0 │型元器件不自主可控│ │=实测>5?0:3-实测0.5│ 0 │ 否 │ │ │每有1型元器件不自主可控扣0.
This is the test output using tclline:
$ tclline
Welcome to vanillatclsh (Tcl 8.6.10 with TclReadLine 1.1)
> set s "中国"; puts [string length $s]\t[llength [split $s ""]]
2 2
> set s "\u4E2D\u56FD"; puts [string length $s]\t[llength [split $s ""]]
2 2
> puts "[info patchlevel] -- [llength [split "\ud83e\udd1d" {}]] -- [string length "\ud83e\udd1d"]"
8.6.10 -- 1 -- 1
> puts "[info patchlevel] -- [llength [split "\ud83e\udd1d" {}]] -- [string length "\ud83e\udd1d"]"
8.6.10 -- 1 -- 1
The tdbcsh output above came out garbled and I can't fix it; please see the AndroWish issue: https://www.androwish.org/home/info/ddf8ad3d090b4107
Now I understand the issue. It has nothing to do with a wrong length; it is about how the character is displayed. The character is double-width in the chosen fixed-width font (probably in any fixed-width font), so although 中 is a single character, it takes up 2 columns when drawn aligned.
$ echo -e "1234 5678\n中国 中国" | tclsh tabulate.tcl -align 'left right'
┌────┬────┐
│1234│5678│
├────┼────┤
│中国 │ 中国│
└────┴────┘
A correct fix would need some mechanism to determine how wide a Unicode character is in a given font, or something like wcwidth, which Tcl does not have at all (Tk possibly has something like that).
A workaround would be to count any Chinese/Japanese/other double-width character as 2 characters when calculating the length, e.g. any wide character from east-asian-width, taking into account ambiguous characters as well as the Unicode variation selector (U+FE0F).
Simplified "workaround" to illustrate the possible approach - #18
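For readers who want to see the idea in code, here is a minimal sketch of the workaround described above. It is not the code from #18. The proc names displayWidth and padRight are hypothetical, and the code point ranges are a simplified assumption rather than the full East Asian Width table. Ambiguous-width characters and the variation selector U+FE0F are not handled.

```tcl
# Hypothetical helper: estimate how many terminal columns a string occupies.
# Characters in a few common East Asian wide ranges count as 2 columns,
# everything else as 1. The range list is a simplified assumption.
proc displayWidth {s} {
    set width 0
    foreach ch [split $s ""] {
        scan $ch %c code
        if {
            ($code >= 0x1100 && $code <= 0x115F) ||
            ($code >= 0x2E80 && $code <= 0xA4CF) ||
            ($code >= 0xAC00 && $code <= 0xD7A3) ||
            ($code >= 0xF900 && $code <= 0xFAFF) ||
            ($code >= 0xFE30 && $code <= 0xFE4F) ||
            ($code >= 0xFF00 && $code <= 0xFF60) ||
            ($code >= 0xFFE0 && $code <= 0xFFE6)
        } {
            incr width 2
        } else {
            incr width 1
        }
    }
    return $width
}

# Hypothetical helper: left-align a cell by padding with spaces up to the
# given number of columns, based on display width instead of string length.
proc padRight {s columns} {
    set missing [expr {$columns - [displayWidth $s]}]
    if {$missing > 0} {
        append s [string repeat " " $missing]
    }
    return $s
}

# Both cells should now end at the same column in a fixed-width terminal.
puts "│[padRight 1234 4]│"
puts "│[padRight \u4E2D\u56FD 4]│"
```

With this, "\u4E2D\u56FD" (中国) still has a string length of 2, but displayWidth reports 4, so no extra padding is added to a 4-column cell.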
Your patch has been applied to AndroWish. Thank you very much.
One Chinese character occupies the space of two English characters. Example:
中国 1234
so tabulate has a problem with Chinese characters.