Closed Chromowolf closed 9 months ago
See https://github.com/Buizz/EUD-Editor-3/issues/36
tl;dr: stat_txt.tbl strings can be CP949, UTF-8 or windows-1252. CP949 has the highest precedence.
https://github.com/armoha/eudplib/blob/master/eudplib/eudlib/stringf/tblprint.py#L32
f_settbl always converts "string literal" inputs to CP949.
settbl("Terran Ghost", 0, ("生命\u2009").encode("UTF-8"));
Passing bytes rather than a string would work. The 'thin space' character (U+2009, e2 80 89 in UTF-8) at the end ensures the tbl entry is not decodable as CP949, forcing it to always be interpreted as Unicode.
Without the thin space character, 生命 in UTF-8 (e7 94 9f e5 91 bd) is decoded as windows-1252 and displayed as mojibake (ç”Ÿå‘½).
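The byte values quoted above can be checked in plain Python, outside euddraft. This is only a sketch of the encodings involved; it makes no claim about how SC:R itself picks an encoding.

```python
# Plain-Python check of the byte sequences mentioned above.
text = "生命"
thin_space = "\u2009"

utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "))                   # e7 94 9f e5 91 bd
print(thin_space.encode("utf-8").hex(" "))   # e2 80 89

# Reading the same six bytes as windows-1252 yields six unrelated
# single-byte characters instead of the two intended ones.
mojibake = utf8_bytes.decode("cp1252")
print(len(text), len(mojibake))              # 2 6
```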
function onPluginStart() {
settbl("Terran Ghost", 0, ("生命\u2009").encode("UTF-8")); //Error
//dbstr_print(GetTBLAddr("Terran Vulture"), "生命\u2009\x00"); //OK
}
[epScript] Compiling "main.eps"...
[Error -2] Module "main" Line 2 : General syntax error
[Error 6298] Module "main" Line 4 : Block not terminated properly.
but dbstr_print(GetTBLAddr("Terran Vulture"), "生命\u2009\x00"); is OK
BTW, euddraft doesn't seem to have f_encode? py_str("生命\u2009").encode("UTF-8") would work then ;(
It works. Thanks a lot!!!! So, it turns out that: Starcraft uses "utf-8" as the highest priority when decoding strings in string section, but uses "cp949" as the highest priority when decoding strings in stat_txt.tbl? This inconsistency is so weird :(
Starcraft uses "utf-8" as the highest priority when decoding strings in string section, but uses "cp949" as the highest priority when decoding strings in stat_txt.tbl?
Yeah exactly xD IMO it's because back in the 1.16 EUD map days, tbl editing (CP949) was so common that SC:R had to prioritize supporting it. In contrast, only a few maps edited STR content in-game, so SC:R could move on to Unicode while breaking only a small number of maps.
Thank you. (I should have asked you this question 2 years ago, lol.) Our group of map makers are very grateful to your help. Is there any method we could sponsor/donate to you? (Like patreon or any other means.)
What about if I wanna use settblf instead?
const ss = Db("生命");
function onPluginStart() {
//sprintf(GetTBLAddr("Terran Ghost"), "{:s}\u2009", ss); // I know this is OK
settblf("Terran Ghost", 0, py_str("生命\u2009").encode("UTF-8")); //Got error: expected str, got bytes
}
What about if I wanna use settblf instead?
const sm = EPD(Db("生命"));
// f_settblf, f_settblf2(tbl, offset, format_string, *args)
settblf("Terran Ghost", 0, "{:t}\u2009", sm);
Our group of map makers are very grateful to your help. Is there any method we could sponsor/donate to you? (Like patreon or any other means.)
Thank you for support! I opened my BuyMeACoffee just now. https://www.buymeacoffee.com/armoha
const sm = EPD(Db("生命"));
// f_settblf, f_settblf2(tbl, offset, format_string, *args)
settblf("Terran Ghost", 0, "{:t}\u2009", sm);
UnicodeEncodeError: 'cp949' codec can't encode character '\u2009' in position 2: illegal multibyte sequence.
I don't know how to force settblf to encode the format string in utf-8, cuz py_str can't be applied here.
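The root of that error can be reproduced with Python's built-in codecs alone, without eudplib. A minimal sketch; `can_encode` is a hypothetical helper introduced here for illustration, not part of euddraft or eudplib:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The thin space has no CP949 representation, so a str argument that
# contains it cannot go through a CP949-only conversion path.
print(can_encode("\u2009", "cp949"))  # False
print(can_encode("\u2009", "utf-8"))  # True
```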
const sm = EPD(Db("生命"));
// f_settblf, f_settblf2(tbl, offset, format_string, *args)
const unicode_tbl = py_str("\u2009").encode("UTF-8");
settblf("Terran Ghost", 0, "{:t}{}", sm, unicode_tbl);
I think I should add a settblf(encoding="utf-8"); option..
Thank you. This is a work-around, but looks awkward, and seems to lose the convenience of format string... If the format string is "some utf-8 char{:c} utf-8 char{:s}, utf8 char xxx {:n}", then I must do
const uni01 = py_str("some utf-8 char ").encode("UTF-8");
const uni02 = py_str(" some utf-8 char").encode("UTF-8");
const uni03 = py_str(" utf8 char xxx ").encode("UTF-8");
const uni = py_str("\u2009").encode("UTF-8");
settblf("Terran Ghost", 0, "{}{:c}{}{:s}{}{:n}{}", uni01, playerID, uni02, someAddr, uni03, playerID, uni);
which takes the same effort as
settbl("Terran Ghost", 0, "some utf-8 char", playerID, " utf-8 char", someAddr, " utf8 char xxx ", playerID, "\u2009");
And you know this for sure....
So I think the encoding="utf-8" option is necessary if there is no other work-around.
(Hope one day the world could be unified to utf-8)
@Chromowolf Updated to 0.9.1.4 https://github.com/armoha/euddraft/releases/tag/v0.9.1.4
settblf("Terran Ghost", 0, "{0:c}生命{0:n}", player, encoding="utf-8");
// write "<playerColor>生命<playerName>\u2009\0" on Terran Ghost tbl
f_settbl: Added encoding parameter (default: "CP949")
f_settbl(tbl, offset, *args, encoding="cp949")
f_settblf(tbl, offset, format_string, *args, encoding="cp949")
encoding specifies which encoding str arguments will use.
When encoding is "utf-8", f_settbl or f_settblf appends "\u2009\0" at the end of the tbl string, to ensure SC:R always interprets it as a Unicode entry.
(The partial edit functions f_settbl2 and f_settblf2 do not add any null terminator or thin space character.)
It is the user's responsibility to use the same encoding in other types of arguments: bytes, Db, etc.
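To make the rules above concrete, here is a rough plain-Python model of the bytes that end up in the tbl entry. `build_tbl_bytes` is a hypothetical helper written for this illustration, not eudplib's implementation; it handles only str and bytes arguments, and it assumes the full-edit functions always add a "\0" terminator, including in the CP949 case.

```python
def build_tbl_bytes(*args, encoding="cp949"):
    """Hypothetical model of the encoding rule; not eudplib code."""
    out = bytearray()
    for arg in args:
        if isinstance(arg, str):
            # str arguments are converted with the chosen encoding.
            out += arg.encode(encoding)
        elif isinstance(arg, (bytes, bytearray)):
            # bytes are copied verbatim; keeping them in the same
            # encoding is the caller's responsibility.
            out += arg
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    if encoding.lower().replace("-", "") == "utf8":
        # The trailing thin space keeps the entry from being read as CP949.
        out += "\u2009".encode("utf-8")
    out += b"\0"  # assumed terminator for full edits
    return bytes(out)


print(build_tbl_bytes("生命", encoding="utf-8").hex(" "))
# e7 94 9f e5 91 bd e2 80 89 00
```

Mixing a CP949-encoded bytes argument into a call with encoding="utf-8" would go through this model unchanged, which is the kind of mismatch the note about user responsibility warns against.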
f_settbl(tbl, offset, *args, encoding="cp949")
function onPluginStart() {
settbl(1, 0, "abc", encoding = "cp949");
}
euddraft 0.9.1.4 : Simple eudplib plugin system
 - This program follows MIT License. See license.txt
 - Press SHIFT to force check update while opening euddraft.
 - Daemon mode. Ctrl+C to quit. R to recompile (windows only)
... ...
[Error -2] Module "main" Line 2 : General syntax error
[Error 6298] Module "main" Line 3 : Block not terminated properly.
:( settbl(1, 0, "abc", encoding=py_str("cp949")); in epScript
Thx. Sorry for this stupid question :P
Nah, it's my fault in documenting; I forgot that the A = "B" pattern isn't allowed yet in epScript.
Closing as completed; please re-open or open a new issue if you have any questions.
FYI: from euddraft 0.9.9.9, the [dataDumper] plugin detects whether binary data is encoded in CP949 or UTF-8, and sends this info to eudplib.
Related commits: https://github.com/armoha/euddraft/commit/e6dcc9b974e67792c25933e4f781c445a2b66d7f and https://github.com/armoha/eudplib/commit/e2f148cac4c07b2e2eb768332f5c880d9c64c0c4
(using euddraft 0.9.1.2) main.eps:
I got this:
My conclusion: SC:R uses only cp949 to decode strings in TBL, and won't use utf-8.
Is my conclusion right? Is there anything euddraft can do to let SC use utf-8 to decode strings TBL?