Closed: gh0stwizard closed this issue 2 months ago
When a Windows user has the "Beta: Use Unicode UTF-8..." check box checked, it causes problems.
So, I am not going to change the requirement that the Windows "Beta: Use Unicode UTF-8..." check box be unchecked.
Curious, what different output do you see/get when you run it with `chcp 65001` as above than without? Do you have the "Beta: Use Unicode UTF-8..." check box checked?
`chcp 65001` does exactly the same thing as checking the "Beta: Use Unicode UTF-8..." checkbox.
I have been using AMUMSS since version 4.2.1.4, with the command `chcp 65001` added at the top of `BUILDMOD_AUTO.bat`. To this very moment I have had ZERO issues with UTF-8 on my Windows 10, and I see no real technical reason why UTF-8 can't be used to print Unicode characters on the screen in AMUMSS.
I am not forcing you to enable UTF-8 output by default. I am asking you to stop blocking it, as was done in AMUMSS `v4.5.6.0W`, because now anyone who wants to see Unicode characters must edit `bzrunM.bat` and comment out the block `if [%_CodePage%]==[65001] (... pause exit)`.
Again, the logic is simple. Here is the concept as code:
rem AT THE TOP OF THIS FILE, FORCE A SAFE CHARSET
chcp 850
rem HERE GOES OUR CODE, WHICH IS AFFECTED BY THE CHARSET
rem ...
rem HERE WE RUN AN EXTERNAL PROGRAM (LUA) WHICH MAY PRINT UTF-8 CHARACTERS
rem ENABLE UNICODE IN THE CONSOLE JUST FOR THIS MOMENT
chcp 65001
lua.exe somescript.lua
rem SWITCH BACK TO THE SAFE ENCODING
chcp 850
rem THE REST OF THE PROGRAM CONTINUES WORKING WITH THE SAFE CHARSET
rem ...
rem END OF FILE
exit
I still want to see an answer to this:
> Curious, what different output do you see/get when you run it with chcp 65001 as above than without?
When I add `chcp 65001` as you suggest, I do not see any difference. So, from my side, it only creates a problem for the users who cannot use AMUMSS when 65001 is active, AND it brings no obvious benefit otherwise...
Okay. One day you will learn a simple thing: it's better to provide opportunities than to block existing possibilities. Close the ticket and forget about it.
Why not answer my question? `Curious, what different output do you see/get when you run it with chcp 65001 as above than without?`
Holy molly. Guys, learn a bit about utf-8, please.
Find any Unicode characters. Here is a regexp from a quick Google search: `[^\x00-\x7F]+`. Put it into the VS Code search and run it over the game's EXML files. Here is an example match from `LANGUAGE\NMS_LOC1_ENGLISH.EXML`:
<Property name="Id" value="BUI_ATLAS" />
<Property name="English" value="At1αs" />
Try to print the Unicode string above, "At1αs". Compare the results under `chcp 850` and under `chcp 65001`.
Plus, you have to configure your terminal/console (`cmd.exe`) to use a Unicode-friendly font, for instance `Consolas`. Otherwise, you will not see a difference.
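The regexp search over the EXML files described above can also be tried outside VS Code. Here is a minimal Python sketch, assuming the two sample lines quoted from `NMS_LOC1_ENGLISH.EXML`; the helper name `find_non_ascii` is hypothetical and the sample list stands in for a real file.

```python
import re

# Same pattern as in the comment above: one or more characters outside 7-bit ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def find_non_ascii(lines):
    """Return (line_number, matched_text) for every run of non-ASCII characters."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in NON_ASCII.finditer(line):
            hits.append((number, match.group()))
    return hits


# Sample lines copied from the thread; "\u03b1" is the Greek letter alpha.
sample = [
    '<Property name="Id" value="BUI_ATLAS" />',
    '<Property name="English" value="At1\u03b1s" />',
]

hits = find_non_ascii(sample)
print(hits)  # one hit on line 2: the alpha character
```

To scan a real file, the same function could be fed `open(path, encoding="utf-8")`, since EXML files declare UTF-8.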
> Curious, what different output do you see/get when you run it with chcp 65001 as above than without?
I see non-Unicode characters: awkward extended-ASCII symbols. That is normal behavior when the output charset does not match the one the console expects, e.g. your Lua program prints UTF-8 characters, but `cmd.exe` expects its input in the local charset (850 or any other legacy single-byte charset, such as 1250).
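The mismatch described above can be reproduced without a Windows console. This is a minimal Python sketch, not AMUMSS code: the hypothetical helper `as_seen_by_console` encodes a string as UTF-8 (what the Lua program is said to print) and decodes the bytes with the code page the console is assumed to use.

```python
def as_seen_by_console(text, console_code_page):
    """Encode text as UTF-8, then decode it the way a console
    set to console_code_page would interpret those bytes."""
    # Python names the Windows code page 65001 "utf-8"; the legacy
    # pages used here (850, 437, 1250) are available as "cpNNN".
    codec = "utf-8" if console_code_page == 65001 else f"cp{console_code_page}"
    return text.encode("utf-8").decode(codec, errors="replace")


original = "At1\u03b1s"  # "At1" + Greek alpha + "s", as in the EXML sample

under_850 = as_seen_by_console(original, 850)      # two box/shade symbols replace alpha
under_1250 = as_seen_by_console(original, 1250)    # two Latin-1 style symbols replace alpha
under_65001 = as_seen_by_console(original, 65001)  # round-trips unchanged

for code_page, shown in ((850, under_850), (1250, under_1250), (65001, under_65001)):
    print(code_page, shown)
```

The two bytes of the UTF-8 encoded alpha become two unrelated single-byte characters under the legacy code pages, which is why the difference only shows up for non-ASCII text.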
You do not need to bring molly into this, we are talking...
My `cmd.exe` IS configured to use Consolas AND I still do not see a difference whether I insert `chcp 65001` or not. I do not care about VS Code, I use Notepad++.
In Notepad++, with encoding ANSI, it shows each byte as a separate character:
<Property name="Id" value="BUI_ATLAS" />
<Property name="English" value="At1Î±s" />
With encoding UTF-8, it looks like what I use in Notepad++ and like your example:
<Property name="English" value="At1αs" />
BUT, THIS is coming from the output of MBINCompiler.exe itself, not from something AMUMSS does to the EXML or the MBIN file. Look at those files in a Hex Editor of your choice... The bytes are "41 74 31 CE B1 73" (which show as "A t 1 Î ± s"). This byte sequence is in both the EXML AND the MBIN files. It is as HG created it! Maybe you could take it up with them?
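The byte sequence quoted above can be checked directly. A minimal Python sketch, assuming that "ANSI" in Notepad++ means Windows-1252 on this machine (it depends on the system locale):

```python
# The six bytes quoted from the hex editor.
data = bytes.fromhex("41 74 31 CE B1 73")

as_utf8 = data.decode("utf-8")    # CE B1 is read as ONE character: Greek alpha
as_ansi = data.decode("cp1252")   # CE and B1 are read as TWO separate characters


def hex_dump(raw):
    """Format bytes the way a hex editor shows them."""
    return " ".join(f"{byte:02X}" for byte in raw)


print(hex_dump(data))
print(len(as_utf8), as_utf8)
print(len(as_ansi), as_ansi)
```

Both views come from the same six bytes; only the decoding differs. Read as UTF-8 they form the five-character string from the EXML sample, so the bytes themselves are consistent with the file's declared encoding.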
Anyway, the real point is this: if 65001 were allowed in a normal AMUMSS installation, then some users (whom I had to help figure out the problem) would not be able to use AMUMSS. So AMUMSS needs to flag it and request a correction (as it does right now) to allow everyone to use it.
If you really want to use 65001, just go ahead and add "chcp 65001" where you want it. The source code is there for you.
And like you said: cmd.exe expects as input local charset (850 or any legacy single-byte charset, 1250, etc).
Hope this helps a bit.
`cmd.exe` is too smart for testing Unicode by typing directly into its console.
Here is a working example. Create a file `utf8test.bat`; it should be encoded as UTF-8 (not ANSI):
echo off
echo ---------------------
chcp 850
echo CAFÉ
echo ---------------------
chcp 65001
echo CAFÉ
echo ---------------------
Then run it in cmd:
C:\Users\SECRET>utf8test.bat
C:\Users\SECRET>echo off
---------------------
Active code page: 850
CAFÉ
---------------------
Active code page: 65001
CAFÉ
---------------------
C:\Users\SECRET>
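Since the test depends on `utf8test.bat` really being saved as UTF-8 and not ANSI, it may help to verify the file's bytes first. A minimal Python sketch, assuming ANSI means Windows-1252; `is_valid_utf8` is a hypothetical helper and the byte strings stand in for the file content.

```python
def is_valid_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


word = "CAF\u00c9"  # "CAFÉ"

saved_as_utf8 = word.encode("utf-8")   # É becomes two bytes: C3 89
saved_as_ansi = word.encode("cp1252")  # É becomes one byte:  C9

print(saved_as_utf8, is_valid_utf8(saved_as_utf8))
print(saved_as_ansi, is_valid_utf8(saved_as_ansi))
```

For a real file, `open("utf8test.bat", "rb").read()` could be passed to the same function. An ANSI-saved file fails the check because a lone `C9` byte is not a complete UTF-8 sequence.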
I will not contact HG, because the question is trivial. As I said, look at the very first line of any EXML:
<?xml version="1.0" encoding="utf-8"?>
And this is not a joke. XML is a very strict format. When the declaration says `encoding="utf-8"`, it means that the file will, and must, be parsed and saved as `utf-8`.
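To illustrate how strictly a parser follows the declaration, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The two byte strings are made-up stand-ins for EXML content, and `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Declared UTF-8, and the attribute value really is UTF-8 (CE B1 = alpha).
good = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<Property name="English" value="At1\xce\xb1s" />'
)

# Declared UTF-8, but the value holds a lone C9 byte (an ANSI "É").
bad = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<Property name="English" value="CAF\xc9" />'
)


def parses(raw):
    """Return True if the document is accepted by the XML parser."""
    try:
        ET.fromstring(raw)
    except ET.ParseError:
        return False
    return True


value = ET.fromstring(good).get("value")
print(value, parses(good), parses(bad))
```

The parser decodes the bytes according to the declaration, so the first document yields the five-character string with alpha, while the second is rejected outright rather than silently reinterpreted.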
I hope you have figured this out already. Otherwise, I can only describe this issue as:
A NEW GALAXY DISCOVERED
UTF-8
:) Happy coding!
You too!
This is a proof of concept. In the file `bzrunM.bat`, notice the `chcp` commands below:
Now, instead of the hardcoded 65001 above, use an option variable, like `%_OutputCodePage%`. Make it changeable in `BUILDMOD{_AUTO}.bat` like any other option, for instance `-UseLuaScriptInPak ASK`. The value `%_CodePage%` above may be set however you wish: hardcoded to cp850 or cp437, or gathered from the system. In the end, the end user will see UTF-8 output on screen and the rest of the code will keep working as expected.
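The pattern proposed above (switch to a configurable output code page only while the external tool runs, then restore the safe one) can be sketched in a language-neutral way. The following Python sketch is not AMUMSS code: `temporary_code_page`, `FakeConsole`, and `run_with_output_code_page` are hypothetical names, `output_code_page` plays the role of `%_OutputCodePage%`, and the fake console stands in for the real one so the sketch runs anywhere. On Windows, the two callables could presumably wrap the Win32 functions `GetConsoleOutputCP` and `SetConsoleOutputCP`.

```python
import contextlib


@contextlib.contextmanager
def temporary_code_page(code_page, get_cp, set_cp):
    """Switch to code_page inside the block and always restore the old one."""
    previous = get_cp()
    set_cp(code_page)
    try:
        yield previous
    finally:
        # Runs even if the external tool fails, like the final "chcp 850".
        set_cp(previous)


class FakeConsole:
    """In-memory stand-in for the console code page (assumption for the demo)."""

    def __init__(self, code_page=850):
        self.code_page = code_page
        self.history = [code_page]

    def get(self):
        return self.code_page

    def set(self, code_page):
        self.code_page = code_page
        self.history.append(code_page)


def run_with_output_code_page(console, run_tool, output_code_page=65001):
    """Run run_tool() while the console uses output_code_page."""
    with temporary_code_page(output_code_page, console.get, console.set):
        return run_tool()


console = FakeConsole(850)
# The "tool" here just reports which code page it saw while running.
seen_by_tool = run_with_output_code_page(console, console.get)
print(seen_by_tool, console.get(), console.history)
```

Making the output code page a parameter with a default keeps the rest of the program on the safe charset while leaving the choice to the user, which is the point of the proposal.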