doggy8088 opened 6 months ago
Thank you for opening this issue. We will look into it.
I am able to repro with the latest PowerShell 7.4.1. My system locale is English (United States):
Printing to console is fine:
> az group show -n testrg
{
...
"tags": {
...
"key1": "测试"
},
...
}
But a warning is shown when redirecting:
> az group show -n testrg > out.txt
WARNING: Unable to encode the output with cp1252 encoding. Unsupported characters are discarded.
(Actually, I wrote that warning in https://github.com/microsoft/knack/pull/178.)
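The warning describes an encode-with-fallback strategy. The output is first encoded strictly in the target code page. If that fails, the characters the code page cannot represent are dropped. Below is a minimal Python sketch of that idea. `encode_for_stream` is a hypothetical helper for illustration, not knack's actual implementation.

```python
import sys


def encode_for_stream(text: str, encoding: str) -> tuple[bytes, bool]:
    """Encode text for a byte stream; return (data, discarded_anything).

    Hypothetical helper illustrating the fallback, not knack's code.
    """
    try:
        # Strict encoding raises UnicodeEncodeError for unmappable characters,
        # e.g. CJK characters under cp1252.
        return text.encode(encoding), False
    except UnicodeEncodeError:
        # Fallback: silently drop whatever the code page cannot represent.
        return text.encode(encoding, errors="ignore"), True


if __name__ == "__main__":
    data, lossy = encode_for_stream('"key1": "测试"', "cp1252")
    if lossy:
        print("WARNING: Unable to encode the output with cp1252 encoding. "
              "Unsupported characters are discarded.", file=sys.stderr)
    print(data)  # b'"key1": ""'
```

In this model the tag value comes out empty. That matches the "Chinese characters missing" symptom reported in the issue.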
According to https://docs.python.org/3/library/sys.html#sys.stdout:
"Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage)."
So changing the console's encoding with [Console]::OutputEncoding = [Text.UTF8Encoding]::new() won't affect Python's output encoding when the output is redirected.
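The sketch below shows which encoding Python picks for redirected output. It starts a child interpreter with stdout redirected to a pipe and asks the child what it chose. It uses only the standard library. The helper name `redirected_stdout_encodings` is made up for illustration. The `-E` flag makes the child ignore variables such as PYTHONIOENCODING, so they can't mask the locale default.

```python
import codecs
import subprocess
import sys

# Child script: report the encoding chosen for the redirected stdout and
# the locale encoding (the ANSI code page on Windows).
CHILD = (
    "import locale, sys; "
    "print(sys.stdout.encoding); "
    "print(locale.getpreferredencoding(False))"
)


def redirected_stdout_encodings() -> tuple[str, str]:
    """Run a child interpreter whose stdout is a pipe, not a console."""
    out = subprocess.run(
        [sys.executable, "-E", "-c", CHILD],
        stdout=subprocess.PIPE,
        check=True,
    ).stdout.decode("ascii")  # encoding names are plain ASCII
    stdout_enc, locale_enc = out.split()
    # Normalize aliases such as "UTF-8" vs "utf-8" before comparing.
    return codecs.lookup(stdout_enc).name, codecs.lookup(locale_enc).name
```

On an English (United States) Windows machine both values are expected to be cp1252, whatever [Console]::OutputEncoding is set to.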
I would recommend changing your system encoding to UTF-8 (see https://github.com/microsoft/knack/pull/178), so that you won't need to modify the az.cmd entry script every time you update Azure CLI.
Changing the system encoding to UTF-8 is not an option for most people who use a non-English locale.
Can you explain why? My personal desktop computer is using UTF-8 as I need to display Chinese (Simplified, China).
I can confirm that Windows PowerShell 5.1 can't handle UTF-8 correctly:
> $PSVersionTable
Name Value
---- -----
PSVersion 5.1.22621.2506
PSEdition Desktop
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...}
BuildVersion 10.0.22621.2506
CLRVersion 4.0.30319.42000
WSManStackVersion 3.0
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
> [Console]::OutputEncoding
IsSingleByte : True
BodyName : IBM437
EncodingName : OEM United States
HeaderName : IBM437
WebName : IBM437
WindowsCodePage : 1252
IsBrowserDisplay : False
IsBrowserSave : False
IsMailNewsDisplay : False
IsMailNewsSave : False
EncoderFallback : System.Text.InternalEncoderBestFitFallback
DecoderFallback : System.Text.InternalDecoderBestFitFallback
IsReadOnly : False
CodePage : 437
> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試
This can be fixed by setting [Console]::OutputEncoding = [Text.UTF8Encoding]::new():
> [Console]::OutputEncoding = [Text.UTF8Encoding]::new()
> [Console]::OutputEncoding
BodyName : utf-8
EncodingName : Unicode (UTF-8)
HeaderName : utf-8
WebName : utf-8
WindowsCodePage : 1200
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
IsSingleByte : False
EncoderFallback : System.Text.EncoderReplacementFallback
DecoderFallback : System.Text.DecoderReplacementFallback
IsReadOnly : False
CodePage : 65001
> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試
https://stackoverflow.com/a/78023334/2199657 mentions that PowerShell 7.4 no longer interprets redirected data:
PowerShell 7.4 changed the behavior of the redirection operators when used to redirect the stdout stream of a native command. The redirection operators now preserve the byte-stream data when redirecting output from a native command. PowerShell doesn't interpret the redirected data or add any additional formatting.
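Here is a simplified model of that difference. It assumes, as the quote implies, that older PowerShell decodes a native command's stdout with [Console]::OutputEncoding before writing the file. PowerShell 7.4, by contrast, copies the bytes unchanged. Both function names are hypothetical.

```python
def redirect_decoded(native_stdout: bytes, console_encoding: str) -> str:
    """Pre-7.4 model (hypothetical): decode native output with the console encoding."""
    return native_stdout.decode(console_encoding, errors="replace")


def redirect_raw(native_stdout: bytes) -> bytes:
    """7.4 model: the native byte stream is preserved as-is."""
    return native_stdout
```

Take the UTF-8 bytes of "测试測試", which is what python -X utf8 writes. redirect_decoded(data, "utf-8") round-trips them. redirect_decoded(data, "cp437") instead yields 12 unrelated single-byte characters, because IBM437 maps each UTF-8 byte to a character of its own. redirect_raw(data) returns the original bytes.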
Simply calling python -X utf8 will work:
> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試
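The same experiment can be driven from Python. The sketch below runs a child interpreter in UTF-8 mode with stdout redirected to a pipe, then decodes the captured bytes as UTF-8. `run_child_with_utf8_mode` is a hypothetical name. The `-E` flag is an addition not used in the thread; it keeps PYTHONIOENCODING from overriding UTF-8 mode.

```python
import subprocess
import sys

# The child prints 测试測試; \u escapes keep the command line itself pure ASCII.
CHILD = "print('\\u6d4b\\u8bd5\\u6e2c\\u8a66')"


def run_child_with_utf8_mode() -> str:
    """Run a child interpreter in UTF-8 mode with stdout redirected to a pipe."""
    completed = subprocess.run(
        # -X utf8 enables UTF-8 mode, so the redirected stdout is encoded
        # as UTF-8 instead of the ANSI code page.
        [sys.executable, "-E", "-X", "utf8", "-c", CHILD],
        stdout=subprocess.PIPE,
        check=True,
    )
    # Decode the raw bytes and drop the trailing newline ("\r\n" on Windows).
    return completed.stdout.decode("utf-8").rstrip("\r\n")
```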
The same approach can be used to call Azure CLI:
> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -IBm azure.cli group show -n testrg > out.txt ; Get-Content out.txt
{
...
"tags": {
...
"key1": "测试測試"
},
...
}
Wait. You are already using cp950, which is big5: "ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)" according to https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers. So I guess you are trying to output characters that are not in cp950. May I know the original Chinese character that is causing the problem?
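The characters that cause the problem can be found by checking each one against the code page. Below is a minimal sketch. `unencodable_chars` is a hypothetical helper, and it assumes Python's cp950 codec matches the Windows code page closely enough for this purpose.

```python
def unencodable_chars(text: str, encoding: str) -> list[str]:
    """Return the distinct characters of text that the code page cannot encode."""
    missing: list[str] = []
    for ch in text:
        try:
            ch.encode(encoding)  # strict: raises if ch has no mapping
        except UnicodeEncodeError:
            if ch not in missing:  # keep first-seen order, no duplicates
                missing.append(ch)
    return missing
```

For example, all four characters of "测试測試" are reported for cp1252, while the Traditional Chinese "測試" encodes cleanly in cp950.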
I'm okay with cp950 in both Windows PowerShell and PowerShell 7+.
It's because I installed Oh-My-Posh in PowerShell and use it in Windows Terminal, so I have to use UTF-8 in the console. That's why I need az.cmd to output UTF-8 by default.
I don't understand the relationship between Oh-My-Posh and encoding. Could you give more context on this? I don't think it is Oh-My-Posh that causes the encoding error. May I know the original Chinese character that is causing the problem?
It doesn't matter which Chinese characters they are. All Chinese characters are stripped from the output.
To clear up the confusion: Oh-My-Posh can use special Unicode fonts to display symbols in the prompt. So my console output encoding must be UTF-8. That's why I don't set cp950 on the console.
I don't think this has anything to do with Oh-My-Posh when redirection is involved. Without redirection, as in a plain az account list, the output is indeed UTF-8.
According to https://docs.python.org/3/library/sys.html#sys.stdout: "On Windows, UTF-8 is used for the console device."
> python -c "import sys; print(sys.stdout.encoding)"
utf-8
In your original screenshot, Azure CLI is trying to encode its output with cp950, but certain characters can't be encoded in cp950, so they are reported as "unsupported".
Besides Azure CLI, you can repro this issue with Python:
> python -c "print('测试')" > out.txt
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\jiasli\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>
> python -c "import sys; print(sys.stdout.encoding)" > out.txt ; Get-Content out.txt
cp1252
Here is my test:
Don't you think this issue is absurd in 2024? The Windows region settings require administrative rights, so not everyone can change them, and the dialog text says:
"... language (system locale) to use when displaying text in programs that do not support Unicode"
(emphasis mine)
Is Python or the az CLI a program that doesn't support Unicode? I don't think so. Is the az CLI supposed to work with non-Unicode tools? I don't suppose so; even the damn Notepad in Windows defaults to Unicode these days. Knack even specifically forces UTF-8 on log file output. Why not on stdout?
I don't think they really understand our pain points. For decades.
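A Python program can opt its own text stream into UTF-8, whatever the locale, with the standard library's io.TextIOWrapper.reconfigure (Python 3.7+). The sketch below only illustrates that call. It does not describe what knack or Azure CLI actually does, and `force_utf8` and `demo` are hypothetical names. The demo uses an in-memory stream so that the bytes actually written are visible.

```python
import io


def force_utf8(stream: io.TextIOBase) -> None:
    """Switch a text stream to UTF-8 if it is a TextIOWrapper."""
    if isinstance(stream, io.TextIOWrapper):
        # errors="strict" makes any failure loud instead of dropping characters.
        stream.reconfigure(encoding="utf-8", errors="strict")


def demo() -> bytes:
    """Write CJK text through a cp1252 stream after switching it to UTF-8."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding="cp1252")
    force_utf8(stream)
    stream.write("测试")
    stream.flush()
    return buf.getvalue()
```

In a CLI entry point, the call would be force_utf8(sys.stdout), made before any output is written.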
Describe the bug
A bug was reported to me on Stack Overflow: https://stackoverflow.com/q/78008939/910074
When I use UTF-8 as my default console output encoding ([Console]::OutputEncoding), Azure CLI can't handle Chinese characters because of an encoding issue. Either the Chinese characters go missing or the output is garbled (mojibake).
Related command
$(az account list -o json)
az account list -o json | jq '.'
Errors
Issue script & Debug output
It's an encoding issue.
Expected behavior
I expected Azure CLI to handle Chinese characters correctly.
Environment Summary
azure-cli 2.57.0
core 2.57.0
telemetry 1.1.0
Extensions:
account 0.2.3
azure-devops 0.25.0
front-door 1.0.16
interactive 0.4.5
k8s-extension 1.2.4
managementpartner 0.1.3
Dependencies:
msal 1.26.0
azure-mgmt-resource 23.1.0b2
Python location 'C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe' Extensions directory 'C:\Users\wakau\.azure\cliextensions'
Python (Windows) 3.11.7 (tags/v3.11.7:fa7a6f2, Dec 4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)]
Legal docs and information: aka.ms/AzureCliLegal
Your CLI is up-to-date.
Additional context
I have a workaround for now: edit the C:\Program Files\Microsoft SDKs\Azure\CLI2\wbin\az.cmd file and add -X utf8 to the python arguments.