Azure / azure-cli

Azure Command-Line Interface
MIT License
Consider using UTF-8 by default for Azure CLI #28497

Open doggy8088 opened 6 months ago

doggy8088 commented 6 months ago

Describe the bug

I reported a bug on Stack Overflow: https://stackoverflow.com/q/78008939/910074

When I use UTF-8 as my default console output encoding ([Console]::OutputEncoding), Azure CLI is unable to handle Chinese characters because of an encoding issue. The Chinese characters are either missing or come out as garbled text.

Related command

$(az account list -o json)

az account list -o json | jq '.'

Errors

image

Issue script & Debug output

It's an encoding issue.

Expected behavior

I expect Azure CLI to handle Chinese characters correctly.

Environment Summary

azure-cli 2.57.0

core 2.57.0
telemetry 1.1.0

Extensions:
account 0.2.3
azure-devops 0.25.0
front-door 1.0.16
interactive 0.4.5
k8s-extension 1.2.4
managementpartner 0.1.3

Dependencies:
msal 1.26.0
azure-mgmt-resource 23.1.0b2

Python location 'C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe'
Extensions directory 'C:\Users\wakau\.azure\cliextensions'

Python (Windows) 3.11.7 (tags/v3.11.7:fa7a6f2, Dec 4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)]

Legal docs and information: aka.ms/AzureCliLegal

Your CLI is up-to-date.

Additional context

I have a workaround for now: edit the C:\Program Files\Microsoft SDKs\Azure\CLI2\wbin\az.cmd file and add -X utf8 to the Python arguments.

::
:: Microsoft Azure CLI - Windows Installer - Author file components script
:: Copyright (C) Microsoft Corporation. All Rights Reserved.
::

@IF EXIST "%~dp0\..\python.exe" (
  SET AZ_INSTALLER=MSI
  "%~dp0\..\python.exe" -X utf8 -IBm azure.cli %*
) ELSE (
  echo Failed to load python executable.
  exit /b 1
)
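
A note on why the flag goes into az.cmd rather than into an environment variable: the script starts Python with -I (isolated mode), which is documented to ignore PYTHON* environment variables, so PYTHONUTF8=1 is not expected to reach Azure CLI this way. The sketch below is only an illustration using the current interpreter, not the CLI's bundled one; the helper name `utf8_mode` is made up for this example.

```python
import os
import subprocess
import sys


def utf8_mode(flags, extra_env=None):
    """Start a child interpreter and return its sys.flags.utf8_mode (0 or 1)."""
    env = dict(os.environ)
    env.update(extra_env or {})
    cmd = [sys.executable, *flags, "-c", "import sys; print(sys.flags.utf8_mode)"]
    out = subprocess.run(cmd, env=env, capture_output=True, check=True).stdout
    return int(out.decode("ascii").strip())


# Environment variable, no isolation: UTF-8 mode is switched on.
env_only = utf8_mode([], {"PYTHONUTF8": "1"})
# Environment variable plus -I (as in az.cmd): the variable is ignored.
isolated_env = utf8_mode(["-I"], {"PYTHONUTF8": "1"})
# -X utf8 on the command line still works together with -I.
isolated_flag = utf8_mode(["-I", "-X", "utf8"])

print("PYTHONUTF8=1          ->", env_only)
print("PYTHONUTF8=1 with -I  ->", isolated_env)
print("-I -X utf8            ->", isolated_flag)
```

The middle value is expected to stay 0 unless the interpreter enables UTF-8 mode for another reason (for example a C/POSIX locale on Linux).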

yonzhan commented 6 months ago

Thank you for opening this issue, we will look into it.

jiasli commented 6 months ago

I am able to repro with the latest PowerShell 7.4.1. My system locale is English (United States):

image

Printing to console is fine:

> az group show -n testrg
{
  ...
  "tags": {
    ...
    "key1": "测试"
  },
  ...
}

But a warning is shown when redirecting:

> az group show -n testrg > out.txt
WARNING: Unable to encode the output with cp1252 encoding. Unsupported characters are discarded.

(Actually, I wrote that warning in https://github.com/microsoft/knack/pull/178.)
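
For readers who want to see what "discarded" means in practice, here is a minimal sketch of a try-strict-then-discard pattern. It is not knack's actual code; the function name `encode_or_discard` is hypothetical and only the warning text is taken from the output above.

```python
import sys


def encode_or_discard(text: str, encoding: str) -> bytes:
    """Encode text; on failure, warn on stderr and drop unsupported characters."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        print(
            f"WARNING: Unable to encode the output with {encoding} encoding. "
            "Unsupported characters are discarded.",
            file=sys.stderr,
        )
        # errors="ignore" silently removes every character the codec lacks.
        return text.encode(encoding, errors="ignore")


document = '{"key1": "测试"}'
lossy = encode_or_discard(document, "cp1252")
lossless = encode_or_discard(document, "utf-8")

print(lossy)
print(lossless.decode("utf-8"))
```

With cp1252 the JSON stays syntactically valid but the tag value comes out empty, which matches the "Chinese characters missing" symptom in the report.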

According to https://docs.python.org/3/library/sys.html#sys.stdout

sys.stdout Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage).

So changing the console's encoding with [Console]::OutputEncoding = [Text.UTF8Encoding]::new() won't affect the encoding Python uses when its output is redirected to a file or a pipe.
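
A quick way to check this on any machine is to start a child interpreter with its stdout connected to a pipe and ask it which encoding it chose. This is a rough sketch, assuming the current interpreter stands in for the one bundled with Azure CLI; `child_stdout_encoding` is a made-up helper name.

```python
import subprocess
import sys


def child_stdout_encoding(*interpreter_flags: str) -> str:
    """Run a child Python with stdout piped and return the encoding it picked."""
    cmd = [
        sys.executable,
        *interpreter_flags,
        "-c",
        "import sys; print(sys.stdout.encoding)",
    ]
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout.decode("ascii").strip()


default_encoding = child_stdout_encoding()
utf8_mode_encoding = child_stdout_encoding("-X", "utf8")

print("piped stdout, default :", default_encoding)
print("piped stdout, -X utf8 :", utf8_mode_encoding)
```

On a Windows machine with an English (United States) system locale the first line is expected to show cp1252, as in the transcript later in this thread; the second line should show utf-8 regardless of the locale.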

I would recommend changing your system encoding to UTF-8 (follow https://github.com/microsoft/knack/pull/178), so that you won't need to modify the az.cmd entry script every time you update Azure CLI.

Also see: https://github.com/python/cpython/issues/74595

doggy8088 commented 6 months ago

Changing the system encoding to UTF-8 is not an option for most people who use a non-English locale.

jiasli commented 6 months ago

Changing the system encoding to UTF-8 is not an option for most people who use a non-English locale.

Can you explain why? My personal desktop computer is using UTF-8 as I need to display Chinese (Simplified, China).

image

jiasli commented 6 months ago

I can verify Windows PowerShell 5.1 can't handle UTF-8 correctly:

> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.22621.2506
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.22621.2506
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

> [Console]::OutputEncoding

IsSingleByte      : True
BodyName          : IBM437
EncodingName      : OEM United States
HeaderName        : IBM437
WebName           : IBM437
WindowsCodePage   : 1252
IsBrowserDisplay  : False
IsBrowserSave     : False
IsMailNewsDisplay : False
IsMailNewsSave    : False
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : False
CodePage          : 437

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試

This can be fixed by setting [Console]::OutputEncoding = [Text.UTF8Encoding]::new():

> [Console]::OutputEncoding = [Text.UTF8Encoding]::new()

> [Console]::OutputEncoding

BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : False
CodePage          : 65001

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試
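
My understanding of why [Console]::OutputEncoding matters here: Windows PowerShell 5.1 decodes the bytes written by the native program with that encoding before writing the file. The sketch below only imitates that decoding step in Python; it does not run PowerShell.

```python
# Bytes that `python -X utf8` writes to its redirected stdout.
utf8_bytes = "测试測試".encode("utf-8")

# Decoding with the console default from the first run (IBM437) misreads them.
misread = utf8_bytes.decode("cp437")

# Decoding with UTF-8, as after setting [Console]::OutputEncoding, round-trips.
correct = utf8_bytes.decode("utf-8")

print(len(utf8_bytes), "bytes")
print("decoded as cp437:", misread)
print("decoded as utf-8:", correct)
```

Each of the four characters takes three bytes in UTF-8, so the cp437 reading yields twelve unrelated characters instead of four.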

https://stackoverflow.com/a/78023334/2199657 mentions PowerShell 7.4 doesn't interpret the redirected data anymore.

https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_redirection?view=powershell-7.4#redirecting-output-from-native-commands

PowerShell 7.4 changed the behavior of the redirection operators when used to redirect the stdout stream of a native command. The redirection operators now preserve the byte-stream data when redirecting output from a native command. PowerShell doesn't interpret the redirected data or add any additional formatting.

Simply calling python -X utf8 will work:

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試

The same approach can be used to call Azure CLI:

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -IBm azure.cli group show -n testrg > out.txt ; Get-Content out.txt
{
  ...
  "tags": {
    ...
    "key1": "测试測試"
  },
  ...
}

jiasli commented 6 months ago

Wait. You are already using cp950, which is Big5 ("ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)" according to https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers), so I guess you are trying to handle characters that are not in cp950. May I know the original Chinese character that is causing the problem?

doggy8088 commented 6 months ago

I'm fine with cp950 in both Windows PowerShell and PowerShell 7+.

It's because I installed Oh-My-Posh in PowerShell and use it in Windows Terminal, so I have to use UTF-8 in the console. That's why I need az.cmd to output UTF-8 by default.

jiasli commented 6 months ago

It's because I installed Oh-My-Posh in PowerShell and use it in Windows Terminal, so I have to use UTF-8 in the console.

I fail to understand the relationship between Oh-My-Posh and encoding. Could you give more context on this? I don't think it is Oh-My-Posh that causes the encoding error. May I know the original Chinese character that is causing the problem?

doggy8088 commented 6 months ago

It doesn't matter what the original Chinese characters are. All Chinese characters are dropped from the output.

To clear up the confusion: Oh-My-Posh can use a special Unicode font to display symbols in the prompt, like this:

image

So my console output encoding must be UTF-8. That's why I don't set cp950 on the console.

jiasli commented 6 months ago

I don't think this has anything to do with Oh-My-Posh when redirection is involved. Without redirection, as in a plain az account list, the output is indeed in UTF-8.

https://docs.python.org/3/library/sys.html#sys.stdout

On Windows, UTF-8 is used for the console device.

> python -c "import sys; print(sys.stdout.encoding)"
utf-8

In your original screenshot, Azure CLI is trying to encode its output with cp950, but certain characters can't be encoded by cp950, so they are reported as "unsupported":

image

Besides Azure CLI, you can repro this issue with Python:

> python -c "print('测试')" > out.txt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\jiasli\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>

> python -c "import sys; print(sys.stdout.encoding)" > out.txt ; Get-Content out.txt
cp1252
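
To check which characters a given code page can represent without touching the system locale, a small helper like the one below may be handy. It is only a sketch: `can_encode` is a hypothetical name, and the sample strings are the Simplified and Traditional spellings of "test" used earlier in this thread.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


samples = {"simplified": "测试", "traditional": "測試"}
encodings = ("cp1252", "cp950", "utf-8")

results = {
    (name, encoding): can_encode(text, encoding)
    for name, text in samples.items()
    for encoding in encodings
}

for (name, encoding), ok in results.items():
    print(f"{name:12} {encoding:7} {'ok' if ok else 'cannot encode'}")
```

cp1252 can hold neither spelling, cp950 (Big5) holds only the Traditional one, and UTF-8 holds both, which is consistent with redirected output losing characters under a cp950 system locale.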

doggy8088 commented 6 months ago

Here is my test:

image

amis92 commented 2 weeks ago

Don't you think this issue is absurd in 2024? The Windows region settings require administrative rights, so not everyone can change them, and the text says

... language (system locale) to use when displaying text in programs that do not support Unicode

emphasis mine

Is Python or Azure CLI a program that doesn't support Unicode? I don't think so. Is Azure CLI supposed to work with non-Unicode tools? I don't suppose so; even the damn Notepad in Windows defaults to Unicode these days. Knack even specifically forces UTF-8 on logfile output. Why not on stdout?
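
For what it's worth, forcing UTF-8 on an already-open text stream takes one call in Python 3.7+. The sketch below is not how Azure CLI or Knack is implemented; it uses an in-memory stream as a stand-in for a redirected stdout, and `force_utf8` is a made-up helper name.

```python
import io


def force_utf8(stream: io.TextIOWrapper) -> None:
    """Switch an existing text stream to UTF-8 without replacing the object."""
    stream.reconfigure(encoding="utf-8", errors="strict")


# Stand-in for a redirected stdout that defaulted to the ANSI code page.
raw = io.BytesIO()
redirected = io.TextIOWrapper(raw, encoding="cp1252")

force_utf8(redirected)
redirected.write("测试測試\n")
redirected.flush()

written = raw.getvalue()
print(written)
```

In a real program the same call would be made on sys.stdout itself. The trade-off is that whatever reads the redirected output must then also expect UTF-8.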

doggy8088 commented 2 weeks ago

I don't think they really understand our pain points. For decades.