JanDeDobbeleer / oh-my-posh

The most customisable and low-latency cross platform/shell prompt renderer
https://ohmyposh.dev
MIT License

Path encoding problem with Chinese characters. #576

Closed sheey11 closed 3 years ago

sheey11 commented 3 years ago

Description

Path encoding problem on Set-PoshPrompt.

&: The term 'C:\Users\sheey\OneDrive\鏂囨。\PowerShell\Modules\oh-my-posh\3.120.1\bin\posh-windows-amd64.exe'
is not recognized as a name of a cmdlet, function, script file, or executable program.

The string 鏂囨。 should be 文档 ("Documents" in English).
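The garbled name is consistent with the UTF-8 bytes of 文档 being decoded as GBK (code page 936). A minimal sketch that reproduces it, assuming GBK is the system's legacy code page and that code page 936 is available to .NET in your PowerShell session (the variable names are illustrative only):

```powershell
# UTF-8 bytes of 文档 (U+6587 U+6863), written as hex so that the script's own
# file encoding cannot affect the demonstration.
[byte[]] $utf8Bytes = 0xE6, 0x96, 0x87, 0xE6, 0xA1, 0xA3

# Decoded with the encoding the bytes were written in: the real folder name.
$correct = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

# Decoded as GBK (code page 936), assumed here to be the legacy system code page.
$garbled = [System.Text.Encoding]::GetEncoding(936).GetString($utf8Bytes)

"UTF-8: $correct"   # 文档
"GBK:   $garbled"   # 鏂囨。
```

Six UTF-8 bytes become three two-byte GBK characters, which is why the two-character name turns into three characters in the error message.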

Steps to Reproduce

Set-PoshPrompt -Theme agnosterplus

Expected behavior: Set theme.

Actual behavior: Got error.


I edited oh-my-posh.psm1 to check whether pwsh itself was using the wrong encoding:

# near line 59
$poshCommand = Get-PoshCommand
Write-Output $poshCommand # I added this to print the path
Invoke-Expression (& $poshCommand --init --shell=pwsh --config="$config")

But it displayed the characters correctly.

JanDeDobbeleer commented 3 years ago

We already set the shell to UTF-8, so I'm not sure what we can do to fix that...

sheey11 commented 3 years ago

I solved it by setting the Windows system encoding to UTF-8. See Stack Overflow.

kthy commented 3 years ago

I had the same problem with Set-PoshPrompt on a Danish laptop with Ø in the posh path. Solved it by switching to the beta UTF-8 mode, as per the SO link from @Sheey11.

heya5 commented 3 years ago

I have the same problem, and @Sheey11's method also works for me.

tch1121 commented 2 years ago

When I list an archive with 7z l file, the Chinese file names are displayed garbled.

The output is normal again after I disable Set-PoshPrompt -Theme xxxxx.

JanDeDobbeleer commented 2 years ago

@tch1121 we can't solve that. This is a bigger Windows topic, as Windows has no history of defaulting to UTF-8. You can force the shell to another code page, but that will break other things and void your warranty 😃

gerardog commented 2 years ago

Hi Jan. I had some encoding issues with pwsh which led me to force UTF-8 on gerardog/gsudo. I found out that:

I configured a virtual machine with Chinese (GBK) as the default language, and now I have seen what the problem is.

I learned that changing the "current process" encoding inside one app (for example, Console.InputEncoding = System.Text.Encoding.UTF8) also changes the console host's code page, and that change stays even after the app has closed.

Also, the whole console window changes its font and language the moment the code page is changed to 65001 (UTF-8). The CMD language changes, and so does the encoding assumed for files without a BOM...
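One way to limit how long the changed code page stays in effect is to save the console encoding and restore it afterwards. The following is only a sketch: Invoke-WithUtf8Console is a hypothetical helper name, not part of oh-my-posh or gsudo, and it does not prevent the font switch that happens while the code page is 65001.

```powershell
# Hypothetical helper: run a script block with a UTF-8 console output encoding,
# then put back whatever encoding was active before.
function Invoke-WithUtf8Console {
    param(
        [Parameter(Mandatory)]
        [scriptblock] $ScriptBlock
    )

    $previous = [Console]::OutputEncoding
    try {
        # $false: do not emit a byte order mark
        [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false)
        & $ScriptBlock
    }
    finally {
        # Runs even if the script block throws, so the change does not leak.
        [Console]::OutputEncoding = $previous
    }
}

Invoke-WithUtf8Console { "Code page inside the block: $([Console]::OutputEncoding.CodePage)" }
"Code page afterwards: $([Console]::OutputEncoding.CodePage)"
```

The finally block is the important part: without it, an error inside the script block would leave the console on code page 65001.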

So, I decided that gsudo should not change the code page. If an encoding problem occurs, it should be addressed by each user... English users can safely change to UTF-8 without consequences, but others cannot.

Ultimately, I found out that it was not pwsh but oh-my-posh that was changing the encoding. I captured the output of oh-my-posh --print-init and put that literal into my $profile without the encoding change. With a proper font, oh-my-posh was fully functional. I understand there may be other issues if you don't change the encoding. But my take is that the user should be the one who changes the encoding, knowingly, and probably in full user scope, given that when one process changes the encoding, it also changes the code page for the current console, and the change persists even after you close PowerShell.

In that direction, what about adding a setting to disable any encoding change? Something the user could put in their profile before oh-my-posh, like...

 if ($ExecutionContext.SessionState.LanguageMode -ne "ConstrainedLanguage" -and -not $OhMyPoshSkipEncodingChange) {
     [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
 }

Thanks, and great work with oh-my-posh. I Love it and use it everyday.

JanDeDobbeleer commented 2 years ago

@gerardog the issue is that when this isn't used, it really ruins the font rendering on Windows. It's really hard to explain to users that they are in the driver's seat for this when in reality it's PowerShell that should default to UTF8 anyways looking at other shells out there. There has been an open issue requesting this for a long time already. In the end if we do it, or pwsh does it natively, it still brings the same exact challenge to every tool out there. Default to UTF-8 :-)

gerardog commented 2 years ago

> @gerardog the issue is that when this isn't used, it really ruins the font rendering on Windows.

I run Oh-My-Posh daily without setting UTF8 and it works flawlessly, if a proper font is installed.

image

Even without Windows Terminal (WT): image

> when in reality it's PowerShell that should default to UTF8 anyways

I think the reason why they haven't already done it (after so many months) is that it is not so simple: it breaks many cultures' workflows... For some reason it changes the code page, which for Asian languages/DBCS triggers a font change instantly (at least on ConHost). It also affects how apps interpret files without a BOM. If pwsh defaults to UTF-8, it will break those cultures. Imagine your CMD breaking completely the moment you call 'pwsh -c Echo 123' just once.

image

> In the end if we do it, or pwsh does it natively, it still brings the same exact challenge to every tool out there

I don't believe Pwsh team will ship it. IMO this is not a 'shell' problem, it's much broader: maybe addressable at the terminal console level, but I believe it should be a Windows Update thing...

Maybe an older version of Oh-My-Posh did require UTF-8, and nowadays it's not needed at all. I was looking through your issues to see what UTF-8 fixed, and I see a ratio of about 9 to 1: more problems generated by changing the encoding than fixes.

I will write down this opinion in the pwsh issue you linked and see how that goes. Nonetheless please consider a way to opt-out of forcing UTF8 encoding, or even better (IMHO) instead just let users opt-in if needed.

JanDeDobbeleer commented 2 years ago

@gerardog allow me to do some tests, it's been a while and conhost also went through some changes. If it works on my machines, I'll do as proposed because I agree, we shouldn't have to do this.

JanDeDobbeleer commented 2 years ago

@gerardog well, not very successful it seems...

image

gerardog commented 2 years ago

Oh! Sorry Jan, I forgot that I had left this line behind in my profile: [console]::OutputEncoding = New-Object System.Text.UTF8Encoding. Somehow that one does not change the code page (on my notebook, where the default code page is 437).

Nonetheless... I see the issue. Go outputs UTF-8. We can invoke OMP and expect UTF-8 without changing the console encoding/code page:

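function Start-Utf8Process
{
    param(
        [string] $FileName,
        [string] $Arguments
    )

    $Process = New-Object System.Diagnostics.Process
    $StartInfo = $Process.StartInfo
    # Decode the child's streams as UTF-8 without touching the console code page.
    $StartInfo.StandardErrorEncoding = $StartInfo.StandardInputEncoding = $StartInfo.StandardOutputEncoding = [System.Text.Encoding]::UTF8
    $StartInfo.RedirectStandardError = $StartInfo.RedirectStandardInput = $StartInfo.RedirectStandardOutput = $true
    # Redirection requires UseShellExecute to be off.
    $StartInfo.UseShellExecute = $false
    $StartInfo.FileName = $FileName
    $StartInfo.Arguments = $Arguments
    # Discard the Boolean returned by Start() instead of assigning to the automatic variable $_.
    $null = $Process.Start()
    # Read both streams before waiting, so a full pipe buffer cannot block the child.
    $stdErrTask = $Process.StandardError.ReadToEndAsync()
    $stdOut = $Process.StandardOutput.ReadToEnd()
    $Process.WaitForExit()
    return $stdOut + $stdErrTask.Result
}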

Then, in the $prompt function, replace

    $standardOut = @(&$omp --error="$errorCode" --pwd="$cleanPWD" --pswd="$cleanPSWD" --execution-time="$executionTime" --stack-count="$stackCount" --config="$config" --terminal-width=$terminalWidth 2>&1)

with:

    $standardOut = @(Start-Utf8Process $omp "--error=""$errorCode"" --pwd=""$cleanPWD"" --pswd=""$cleanPSWD"" --execution-time=""$executionTime"" --stack-count=""$stackCount"" --config=""$config"" --terminal-width=$terminalWidth")

(there are a few other lines that need to be replaced in a similar fashion)

The result, it works:

image

JanDeDobbeleer commented 2 years ago

@gerardog oh my. Thank you so much for this!

JanDeDobbeleer commented 2 years ago

I need to validate some minor things, but it seems to work. It wasn't that straightforward, as the transient prompt logic ruined the continuation prompt 🤦🏻‍♂️

gerardog commented 2 years ago

PSReadLine is at fault now... Can't believe it's changing the encoding back and forth.

I wonder if the following temporary encoding is a little better (as seen in this comment): [Console]::OutputEncoding = [System.Text.UnicodeEncoding]::new(![BitConverter]::IsLittleEndian, $false)

(that one avoids CodePage changes, as opposed to UTF8)
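To see what the two constructor arguments in that one-liner do, the encoding object can be inspected without assigning it to the console. A small sketch (the variable name is illustrative only):

```powershell
# First argument: bigEndian. Negating IsLittleEndian selects the machine's native byte order.
# Second argument: byteOrderMark. $false means no BOM is emitted.
$utf16 = [System.Text.UnicodeEncoding]::new(![BitConverter]::IsLittleEndian, $false)

"Code page: $($utf16.CodePage)"                # 1200 (little-endian) or 1201 (big-endian)
"BOM bytes: $($utf16.GetPreamble().Length)"    # 0, because byteOrderMark is $false
"Bytes for 'A': $(($utf16.GetBytes('A') | ForEach-Object { '{0:X2}' -f $_ }) -join ' ')"
```

Nothing here changes the console; it only shows that the resulting encoding is native-order UTF-16 without a BOM.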

image

Nonetheless, I will test it on my Chinese VM.

JanDeDobbeleer commented 2 years ago

@gerardog I found a fix and it's live so I wouldn't worry (unless you found an issue with that change).

gerardog commented 2 years ago

Thanks @JanDeDobbeleer!

gerardog commented 2 years ago

It passed a quick test on a fresh Chinese VM.

image