PowerShell / PSReadLine

A bash inspired readline implementation for PowerShell
BSD 2-Clause "Simplified" License

InvokePrompt breaks with a prompt that prints unicode characters #2866

Open segevfiner opened 2 years ago

segevfiner commented 2 years ago

Environment

PS version: 7.1.4
PSReadline version: 2.1.0
os: 10.0.19041.1 (WinBuild.160101.0800)
PS file version: 7.1.4.0
HostName: ConsoleHost (Windows Terminal)
BufferWidth: 120
BufferHeight: 30

Exception report

N/A

Steps to reproduce

  1. Install a prompt function that writes directly to the console and uses Unicode characters. (e.g. https://starship.rs/)
  2. Run [Microsoft.PowerShell.PSConsoleReadLine]::InvokePrompt()
  3. The prompt will be output garbled with question marks in place of Unicode characters.

This function is used by other projects, such as PSFzf to re-render the prompt in cases where this is necessary.

Expected behavior

The prompt is rendered correctly: [image]

Actual behavior

The prompt is rendered incorrectly: [image]

Analysis

It appears that InvokePrompt uses GetPrompt to buffer the prompt into a string and then writes that string to the console itself. Because the default [Console]::OutputEncoding is not UTF-8, the Unicode characters get lost along the way. Under normal circumstances this doesn't happen, because the prompt function writes to the console directly and handles the encoding itself.
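To illustrate where the question marks come from, here is a minimal sketch of .NET's replacement behavior when a character has no representation in the target encoding. ASCII is used as a stand-in for a non-UTF-8 console code page (the code page on an affected system will differ), and ConvertTo-RoundTrippedText is a hypothetical helper name, not part of PSReadLine.

```powershell
# U+276F, built from its code point so the script does not depend on
# the encoding of the script file itself.
$glyph = [string][char]0x276F

# Hypothetical helper: encode text to bytes and decode it again,
# roughly what happens to a buffered string written through $Encoding.
function ConvertTo-RoundTrippedText {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [System.Text.Encoding] $Encoding
    )
    $Encoding.GetString($Encoding.GetBytes($Text))
}

# ASCII is an assumption for illustration; it cannot represent U+276F,
# so the character is replaced with '?'.
$viaAscii = ConvertTo-RoundTrippedText -Text "PS $glyph " -Encoding ([System.Text.Encoding]::ASCII)

# UTF-8 can represent it, so the text survives unchanged.
$viaUtf8 = ConvertTo-RoundTrippedText -Text "PS $glyph " -Encoding ([System.Text.UTF8Encoding]::new())
```

With these inputs, $viaAscii is "PS ? " while $viaUtf8 still contains the original glyph.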

A workaround is to set [Console]::OutputEncoding to [Text.Encoding]::UTF8, which does make this work. However, I'm unsure what side effects this might have on other things in PowerShell that write to the console. Or maybe UTF-8 should have been the default but isn't for some reason?

If this shouldn't be changed, then maybe PSReadLine should set it temporarily while printing the prompt to the console? Alternatively, InvokePrompt could be re-implemented in a way that doesn't require buffering the prompt string.

References

https://github.com/kelleyma49/PSFzf/issues/71

segevfiner commented 2 years ago

So digging in a bit, setting [Console]::OutputEncoding = [UnicodeEncoding]::new([BitConverter]::IsLittleEndian, $false) doesn't actually call SetConsoleOutputCP but just sets the console object to write to the console using WriteConsoleW:

dotnet/runtime/src/libraries/System.Console/src/System/ConsolePal.Windows.cs:121-128

So this might actually be safe when supported by the .NET runtime, at least for native programs. But who knows about other code in .NET that might use it.

Alternatively, we can just call WriteConsoleW directly (probably changing the name and namespace for the Add-Type calls below, and/or combining them into a single class):

$sig = @'
[DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
public static extern bool WriteConsole(IntPtr hConsoleOutput, string lpBuffer,
   uint nNumberOfCharsToWrite, out uint lpNumberOfCharsWritten,
   IntPtr lpReserved);
'@

$WriteConsole = Add-Type -MemberDefinition $sig -Name "Win32WriteConsole" -Namespace Win32Functions -PassThru

$sig = @'
[DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
public static extern IntPtr GetStdHandle(int nStdHandle);
'@

$GetStdHandle = Add-Type -MemberDefinition $sig -Name "Win32GetStdHandle" -Namespace Win32Functions -PassThru

$a = prompt
$WriteConsole::WriteConsole($GetStdHandle::GetStdHandle(-11), $a, $a.Length, [ref]$null, [IntPtr]::Zero)  # Not sure [ref]$null is the right way to discard out parameters in PowerShell

Basically, WriteConsoleW lets you write UTF-16 to the console regardless of the console output code page, which only affects the ANSI functions and the standard streams. (This is what Python's WindowsConsoleIO hack does too.)

segevfiner commented 2 years ago

PowerShell itself seems to do something similar: https://github.com/PowerShell/PowerShell/blob/2f57bf848b03828ee6c343b55f7ce80df2e5a23e/src/Microsoft.PowerShell.ConsoleHost/host/msh/ConsoleHost.cs#L2502 which ends up using https://github.com/PowerShell/PowerShell/blob/2f57bf848b03828ee6c343b55f7ce80df2e5a23e/src/Microsoft.PowerShell.ConsoleHost/host/msh/ConsoleTextWriter.cs#L12 as far as I can tell.

segevfiner commented 2 years ago

Well, doing $prompt | Out-Host also works (since it goes through the PowerShell console output machinery).

segevfiner commented 2 years ago

OK. So it looks like PSReadLine is already setting Console.OutputEncoding, but it resets it before calling external commands. So what we need to do is simply set it again in InvokePrompt and restore the previous value afterwards.

See https://github.com/kelleyma49/PSFzf/issues/71#issuecomment-961148891
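The set-and-restore idea can be sketched from the caller's side as follows. This is only an illustration under assumptions: Invoke-WithUtf8Console is a hypothetical helper name, and it is not the actual code in PSReadLine or PSFzf.

```powershell
# Hypothetical helper: run a script block with the console output
# encoding temporarily switched to UTF-8, then restore the old value.
function Invoke-WithUtf8Console {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action
    )

    $previous = [Console]::OutputEncoding
    try {
        # The parameterless UTF8Encoding constructor does not emit a BOM.
        [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
        & $Action
    }
    finally {
        # Restore the previous encoding even if $Action throws.
        [Console]::OutputEncoding = $previous
    }
}
```

In an interactive session with PSReadLine loaded, a caller could then write Invoke-WithUtf8Console { [Microsoft.PowerShell.PSConsoleReadLine]::InvokePrompt() }. The try/finally keeps the change scoped, so native commands that run afterwards still see the encoding they had before.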

daxian-dbw commented 2 years ago

@segevfiner, thanks for your follow-up investigation on the issue! If I understand it correctly, it's because the prompt function calls a native command, which returns a Unicode string, but [Console]::OutputEncoding is not UTF8, and that causes the returned string to become garbled. Is that understanding correct?

Can you please share your prompt function? A simple prompt function that can reproduce the problem would be very helpful, thanks!

segevfiner commented 2 years ago

> @segevfiner, thanks for your follow-up investigation on the issue! If I understand it correctly, it's because the prompt function calls a native command, which returns a Unicode string, but [Console]::OutputEncoding is not UTF8, and that causes the returned string to become garbled. Is that understanding correct?
>
> Can you please share your prompt function? A simple prompt function that can reproduce the problem would be very helpful, thanks!

Yes. It's because PSReadLine uses the Console object for output and changes [Console]::OutputEncoding while it runs, but it doesn't do so when one of its functions is called from the outside. PowerShell itself has its own console output machinery that bypasses this. As in, it doesn't seem to be using the Console object.

I'm using starship (https://github.com/starship/starship) and had it triggered by PSFzf, but for a simple one, just take the default prompt and stick a non-ASCII Unicode character in it:

function prompt { "PS $($executionContext.SessionState.Path.CurrentLocation)$('❯' * ($nestedPromptLevel + 1)) " }

Also note https://github.com/kelleyma49/PSFzf/issues/71#issuecomment-961148891 where I posted a workaround that PSFzf is going to incorporate for this.

Jaykul commented 2 years ago

Setting the outputEncoding to UTF8 (with no BOM) seems to resolve this:

[Console]::OutputEncoding = $OutputEncoding = [Console]::InputEncoding = [System.Text.UTF8Encoding]::new()

To me, it feels crazy how it sometimes works right and sometimes I get ? for every extended character:

[image: WindowsTerminal_2022-04-30_23-50-47]

As people have said above, if you don't have the console encoding set to UTF8, then when PSReadLine attempts to copy your prompt and change the color to show a parse error (e.g. in RenderErrorPrompt) it doesn't get a proper copy.

You can work around this yourself:

  1. Call Set-PSReadLineOption and explicitly set -PromptText to an array of two strings (normal, and error) so that PSReadLine can stop guessing what text to use.
  2. As I said above, change your console encoding to UTF8. We all should have done this decades ago.
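The first workaround could look roughly like the profile snippet below. It is a sketch under assumptions: the normal string matches the tail of the simple repro prompt earlier in this thread, the "! " error string is an arbitrary choice for illustration, and the guard only calls Set-PSReadLineOption when PSReadLine is already loaded in the session.

```powershell
# U+276F, built from its code point so the profile file's own encoding
# does not matter.
$glyph = [string][char]0x276F

# Two strings: the prompt tail to show normally, and the one to show
# when there is a parse error. "! " is an illustrative choice only.
$promptText = @("$glyph ", '! ')

# Only configure PSReadLine when it is loaded in this session.
if (Get-Module -Name PSReadLine) {
    Set-PSReadLineOption -PromptText $promptText
}
```

With PromptText set explicitly, PSReadLine has the two strings up front and no longer has to guess which part of the rendered prompt to recolor.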

A better workaround is probably for PowerShell or PSReadLine to set the console encoding to UTF-8, either all the time, or explicitly when trying to read from it.

But there's an easy fix:

PSReadLine could just stop trying to read the prompt from the screen. If the user configures PromptText, then use that. Otherwise, do nothing to the prompt, since you can't be sure you're not going to break it, and this feature isn't worth the risk.

dcnieho commented 1 year ago

Here is another instance of this bug messing with a prompt:

https://user-images.githubusercontent.com/1787673/210333851-f1055e78-1ec1-4636-a925-3cbd3c0a9f97.mp4

(from https://github.com/JanDeDobbeleer/oh-my-posh/issues/3298)

StevenBucher98 commented 1 year ago

This function was not tested for wide public use and is used specifically for some PSReadLine functionality, so there are likely many bugs with it. Marking as a bug.