PowerShell / vscode-powershell

Provides PowerShell language and debugging support for Visual Studio Code
https://marketplace.visualstudio.com/items/ms-vscode.PowerShell
MIT License

Built-in `help` function writes Unicode BOM to host #1104

Open brantb opened 7 years ago

brantb commented 7 years ago

System Details

Issue Description

When I use the help function, a few garbage Unicode characters are written to the output stream. Using Get-Help instead of help works as expected. This only happens in the extension-provided PowerShell Integrated Console, not in the default VS Code console.

C:\> help get-date

NAME
    Get-Date
[...snip...]
C:\> help get-date -showwindow

C:\> help get-date > $null
(no output)
C:\> get-help get-date -showwindow
(no output)

On my system, the definition of the help function is:

C:\> Get-Command help | select -ExpandProperty Definition
<#
.FORWARDHELPTARGETNAME Get-Help
.FORWARDHELPCATEGORY Cmdlet 
#>
[CmdletBinding(DefaultParameterSetName='AllUsersView', HelpUri='https://go.microsoft.com/fwlink/?LinkID=113316')]
param(
    [Parameter(Position=0, ValueFromPipelineByPropertyName=$true)]
    [string]
    ${Name},

    [string]
    ${Path},

    [ValidateSet('Alias','Cmdlet','Provider','General','FAQ','Glossary','HelpFile','ScriptCommand','Function','Filter','ExternalScript','All','DefaultHelp','Workflow','DscResource','Class','Configuration')]
    [string[]]
    ${Category},

    [string[]]
    ${Component},

    [string[]]
    ${Functionality},

    [string[]]
    ${Role},

    [Parameter(ParameterSetName='DetailedView', Mandatory=$true)]
    [switch]
    ${Detailed},

    [Parameter(ParameterSetName='AllUsersView')]
    [switch]
    ${Full},

    [Parameter(ParameterSetName='Examples', Mandatory=$true)]
    [switch]
    ${Examples},

    [Parameter(ParameterSetName='Parameters', Mandatory=$true)]
    [string]
    ${Parameter},

    [Parameter(ParameterSetName='Online', Mandatory=$true)]
    [switch]
    ${Online},

    [Parameter(ParameterSetName='ShowWindow', Mandatory=$true)]
    [switch]
    ${ShowWindow})

    #Set the outputencoding to Console::OutputEncoding. More.com doesn't work well with Unicode.
    $outputEncoding=[System.Console]::OutputEncoding

    Get-Help @PSBoundParameters | more

... and more is defined as ...

C:\> Get-Command more | select -ExpandProperty Definition

param([string[]]$paths)
$OutputEncoding = [System.Console]::OutputEncoding
if($paths) {
    foreach ($file in $paths)
    {
        Get-Content $file | more.com
    }
} else { $input | more.com }

Attached Logs

1510862054-d0c22b7a-9fe8-4400-9e12-9cb2d4fd6b5a1510862041231.zip

brantb commented 7 years ago

Google suggests it's a byte order mark.

SteveL-MSFT commented 7 years ago

help is only showing what is in the help text. This issue should be opened here: https://github.com/powershell/powershell-docs to use utf8-noBOM

brantb commented 7 years ago

I'm just clarifying, but is this truly an issue with the help text if this issue only surfaces in vscode-powershell's Visual Studio Code Host and not in any other host like ConsoleHost?

rkeithhill commented 7 years ago

One of the changes that was made to PSIC (presumably to fix another issue) was that the output encoding was changed to UTF8 from the default. That might explain why it behaves differently than the regular PowerShell terminal.

SteveL-MSFT commented 7 years ago

@brantb you'll see the same behavior on Linux/macOS with PowerShell Core 6 on the console. Windows understands the BOM and doesn't show it.

Halkcyon commented 6 years ago

Confirming I still see this only on the PowerShell Integrated Console.

SydneyhSmith commented 5 years ago

Closing as resolved, as we have now documented how to configure encoding for PowerShell in VS Code: https://docs.microsoft.com/en-us/powershell/scripting/components/vscode/understanding-file-encoding?view=powershell-6

dsolodow commented 5 years ago

I looked at that doc, and I may be missing something but it doesn't seem to prevent the integrated console from displaying the BOM character?

SydneyhSmith commented 5 years ago

@dsolodow I reviewed the issue and you are correct, I am re-opening this!

We noticed that this issue only occurs with Help and not with Get-Help. The difference is that Help initially displays a smaller result with --More--, which comes from more.com, so this may be what is causing the encoding issue.

rjmholt commented 5 years ago

For reference, the UTF-8 BOM is 0xEF 0xBB 0xBF. When interpreted with code page 437 (AKA DOS Latin US), it resolves as the box-drawing characters ∩╗┐.
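The byte-to-character mapping can be checked by decoding the three BOM bytes with legacy code pages. This is a minimal sketch, not something posted in the thread. It assumes code pages 437 and 850 are available through `[System.Text.Encoding]::GetEncoding` (the case in Windows PowerShell; PowerShell 6+ is assumed to register the code-page provider at startup). The variable names are illustrative.

```powershell
# The three bytes of the UTF-8 byte order mark.
[byte[]]$bom = 0xEF, 0xBB, 0xBF

# Decode the same bytes as if they were text in an OEM code page.
$as437 = [System.Text.Encoding]::GetEncoding(437).GetString($bom)
$as850 = [System.Text.Encoding]::GetEncoding(850).GetString($bom)

# Decoded as UTF-8, the bytes form the single character U+FEFF.
$asUtf8 = [System.Text.Encoding]::UTF8.GetString($bom)

"CP437: $as437"
"CP850: $as850"
"UTF-8 code point: U+{0:X4}" -f [int]$asUtf8[0]
```

Code page 437 yields ∩╗┐ and code page 850 yields ´╗┐, which are the two sequences reported elsewhere in this thread.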

My current suspicion is that the integrated console is resolving more.com for help, which can't understand UTF-8.

SteveL-MSFT commented 5 years ago

@TylerLeonhardt, @rjmholt , and I looked at this. It appears to be a combination of the extension setting [console]::OutputEncoding to UTF8 (w/ BOM) and use of 437 code page. This results in a BOM being written and a codepage that renders it. I believe @TylerLeonhardt is working on a proposed fix.

TylerLeonhardt commented 5 years ago

... possibly. I need to speak to the vscode folks to see if they have any ideas.

My thinking is that [System.Console]::OutputEncoding is somehow related to the chcp output...

In pwsh.exe, [System.Console]::OutputEncoding is set to Code Page 437 (on Windows). In the extension, we overwrite this:

[console]::OutputEncoding = [Encoding]::UTF8

Which is why we're seeing the BOM in the PowerShell Integrated Console...

However, if we don't do that, the PowerShell Integrated Console can no longer render non-ASCII characters such as Chinese characters.

That's why we originally overwrote the [Console]::OutputEncoding... but that was probably not the right approach. There should be a way to not see the BOM but also see non-ASCII characters... just like what the non-Integrated Console shows.
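One way to see where the BOM would come from: the shared `[System.Text.Encoding]::UTF8` instance reports a three-byte preamble, and writers that honor the preamble emit it at the start of a stream. The snippet below is only an inspection sketch. It uses the full type name because the `[Encoding]` shorthand above presumably depends on a `using namespace System.Text` that is not shown.

```powershell
# The shared UTF-8 encoding instance exposed by .NET.
$utf8 = [System.Text.Encoding]::UTF8

# GetPreamble() returns the bytes a writer may emit before any text.
$preamble = $utf8.GetPreamble()

# Format each byte as hex for display.
$preambleHex = ($preamble | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' '
"Preamble: $preambleHex"
```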

TylerLeonhardt commented 5 years ago

I'll quote @rjmholt on this... "Encoding is a tar pit" 😅

SteveL-MSFT commented 5 years ago

But you can set OutputEncoding to UTF8 NoBOM.
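A sketch of what that suggestion might look like, assuming PowerShell 5 or later for the `::new()` syntax. The variable name `$utf8NoBom` is illustrative, and the thread does not confirm that this removes the stray characters in the Integrated Console.

```powershell
# The UTF8Encoding constructor takes a flag that controls whether a BOM is emitted.
$utf8NoBom = [System.Text.UTF8Encoding]::new($false)

# An empty preamble means nothing is written ahead of the text itself.
"Preamble length: $($utf8NoBom.GetPreamble().Length)"

# $OutputEncoding controls how PowerShell encodes text piped to native programs.
$OutputEncoding = $utf8NoBom

# Setting the console encoding can fail when no console is attached, so guard it.
try {
    [Console]::OutputEncoding = $utf8NoBom
} catch {
    Write-Warning "Could not set [Console]::OutputEncoding: $_"
}
```

Both settings are assigned here because the built-in help function shown earlier copies `[System.Console]::OutputEncoding` into `$outputEncoding` before piping to more.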

Halkcyon commented 5 years ago

@TylerLeonhardt I ran into a similar issue surrounding more.com and the like recently. The encoding issue I ran into was not resolved until I fixed both the codepage using chcp.com and $OutputEncoding.

I could not replicate it on the latest Win10 build, however, just Win7.
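To check whether the two settings mentioned above agree, something like the following diagnostic could be used. It is a hedged sketch: the comment does not say which values were applied, the variable names are illustrative, and the `chcp.com` step runs only on Windows.

```powershell
# Code page PowerShell uses when piping text to native programs.
$pipeCodePage = $OutputEncoding.CodePage
"`$OutputEncoding code page: $pipeCodePage"

if ($env:OS -eq 'Windows_NT') {
    # chcp.com prints something like "Active code page: 850"; keep only the digits.
    $activeCodePage = [int](((chcp.com) -join '') -replace '\D', '')
    "Console code page: $activeCodePage"

    if ($activeCodePage -ne $pipeCodePage) {
        "Mismatch: consider aligning them, e.g. chcp.com $pipeCodePage"
    }
}
```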

B-Art commented 3 months ago

> @dsolodow I reviewed the issue and you are correct, I am re-opening this!
>
> We noticed that this issue only occurs with Help and not with Get-Help the difference being that Help initially displays a smaller result with --More-- which comes from more.com so this may be what is causing the encoding issue.

I can confirm that this is still the case, although on my system the output looks like this:

help Test-Date -Examples
´╗┐
NAME
    Test-Date

ALIASES
    None

REMARKS
    None
get-help Test-Date -Examples

NAME
    Test-Date

ALIASES
    None

REMARKS
    None

Some additional details:

chcp.com
Active code page: 850
[System.Console]::OutputEncoding

Preamble          :
BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : False
CodePage          : 65001

Then run the following:

chcp.com 437

And ´╗┐ will change into ∩╗┐

A workaround that works for me:

Set-Alias -Name help -Value Get-Help

Then the original help function, which pipes through the DOS-era more.com, no longer comes into play.