Tyrrrz / CliWrap

Library for running command-line processes
MIT License
4.32k stars 264 forks source link

When using .NET Framework instead of .NET Core, there are problems with stdin encoding #215

Closed prcdpr closed 1 year ago

prcdpr commented 1 year ago

Version

3.6.4

Details

Consider this minimal repro (it runs on Windows with Powershell 7 installed):

using System.Text;
using CliWrap;

var pwshScript = "Write-Host 'hello'";

var cmd = await Cli.Wrap(@"C:\Program Files\PowerShell\7\pwsh.exe")
    .WithStandardInputPipe(PipeSource.FromBytes(Encoding.UTF8.GetBytes(pwshScript)))
    .WithStandardOutputPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stdout> " + line)))
    .WithStandardErrorPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stderr> " + line)))
    .WithValidation(CommandResultValidation.None)
    .WithArguments(new[] { "-NoProfile", "-Command", "-" })
    .ExecuteAsync();

This code will either work or not, depending whether it runs in .NET Framework or .NET Core.

Steps to reproduce

stdout> hello

Code still works in .NET Core 7 but it breaks in .NET Framework with the error:

stderr> Write-Host: The term 'Write-Host' is not recognized as a name of a cmdlet, function, script file, or executable program.
stderr> Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

If you look closely, in the output above there is an invisible character so Powershell thinks the script is

\ufeffWrite-Host 'hello'

Which causes it to fail. The problem occurs in combination of .NET Framework and UTF-8 support enabled. When looking at byte representation of the string which is provided to PipeSource, there are no extra bytes in the beginning of the array.

Tyrrrz commented 1 year ago

Is it because Encoding.UTF8.GetBytes(...) produces a preamble on .NET Framework? Is it just the first few characters (i.e. the BOM) that are causing this issue?

prcdpr commented 1 year ago

I don't think so.

As I mentioned in report, byte representation is free of any extra bytes and looks valid.

The following example also doesn't work (.NET Framework 4.8):

using System.Text;
using CliWrap;

var pwshScript = "Write-Host 'hello'";

var utf8NoBom = new UTF8Encoding(false);
var bytes = utf8NoBom.GetBytes(pwshScript);

var cmd = await Cli.Wrap(@"C:\Program Files\PowerShell\7\pwsh.exe")
    .WithStandardInputPipe(PipeSource.FromBytes(bytes))
    .WithStandardOutputPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stdout> " + line)))
    .WithStandardErrorPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stderr> " + line)))
    .WithValidation(CommandResultValidation.None)
    .WithArguments(new[] { "-NoProfile", "-Command", "-" })
    .ExecuteAsync();

image

Tyrrrz commented 1 year ago

What if you do FromStream(new MemoryStream(bytes))?

prcdpr commented 1 year ago

Same error in .NET 4.8, works in .NET 7 image

Tyrrrz commented 1 year ago

I reduced the repro to this snippet, which indicates that the bug is in .NET Framework's implementation of Process, rather than in CliWrap:

using var process = new Process
{
    StartInfo = new ProcessStartInfo
    {
        FileName = @"C:\Program Files\PowerShell\7\pwsh.exe",
        Arguments = "-NoProfile -Command -",
        RedirectStandardInput = true,
        RedirectStandardError = true,
        RedirectStandardOutput = true,
        UseShellExecute = false
    }
};

process.Start();

using (process.StandardInput)
    await process.StandardInput.WriteLineAsync("Write-Host 'Hello world!'");

var output = await process.StandardOutput.ReadToEndAsync();
var error = await process.StandardError.ReadToEndAsync();

await process.WaitForExitAsync();

error.Should().BeEmpty();
output.TrimEnd().Should().Be("Hello world!");

The aforementioned Windows feature is in beta after all, so it might not be supported by .NET Framework (yet).

I don't know what to suggest in this case, apart from raising the issue with Microsoft and hope they fix this issue in .NET Framework.

prcdpr commented 1 year ago

Actually found a workaround - add

Console.InputEncoding = new UTF8Encoding(false);

before constructing and starting Process. I believe CliWrap should handle that by detecting current input encoding and changing to one without BOM, otherwise PipeSource.FromBytes isn't reliable at all and should throw an exception in Windows systems that have UTF-8 beta support enabled.

Tyrrrz commented 1 year ago

Actually found a workaround - add

Console.InputEncoding = new UTF8Encoding(false);

before constructing and starting Process. I believe CliWrap should handle that by detecting current input encoding and changing to one without BOM, otherwise PipeSource.FromBytes isn't reliable at all and should throw an exception in Windows systems that have UTF-8 beta support enabled.

It's not about BOM, as we've identified earlier, since you provided the raw bytes you decoded yourself. I also did FromBytes("Write-Host 'Hello World'"u8.ToArray()) in my tests and the result was the same.

CliWrap shouldn't change the value of Console.InputEncoding as it's a global variable that has overreaching effects on other parts of the program.

prcdpr commented 1 year ago

It is about BOM but CliWrap makes developer think that PipeSource.FromBytes will provide clean byte input while it isn't true (if UTF-8 Beta is enabled and Console.InputEncoding is default UTF8 with BOM).

I agree that CliWrap shouldn't change Console.InputEncoding. However I also believe that CliWrap shouldn't silently accept PipeSource.FromBytes in this scenario.

If:

an exception with useful error message should be thrown because CliWrap cannot guarantee that stdin won't contain garbage bytes. Currently developer is led to believe that CliWrap's PipeSource.FromBytes is encoding agnostic while it isn't true. Yes, this is Microsoft bug, but a good error message encouraging developer to change Console.InputEncoding will reduce a lot of headache and possibly further bug reports.

This sample doesn't work in .NET 4.8:

Console.InputEncoding = Encoding.UTF8;

using var process = new Process
{
    StartInfo = new ProcessStartInfo
    {
        FileName = @"C:\Program Files\PowerShell\7\pwsh.exe",
        Arguments = "-NoProfile -Command -",
        RedirectStandardInput = true,
        RedirectStandardError = true,
        RedirectStandardOutput = true,
        UseShellExecute = false,
    }
};

process.Start();

using (process.StandardInput)
{
    await process.StandardInput.WriteLineAsync("Write-Host 'Hello world!'");
}

var output = await process.StandardOutput.ReadToEndAsync();
var error = await process.StandardError.ReadToEndAsync();

process.WaitForExit();

And after changing encoding to

Console.InputEncoding = new UTF8Encoding(false);

it starts working.

prcdpr commented 1 year ago

Also this works when InputEncoding is changed:

Console.InputEncoding = new UTF8Encoding(false);

var pwshScript = "Write-Host 'hello'";

var utf8NoBom = new UTF8Encoding(false);
var bytes = utf8NoBom.GetBytes(pwshScript);

var cmd = await Cli.Wrap(@"C:\Program Files\PowerShell\7\pwsh.exe")
    .WithStandardInputPipe(PipeSource.FromStream(new MemoryStream(bytes)))
    .WithStandardOutputPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stdout> " + line)))
    .WithStandardErrorPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stderr> " + line)))
    .WithValidation(CommandResultValidation.None)
    .WithArguments(new[] { "-NoProfile", "-Command", "-" })
    .ExecuteAsync();

It's definitely a problem that developer thinks they provide clean byte input but it's actually input encoding dependent. Even if CliWrap shouldn't change global settings, at least an useful error message should be thrown.

Tyrrrz commented 1 year ago

It's definitely a problem that developer thinks they provide clean byte input but it's actually input encoding dependent. Even if CliWrap shouldn't change global settings, at least an useful error message should be thrown.

The issue is that we can't detect when the bug manifests on our side, because the extra bytes are added when they're written to the process's stdin. On CliWrap's side, the byte sequence is exactly what it's supposed to be (without any extra data).

If we just assume that if that Windows option is enabled, and the Console.InputEncoding contains a preamble, we might easily start throwing false positives. For example, if (or really, when) Microsoft fixes that bug in .NET Framework, CliWrap will continue throwing exceptions until a new version is released. Note that "fixing the bug" does not necessarily entail changing Console.InputEncoding but, for example, just stripping the BOM when it's written to the child process.

If there were a way to reliably identify that the byte sequence is corrupted by the time it gets to the child process, I'd be happy to add a corresponding exception. But I think the proposed heuristics are not deterministic enough.

Ultimately, you are using a beta (read: unstable) Windows feature on an old and largely unmaintained platform, which is .NET Framework. Combining cutting edge tech with legacy tech is bound to cause some issues eventually.