Closed prcdpr closed 1 year ago
Is it because Encoding.UTF8.GetBytes(...)
produces a preamble on .NET Framework? Is it just the first few characters (i.e. the BOM) that are causing this issue?
I don't think so.
As I mentioned in report, byte representation is free of any extra bytes and looks valid.
The following example also doesn't work (.NET Framework 4.8):
using System.Text;
using CliWrap;
var pwshScript = "Write-Host 'hello'";
var utf8NoBom = new UTF8Encoding(false);
var bytes = utf8NoBom.GetBytes(pwshScript);
var cmd = await Cli.Wrap(@"C:\Program Files\PowerShell\7\pwsh.exe")
.WithStandardInputPipe(PipeSource.FromBytes(bytes))
.WithStandardOutputPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stdout> " + line)))
.WithStandardErrorPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stderr> " + line)))
.WithValidation(CommandResultValidation.None)
.WithArguments(new[] { "-NoProfile", "-Command", "-" })
.ExecuteAsync();
What if you do FromStream(new MemoryStream(bytes))
?
Same error in .NET 4.8, works in .NET 7
I reduced the repro to this snippet, which indicates that the bug is in .NET Framework's implementation of Process
, rather than in CliWrap:
using var process = new Process
{
StartInfo = new ProcessStartInfo
{
FileName = @"C:\Program Files\PowerShell\7\pwsh.exe",
Arguments = "-NoProfile -Command -",
RedirectStandardInput = true,
RedirectStandardError = true,
RedirectStandardOutput = true,
UseShellExecute = false
}
};
process.Start();
using (process.StandardInput)
await process.StandardInput.WriteLineAsync("Write-Host 'Hello world!'");
var output = await process.StandardOutput.ReadToEndAsync();
var error = await process.StandardError.ReadToEndAsync();
await process.WaitForExitAsync();
error.Should().BeEmpty();
output.TrimEnd().Should().Be("Hello world!");
The aforementioned Windows feature is in beta after all, so it might not be supported by .NET Framework (yet).
I don't know what to suggest in this case, apart from raising the issue with Microsoft and hope they fix this issue in .NET Framework.
Actually found a workaround - add
Console.InputEncoding = new UTF8Encoding(false);
before constructing and starting Process. I believe CliWrap should handle that by detecting current input encoding and changing to one without BOM, otherwise PipeSource.FromBytes isn't reliable at all and should throw an exception in Windows systems that have UTF-8 beta support enabled.
Actually found a workaround - add
Console.InputEncoding = new UTF8Encoding(false);
before constructing and starting Process. I believe CliWrap should handle that by detecting current input encoding and changing to one without BOM, otherwise PipeSource.FromBytes isn't reliable at all and should throw an exception in Windows systems that have UTF-8 beta support enabled.
It's not about BOM, as we've identified earlier, since you provided the raw bytes you decoded yourself. I also did FromBytes("Write-Host 'Hello World'"u8.ToArray())
in my tests and the result was the same.
CliWrap shouldn't change the value of Console.InputEncoding
as it's a global variable that has overreaching effects on other parts of the program.
It is about BOM but CliWrap makes developer think that PipeSource.FromBytes
will provide clean byte input while it isn't true (if UTF-8 Beta is enabled and Console.InputEncoding is default UTF8 with BOM).
I agree that CliWrap shouldn't change Console.InputEncoding
.
However I also believe that CliWrap shouldn't silently accept PipeSource.FromBytes
in this scenario.
If:
Console.InputEncoding
is set to default UTF-8 with BOMan exception with useful error message should be thrown because CliWrap cannot guarantee that stdin won't contain garbage bytes. Currently developer is led to believe that CliWrap's PipeSource.FromBytes
is encoding agnostic while it isn't true.
Yes, this is Microsoft bug, but a good error message encouraging developer to change Console.InputEncoding
will reduce a lot of headache and possibly further bug reports.
This sample doesn't work in .NET 4.8:
Console.InputEncoding = Encoding.UTF8;
using var process = new Process
{
StartInfo = new ProcessStartInfo
{
FileName = @"C:\Program Files\PowerShell\7\pwsh.exe",
Arguments = "-NoProfile -Command -",
RedirectStandardInput = true,
RedirectStandardError = true,
RedirectStandardOutput = true,
UseShellExecute = false,
}
};
process.Start();
using (process.StandardInput)
{
await process.StandardInput.WriteLineAsync("Write-Host 'Hello world!'");
}
var output = await process.StandardOutput.ReadToEndAsync();
var error = await process.StandardError.ReadToEndAsync();
process.WaitForExit();
And after changing encoding to
Console.InputEncoding = new UTF8Encoding(false);
it starts working.
Also this works when InputEncoding is changed:
Console.InputEncoding = new UTF8Encoding(false);
var pwshScript = "Write-Host 'hello'";
var utf8NoBom = new UTF8Encoding(false);
var bytes = utf8NoBom.GetBytes(pwshScript);
var cmd = await Cli.Wrap(@"C:\Program Files\PowerShell\7\pwsh.exe")
.WithStandardInputPipe(PipeSource.FromStream(new MemoryStream(bytes)))
.WithStandardOutputPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stdout> " + line)))
.WithStandardErrorPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stderr> " + line)))
.WithValidation(CommandResultValidation.None)
.WithArguments(new[] { "-NoProfile", "-Command", "-" })
.ExecuteAsync();
It's definitely a problem that developer thinks they provide clean byte input but it's actually input encoding dependent. Even if CliWrap shouldn't change global settings, at least an useful error message should be thrown.
It's definitely a problem that developer thinks they provide clean byte input but it's actually input encoding dependent. Even if CliWrap shouldn't change global settings, at least an useful error message should be thrown.
The issue is that we can't detect when the bug manifests on our side, because the extra bytes are added when they're written to the process's stdin. On CliWrap's side, the byte sequence is exactly what it's supposed to be (without any extra data).
If we just assume that if that Windows option is enabled, and the Console.InputEncoding
contains a preamble, we might easily start throwing false positives. For example, if (or really, when) Microsoft fixes that bug in .NET Framework, CliWrap will continue throwing exceptions until a new version is released. Note that "fixing the bug" does not necessarily entail changing Console.InputEncoding
but, for example, just stripping the BOM when it's written to the child process.
If there were a way to reliably identify that the byte sequence is corrupted by the time it gets to the child process, I'd be happy to add a corresponding exception. But I think the proposed heuristics are not deterministic enough.
Ultimately, you are using a beta (read: unstable) Windows feature on an old and largely unmaintained platform, which is .NET Framework. Combining cutting edge tech with legacy tech is bound to cause some issues eventually.
Version
3.6.4
Details
Consider this minimal repro (it runs on Windows with Powershell 7 installed):
This code will either work or not, depending whether it runs in .NET Framework or .NET Core.
Steps to reproduce
Code still works in .NET Core 7 but it breaks in .NET Framework with the error:
If you look closely, in the output above there is an invisible character so Powershell thinks the script is
Which causes it to fail. The problem occurs in combination of .NET Framework and UTF-8 support enabled. When looking at byte representation of the string which is provided to PipeSource, there are no extra bytes in the beginning of the array.