Tyrrrz / CliWrap

Library for running command-line processes
MIT License
4.32k stars 264 forks source link

Stdout and stderr order not preserved #202

Closed aetonsi closed 1 year ago

aetonsi commented 1 year ago

Version

3.6.0

Details

I was trying to output the entire invocation's output (so both stdout and stderr) to a file, but the order of output and error data is always messed up. AFAIK, maintaining the output's correct order is a common problem with command line programs, due to output buffering or something...

Steps to reproduce

As test, i used a simple php invocation (all the following scripts are run in powershell):

PS> php -r "echo ' o111 '; fwrite(STDERR, ' E222 '); fwrite(STDERR,' E333 '); echo ' o444 '; "
# and the output is:
o111  E222  E333  o444  o555

But if i try to invoke that in a C# app via Cliwrap, everything i try messes up the output in some way. I tried outputting to the console output stream, to a StringBuilder, to a file. Nothing works correctly. Here's a simple c# test program:

using CliWrap;
using System.Text;

string program = "php";
string[] program_args = new[] { "-r", "echo ' o111 '; fwrite(STDERR, ' E222 '); fwrite(STDERR,' E333 '); echo ' o444 '; " };

string fp = @"c:\tmp\buff.txt";
FileStream fs = System.IO.File.Open(fp, FileMode.Create);
StringBuilder sb = new StringBuilder();
PipeTarget t = PipeTarget.Merge(new[] {
    PipeTarget.ToStream(Console.OpenStandardOutput()),
    PipeTarget.ToStringBuilder(sb),
    PipeTarget.ToStream(fs)
});

Command cmd = Cli.Wrap(program)
    .WithArguments(program_args, true)
    .WithValidation(CommandResultValidation.None) | (t, t);

Console.Write(","); // to create CSV structure (it concatenates to the native php call output)
await (cmd.ExecuteAsync()); // output: console standard output stream
Console.Write("," + sb.ToString()); // output: string builder
fs.Close();
Console.Write("," + System.IO.File.OpenText(fp).ReadToEnd()); // output: file contents

And here's a powershell script to compare the correct native output, to Cliwrap's tries:

# it loops 10 times
1..10 | ForEach-Object {
    &  php -r "echo ' o111 '; fwrite(STDERR, ' E222 '); fwrite(STDERR,' E333 '); echo ' o444 '; "
    & .\myprogram
    ''
}
The output is practically (willingly) in CSV format, so here's the result converted to a markdown table: direct php call console string builder file
o111 E222 E333 o444 E222 o111 E333 o444 o111 o444 E222 E333 o111 E222 E333 o444
o111 E222 E333 o444 E222 o111 E333 o444 o111 E222 E333 o444 o111 E222 o444 E333
o111 E222 E333 o444 E222 o111 E333 o444 o111 o444 E222 E333 o111 E222 o444 E333
o111 E222 E333 o444 o111 E222 o444 E333 E222 E333 o111 o444 E222 o111 E333 o444
o111 E222 E333 o444 E222 o111 E333 o444 E222 E333 o111 o444 o111 E222 o444 E333
o111 E222 E333 o444 o111 E222 o444 E333 o111 E222 o444 E333 o111 E222 o444 E333
o111 E222 E333 o444 o111 E222 o444 E333 E222 o111 E33 o444 E222 o111 E333 o444
o111 E222 E333 o444 o111 E222 o444 E333 o111 o444 E222 E333 o111 E222 o444 E333
o111 E222 E333 o444 E222 E333 o111 o444 o111 E222 E333 o444 E222 o111 E333 o444
o111 E222 E333 o444 E222 o111 E333 o444 o111 E222 E33 o444 E222 o111 E333 o444

As you can see, when directly invoking php, the output is always in correct order. When piped via Cliwrap, it's always randomly out of order, and sometimes it's even partially truncated (E33 instead of E333 on a couple of lines).

Is there any way to fix this ordering problem, or any way to pipe/output everything together and have it in the correct order?


thank you have a nice day

Tyrrrz commented 1 year ago

If you run your php script in a command shell and pipe the stdout/stderr to separate files, does it maintain order?

aetonsi commented 1 year ago

yeah you can try it yourself if you want, i simply used php as a simple means of having irregularly alternating output and error data. You can create a simple php script with totally random alternating stdout/stderr like this (pwsh):

'<?php ' > .\script.php
1..10 | ForEach-Object {
    $stream = @('STDOUT', 'STDERR') | Get-Random
    $text = "$("$stream".Substring(3))$_"
    "fwrite($stream, '$text ');" >>script.php
}

That generates a script.php like this:

<?php 
fwrite(STDOUT, 'OUT1 ');
fwrite(STDERR, 'ERR2 ');
fwrite(STDERR, 'ERR3 ');
fwrite(STDERR, 'ERR4 ');
fwrite(STDOUT, 'OUT5 ');
fwrite(STDERR, 'ERR6 ');
fwrite(STDOUT, 'OUT7 ');
fwrite(STDOUT, 'OUT8 ');
fwrite(STDERR, 'ERR9 ');
fwrite(STDOUT, 'OUT10 ');

And if you invoke the same script 10 times the output is always the same (note the >>stdout.log 2>>stderr.log splitting the output):

1..10 | ForEach-Object {
    & php -f script.php >>stdout.log 2>>stderr.log
}

it generates the following files:

stdout.log ```log OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 ```
stderr.log ``` ERR2 ERR3 ERR4 ERR6 ERR9 ERR2 ERR3 ERR4 ERR6 ERR9 ERR2 ERR3 ERR4 ERR6 ERR9 ERR2 ERR3 ERR4 ERR6 ERR9 ERR2 ERR3 ERR4 ERR6 ERR9 ERR2 ERR3 ERR4 ERR6 ERR9 ERR2 ERR3 ERR4 ERR6 ERR9 ERR2 ERR3 ERR4 ERR6 ERR9 ERR2 ERR3 ERR4 ERR6 ERR9 ERR2 ERR3 ERR4 ERR6 ERR9 ```

Problems arise only when using Cliwrap in c#. I am using the | ( t, t ) to pipe to the same FileStream object, shouldn't this work? Or is it wrong?

Tyrrrz commented 1 year ago

I am using the | ( t, t ) to pipe to the same FileStream object, shouldn't this work? Or is it wrong?

That is correct. However, the stream implementation itself may not be thread-safe. You can try using Stream.Synchronized(...) although I think that only affects sync methods and probably won't help much in this case anyway.

aetonsi commented 1 year ago

yes as you guessed using a synchronized stream doesn't make a difference:

Stream ss = Stream.Synchronized(fs);
PipeTarget t = PipeTarget.Merge(new[] {
    PipeTarget.ToStream(ss)
});

same result.

The stringbuilder is even worse, it loses pieces of output down the road:

test code ```c# using CliWrap; using System.Text; string program = "php"; string[] program_args = new[] { "-f", "script.php" }; string fp = @".\cliwrap.log"; StringBuilder sb = new StringBuilder(); PipeTarget t = PipeTarget.Merge(new[] { PipeTarget.ToStringBuilder(sb) }); await ( Cli.Wrap(program) .WithArguments(program_args, true) .WithValidation(CommandResultValidation.None) | (t, t) ).ExecuteAsync(); File.AppendAllText(fp, sb.ToString()); File.AppendAllText(fp, Environment.NewLine); ```
stringbuilder result ```log ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 ERR2 OUT1 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9 ERR2 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 OUT1 ERR2 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9 OUT1 ERR2 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9 OUT1 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9 ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 ```

The interesting thing is, PipeTarget.ToFile(...) has the correct order and data is not broken up. It is simply missing the entire STDERR data:

code ```c# using CliWrap; string program = "php"; string[] program_args = new[] { "-f", "script.php" }; string fpr = @".\cliwrap.log" + (new Random().Next(1, 10000)); PipeTarget t = PipeTarget.Merge(new[] { PipeTarget.ToFile(fpr), }); await ( Cli.Wrap(program) .WithArguments(program_args, true) .WithValidation(CommandResultValidation.None) | (t, t) ).ExecuteAsync(); ```
log (reading all "cliwrap.log*" that the script generates; 1 file content per line:) to read all files: `get-content -Path cliwrap.log\*` ```log OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 OUT1 OUT5 OUT7 OUT8 OUT10 ```
Tyrrrz commented 1 year ago

Okay, so your last comment suggests more to me that it might be a thread-safety issue. Try this:

using var semaphore = new SemaphoreSlim(1, 1);
var stringBuilder = new StringBuilder();

var target = PipeTarget.ToDelegate(async line =>
{
    await semaphore.WaitAsync(cancellationToken);
    try
    {
        stringBuilder.AppendLine(line);
    }
    finally
    {
        semaphore.Release();
    }
});

var cmd = Cli.Wrap(...) | (target, target);

Haven't tested the above code myself so please review it.

Tyrrrz commented 1 year ago

I came back to investigate a bit more and reproduced the original scenario with the following cmd script:

@echo off
setlocal EnableDelayedExpansion

for /L %%i in (1,1,100) do (
  set /A "remainder=%%i %% 2"
  if !remainder! == 0 (
    echo ERR %%i 1>&2
  ) else (
    echo OUT %%i
  )
)

Running the following maintains order:

$ test.bat > out.txt 2>&1

The following CliWrap code does not maintain order:

var sb = new StringBuilder();
using var fs = File.Create("out.txt");

var target = PipeTarget.Merge(
    PipeTarget.ToStream(Console.OpenStandardOutput()),
    PipeTarget.ToStringBuilder(sb),
    PipeTarget.ToStream(fs)
);

await (Cli.Wrap("test.bat") | (target, target))
    .ExecuteAsync();
Output ``` OUT 1 ERR 2 OUT 3 ERR 4 OUT 5 ERR 6 OUT 7 ERR 8 OUT 9 ERR 10 ERR 12 OUT 11 ERR 14 OUT 13 OUT 15 ERR 16 OUT 17 ERR 18 ERR 20 OUT 19 ERR 22 OUT 21 ERR 24 OUT 23 OUT 25 OUT 27 ERR 26 OUT 29 ERR 28 OUT 31 OUT 33 OUT 35 ERR 30 ERR 32 ERR 34 OUT 37 ERR 36 ERR 38 OUT 39 OUT 41 ERR 40 ERR 42 OUT 43 ERR 44 OUT 45 ERR 46 ERR 48 OUT 47 OUT 49 ERR 50 OUT 51 OUT 53 OUT 55 OUT 57 OUT 59 ERR 52 ERR 54 OUT 61 ERR 56 OUT 63 OUT 65 ERR 58 OUT 67 ERR 60 OUT 69 OUT 71 ERR 62 ERR 64 OUT 73 OUT 75 ERR 66 OUT 77 OUT 79 ERR 68 ERR 70 OUT 81 ERR 72 OUT 83 OUT 85 ERR 74 OUT 87 ERR 76 OUT 89 ERR 78 ERR 80 OUT 91 ERR 82 OUT 93 ERR 84 OUT 95 ERR 86 OUT 97 ERR 88 ERR 90 ERR 92 OUT 99 ERR 94 ERR 96 ERR 98 ERR 100 ```

The following CliWrap code does maintain order:

var sb = new StringBuilder();
using var fs = File.Create("out.txt");

using var semaphore = new SemaphoreSlim(1, 1);
var target = PipeTarget.ToDelegate(async line =>
{
    await semaphore.WaitAsync();
    try
    {
        Console.WriteLine(line);
        sb.AppendLine(line);
        await fs.WriteAsync(Encoding.UTF8.GetBytes(line));
    }
    finally
    {
        semaphore.Release();
    }
});

await (Cli.Wrap("test.bat") | (target, target))
    .ExecuteAsync();
Output ``` OUT 1 ERR 2 OUT 3 ERR 4 OUT 5 ERR 6 OUT 7 ERR 8 OUT 9 ERR 10 OUT 11 ERR 12 OUT 13 ERR 14 OUT 15 ERR 16 OUT 17 ERR 18 OUT 19 ERR 20 OUT 21 ERR 22 OUT 23 ERR 24 OUT 25 ERR 26 OUT 27 ERR 28 OUT 29 ERR 30 OUT 31 ERR 32 OUT 33 ERR 34 OUT 35 ERR 36 OUT 37 ERR 38 OUT 39 ERR 40 OUT 41 ERR 42 OUT 43 ERR 44 OUT 45 ERR 46 OUT 47 ERR 48 OUT 49 ERR 50 OUT 51 ERR 52 OUT 53 ERR 54 OUT 55 ERR 56 OUT 57 ERR 58 OUT 59 ERR 60 OUT 61 ERR 62 OUT 63 ERR 64 OUT 65 ERR 66 OUT 67 ERR 68 OUT 69 ERR 70 OUT 71 ERR 72 OUT 73 ERR 74 OUT 75 ERR 76 OUT 77 ERR 78 OUT 79 ERR 80 OUT 81 ERR 82 OUT 83 ERR 84 OUT 85 ERR 86 OUT 87 ERR 88 OUT 89 ERR 90 OUT 91 ERR 92 OUT 93 ERR 94 OUT 95 ERR 96 OUT 97 ERR 98 OUT 99 ERR 100 ```

I'm not sure how cmd or PowerShell do it on their end, but I assume they have a way to differentiate between discrete write operations, which allows them to synchronize them even if the outputs are not separated by line breaks (or any other predefined character sequence).

On CliWrap's side, stdout and stderr are just homogenous binary streams, so it's impossible (to my knowldge) to reliably identify individual writes. The ToDelegate(...) approach above identifies them by line breaks.

For completeness’s sake, I tested it out with Process too and it also doesn't ensure the order of writes:

using var process = new Process
{
    StartInfo = new ProcessStartInfo
    {
        FileName = "test.bat",
        RedirectStandardOutput = true,
        RedirectStandardError = true
    },
    EnableRaisingEvents = true
};

process.OutputDataReceived += (sender, args) =>
{
    if (args.Data is not null)
    {
        Console.WriteLine(args.Data);
    }
};

process.ErrorDataReceived += (sender, args) =>
{
    if (args.Data is not null)
    {
        Console.WriteLine(args.Data);
    }
};

process.Start();
process.BeginOutputReadLine();
process.BeginErrorReadLine();

process.WaitForExit();
Tyrrrz commented 1 year ago

Unfortunately, it seems that with a larger sequence of writes, even my second method doesn't work. I think it may be impossible to achieve this directly.

However, one trick you can do is wrap the shell and merge stdout and stderr:

Cli.Wrap("cmd").WithArguments(new[] {"/c", "test.bat 2>&1"})

Then you can process the output normally in CliWrap, since it's now just one stream:

var sb = new StringBuilder();
using var fs = File.Create("out.txt");

var target = PipeTarget.Merge(
    PipeTarget.ToStream(Console.OpenStandardOutput()),
    PipeTarget.ToStringBuilder(sb),
    PipeTarget.ToStream(fs)
);

await (Cli.Wrap("cmd").WithArguments(new[] {"/c", "test.bat 2>&1"}) | target)
    .ExecuteAsync();
aetonsi commented 1 year ago

Sorry i didn't reply for weeks, i literally had the tab open but too many things to do.. You mean to say it's totally impossible (as far as you understand) to maintain output order with c#? Or with any .NET language maybe? At which level do things break in your opinion? I've always had problems myself with this thing in c# and always assumed i was doing something wrong since i'm not a c# dev. But if it's totally impossible it might be useful to report it to someone (to whom, i don't know, it depends)

tyfyt

Tyrrrz commented 1 year ago

You mean to say it's totally impossible (as far as you understand) to maintain output order with c#? Or with any .NET language maybe? At which level do things break in your opinion?

It's definitely impossible using .NET's Process class (which CliWrap is based on), from my understanding. If you make your own native calls to create and run a process, then it should be theoretically possible on Windows, I believe. I don't know about other operating systems, though.

This is not specific to C# however, as you can find people facing the same challenges in other languages and platforms:

Many of the answers above reference "buffering" as being the cause. That's probably true at the system level, but even if the client consistently flushed data after every write, the order of data could still get mangled by the layers of indirection that it has to go through before it reaches your code. In .NET, for example, you can theoretically do a Task.WhenAny(...) loop on stdout/stderr reads, but that would only preserve order assuming it's guaranteed in all the underlying stream plumbing (which it probably is not).

I think the main barrier is that the standard streams were not meant to be synchronized from a conceptual standpoint, so any successful attempts to achieve that may just be coincidental.

aetonsi commented 1 year ago

Yes i saw a few of those exact threads you linked.. i did find some solution for linux a long time ago, can't remember what exactly, that altered or even disabled the console's buffering and allowed for ordered output. I can't remember if i even tested it though, as i focus mainly on Windows atm. Still, there has to be a way to like, "hook deeper" into the system's calls, right below where said buffering occurs.. but how? how do cmd/pwsh do it for example? but this is more a question for some stack* website i think.... It's bittersweet to know that me being a c# newbie wasn't the problem :\