dotnet / command-line-api

Command line parsing, invocation, and rendering of terminal output.
https://github.com/dotnet/command-line-api/wiki
MIT License
3.35k stars 376 forks source link

CommandLineStringSplitter does not conform to "standard" quote escaping rules #1758

Open jonsequitur opened 2 years ago

jonsequitur commented 2 years ago

The CommandLineStringSplitter provides a way to split a string into a string[] and is intended to conform to the way that strings are split on the command line.

It has two primary use cases:

Various shells will exhibit different behaviors here due to different quote escaping rules, so the user's experience of this splitting logic might not match up with a single implementation of CommandLineStringSplitter. Because of this, and to eventually provide higher fidelity for the completions experience on different shells in the presence of escapes on the command line, it might be useful to provide multiple implementations over time and we should consider anticipating that need in the API design now.

Resources

Related


For the purposes of illustration in examples below, I'll be using the following program to confirm behaviors that differ among various shells:

Console.WriteLine("---- DEFAULT SPLIT            ---");

foreach (var arg in args)
{
    Console.WriteLine(arg);
}

Console.WriteLine("---- Environment.CommandLine ----");

Console.WriteLine(Environment.CommandLine);
jonsequitur commented 2 years ago

Copied from @ptr727's comment on #1781:

Problem with the string[] args is that it breaks when using e.g. "D:\", where the " was treated as an embedded quote, not a terminator quote.

Also, the vanilla args[] does not (did not) support unicode or UTF8 characters (critical when piping between processes), thus one was forced to use the W/unicode API to get the original commandline, then parse by hand.

With the introduction of .NET Core, I asked, and the team did not want to change the behavior from how WIN32 in C++ works, even if broken.

The McMaster, gsscoder, and can't remember the other CLI library I used before did not want to include the "correct" parser as I referenced on stackoverflow, thus myself and others just created the manual parsing code, and ignored the broken args[] behavior. Found one: https://github.com/gsscoder/commandline/issues/473

Now, things may have changed, maybe .NET Core now correctly parses in args[], but that may be something you could verify, I've given up, or more I can't be bothered to try to get a team to fix something I think is/was broken.

And yep, https://github.com/dotnet/command-line-api/issues/1758 sorta describes the issues with not correctly parsing. See: https://github.com/ptr727/Utilities/blob/main/UtilitiesTests/CommandLineTests.cs Also the stackoverflow article has grown to be very complete.

iSazonov commented 1 year ago

I hope great @mklement0 review https://github.com/PowerShell/PowerShell/issues/15143 and his module helps to resolve the issue.

Jozkee commented 1 year ago

This won't work without breaking supported scenarios.

@treenewlyn is asking for a way to escape the quotes in his command, and the existing way of doing that is with backslash, which makes sense.

But we currently support tokens that are quoted and have an ending backslash, one example is a windows path that ends in separator. if we introduce the escaping logic that @treenewlyn is asking for, those cases would consider the "ending separator + quote" as a quote and fail. https://github.com/dotnet/command-line-api/blob/e965ae227624f1b1e53a732a7835641452a68c69/src/System.CommandLine.Tests/Parsing/CommandLineStringSplitterTests.cs#L29-L37

You can't have both, and since we already have tests showing that windows paths with ending separator are used, I don't know what's the right approach to this. Either do nothing, or take the breaking change.

I already tried a couple of times to reconcile both scenarios, but failed.

For reference, this is what CMD does with dotnet publish \"xxx.csproj\" -c Release -o \"./bin/latest\" -r linux-x64 --self-contained false: You can see it works as OP is asking for.

C:\my-app>dotnet run "dotnet publish \"xxx.csproj\" -c Release -o \"./bin/latest\" -r linux-x64 --self-contained false"
foreach (string arg in args): dotnet publish "xxx.csproj" -c Release -o "./bin/latest" -r linux-x64 --self-contained false

Environment.CommandLine: C:\my-app\bin\Debug\net7.0\consoleapp.dll "dotnet publish \"xxx.csproj\" -c Release -o \"./bin/latest\" -r linux-x64 --self-contained false"

Environment.GetCommandLineArgs(2): [C:\my-app\bin\Debug\net7.0\consoleapp.dll,dotnet publish "xxx.csproj" -c Release -o "./bin/latest" -r linux-x64 --self-contained false]

And this with rm -r "c:\temp files\", if you don't escape the quotes: It concats "rm -r "c:\temp and then splits files\".

C:\my-app>dotnet run "rm -r "c:\temp files\""
foreach (string arg in args): rm -r c:\temp,files"

Environment.CommandLine: C:\my-app\bin\Debug\net7.0\consoleapp.dll "rm -r c:\temp" "files\""

Environment.GetCommandLineArgs(3): [C:\my-app\bin\Debug\net7.0\consoleapp.dll,rm -r c:\temp,files"]

And if I escape the quotes rm -r \"c:\temp files\\": Is surprisingly deleting the ending \, not even using it as escape character for ".

C:\my-app>dotnet run "rm -r \"c:\temp files\""
foreach (string arg in args): rm -r "c:\temp files"

Environment.CommandLine: C:\my-app\bin\Debug\net7.0\consoleapp.dll "rm -r \"c:\temp files\""

Environment.GetCommandLineArgs(2): [C:\my-app\bin\Debug\net7.0\consoleapp.dll,rm -r "c:\temp files"]

Since -r is meaningful for dotnet cli, I also tried the following dotnet run "c:\temp files\": In this one you can see that the quote is preserved due to the ending \ acting as a escape character, which is inconsistent with the previous example.

C:\my-app>dotnet run "c:\temp files\"
foreach (string arg in args): c:\temp files"

Environment.CommandLine: C:\my-app\bin\Debug\net7.0\consoleapp.dll "c:\temp files\""

Environment.GetCommandLineArgs(2): [C:\my-app\bin\Debug\net7.0\consoleapp.dll,c:\temp files"]

If CMD uses CommandLineToArgvW under the covers, Then we are already doing something different to that function, so I wonder if we should keep striving to behave like it or not?

Jozkee commented 1 year ago

my-app's code looks as the following:

Console.Write("foreach (string arg in args): ");
for (int i = 0; i < args.Length; i++)
{
    Console.Write(args[i]);
    if (i < args.Length - 1)
        Console.Write(",");
}

Console.WriteLine();
Console.WriteLine();

Console.Write("Environment.CommandLine: ");
Console.Write(Environment.CommandLine);

Console.WriteLine();
Console.WriteLine();

var cmdLineArgs = Environment.GetCommandLineArgs();
Console.Write($"Environment.GetCommandLineArgs({cmdLineArgs.Length}): ");
Console.Write('[' + string.Join(',', cmdLineArgs) + "]");
Console.WriteLine();
KalleOlaviNiemitalo commented 1 year ago

If CMD uses CommandLineToArgvW under the covers

.NET Runtime can call CommandLineToArgvW here https://github.com/dotnet/runtime/blob/aaf9c8a7702920c4999e0535939e981038bbf42c/src/libraries/System.Private.CoreLib/src/System/Environment.Windows.cs#L220

CreateProcess in Windows takes a command-line string and the new process decides whether and how to split it to arguments. IIRC, some C library passes a list of argument strings via the cbReserved2 and lpReserved2, but I don't know whether .NET Runtime reads those at all.

KalleOlaviNiemitalo commented 1 year ago

I feel the completion function in the shell should parse quotation marks etc. as needed, and pass to dotnet-suggest a list of strings, the position of the string that contains the insertion point, and the position of the insertion point within that string. It should then read the resulting completions and add any shell-dependent backslashes or quotation marks. That way, dotnet-suggest and the completion sources would not need to support the syntaxes of individual shells.

OTOH, if the shell scripting makes this too slow, then it can be reimplemented in dotnet-suggest -- still not in the completion sources. And this should be an optional optimisation; the argv array mechanism should still be available for shells that dotnet-suggest does not explicitly support.

jonsequitur commented 1 year ago

Pushing these responsibilities to the shell is ideal, but it has a longer timeline. Implementing per-shell solutions should be more performant (and shouldn't even require dotnet-suggest or any other intermediate .NET-based process), but it's a lot more work and opens up a large test matrix. A design where System.CommandLine can handle this parsing will remain a useful fallback as support for specific shells comes in.

Other scenarios this method also addresses:

These are a few examples of places where a command line parser is useful outside of the Main method in a CLI app.