PowerShell / PowerShell

PowerShell for every system!
https://microsoft.com/PowerShell
MIT License
43.66k stars 7.08k forks source link

Arguments for external executables aren't correctly escaped #1995

Closed be5invis closed 1 year ago

be5invis commented 7 years ago

Steps to reproduce

  1. write a C program native.exe which acquires ARGV
  2. Run native.exe "`"a`""

    Expected behavior

ARGV[1] == "a"

Actual behavior

ARGV[1] == a

Environment data

Windows 10 x64

Name                           Value
----                           -----
PSVersion                      5.1.14393.0
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.14393.0
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
andyleejordan commented 7 years ago

Why do you think that "\"a\"" the expected behavior? My understanding of PowerShell escapes says that the actual behavior is the correct and expected behavior. ""a"" is a pair of quotes surrounding an escaped pair of quotes surrounding an a, so PowerShell interprets the outer unescaped pair as "this is a string argument" and so drops them, then interprets the escaped pair as escaped quotes and so keeps them, leaving you with "a". At no point was a \ added to the string.

The fact that Bash uses \ as an escape character is irrelevant. In PowerShell, the escape character is a backtick. See PowerShell escape characters.

If you want to pass literally "\"a\"", I believe you would use:

> echo `"\`"a\`"`"
"\"a\""
be5invis commented 7 years ago

@andschwa Yes, escapes works fine for internal cmdlets, but things get weird when communicate with native binaries, especially on Windows. When running native.exe ""a"", the ARGV[1] should be

"a"

(three characters)

instead of

a

(one character).

be5invis commented 7 years ago

Currently to make native.exe correctly receive an ARGV with two quotes and an a character, you have to use this weird call:

native.exe "\`"a\`""
andyleejordan commented 7 years ago

Ah, I see. Re-opening.

andyleejordan commented 7 years ago

Out of a strong curiosity, what happens if you try a build using #1639?

be5invis commented 7 years ago

@andschwa The same. You HAVE to double-esacpe to satisify both PowerShell and CommandLineToArgvW. This line:

native.exe "`"a`""

results a StartProcess equalivent to cmd

native.exe ""a""
andyleejordan commented 7 years ago

@be5invis @douglaswth is this resolved via https://github.com/PowerShell/PowerShell/pull/2182?

be5invis commented 7 years ago

No, We still need to add a backslash before a backtick-escaped double quote? This does not solve the double-escaping problem. (That is, we have to escape a double quote for both PowerShell and CommandLineToArgvW.)

vors commented 7 years ago

Since ""a"" is equal to '"a"', do you suggest that native.exe '"a"' should result in "\"a\""?

douglaswth commented 7 years ago

This seems like a feature request that if implemented could break a large number of already existing PowerShell scripts that use the required double escaping, so extreme care would be required with any solution.

be5invis commented 7 years ago

@vors Yes. @douglaswth The double-escaping is really silly: why do we need the “inner” escapes made in the DOS era?

be5invis commented 7 years ago

@vors @douglaswth This is a the C code used to show GetCommandLineW and CommandLineToArgvW results:

#include <stdio.h>
#include <wchar.h>
#include <Windows.h>

int main() {
  LPWSTR cmdline = GetCommandLineW();
  wprintf(L"Command Line : %s\n", cmdline);

  int nArgs;
  LPWSTR *szArglist = CommandLineToArgvW(cmdline, &nArgs);
  if (NULL == szArglist) {
    wprintf(L"CommandLineToArgvW failed\n");
    return 0;
  } else {
    for (int i = 0; i < nArgs; i++) {
      wprintf(L"argv[%d]: %s\n", i, szArglist[i]);
    }
  }
  LocalFree(szArglist);
}
be5invis commented 7 years ago

Here is the result

$ ./a "a b"
Command Line : "Z:\playground\ps-cmdline\a.exe" "a b"
argv[0]: Z:\playground\ps-cmdline\a.exe
argv[1]: a b

$ ./a 'a b'
Command Line : "Z:\playground\ps-cmdline\a.exe" "a b"
argv[0]: Z:\playground\ps-cmdline\a.exe
argv[1]: a b

$ ./a 'a"b'
Command Line : "Z:\playground\ps-cmdline\a.exe" a"b
argv[0]: Z:\playground\ps-cmdline\a.exe
argv[1]: ab

$ ./a 'a"b"c'
Command Line : "Z:\playground\ps-cmdline\a.exe" a"b"c
argv[0]: Z:\playground\ps-cmdline\a.exe
argv[1]: abc

$ ./a 'a\"b\"c'
Command Line : "Z:\playground\ps-cmdline\a.exe" a\"b\"c
argv[0]: Z:\playground\ps-cmdline\a.exe
argv[1]: a"b"c
douglaswth commented 7 years ago

@be5invis I do not disagree with you about the double escaping being annoying, but I am merely saying that a change to this would need to be backward compatible with what existing PowerShell scripts use.

be5invis commented 7 years ago

How many are them? I do not think there are script writers know about such double-quoting. It is a bug, not feature, and it is not documented.

???? iPhone

? 2016?9?21??01:58?Douglas Thrift notifications@github.com<mailto:notifications@github.com> ???

@be5invishttps://github.com/be5invis I do not disagree with you about the double escaping being annoying, but I am merely saying that a change to this would need to be backward compatible with what existing PowerShell scripts use.

You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/PowerShell/PowerShell/issues/1995#issuecomment-248381045, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAOp20f_W0mTl2YiJKi_flQBJUKaeAnLks5qsB7ZgaJpZM4JpVin.

douglaswth commented 7 years ago

PowerShell has been around for 9 years so there are very likely a good number of scripts out there. I found plenty of information about the need for double escaping from StackOverflow and other sources when I ran into the need for it so I don't know if I agree with your claims about nobody knowing about the need for it or that it is not documented.

vors commented 7 years ago

For the additional context, I'd like to talk a little bit about the implementation. PowerShell calls .NET API to spawn a new process, which calls a Win32 API (on windows).

Here, PS creates StartProcessInfo that is uses https://github.com/PowerShell/PowerShell/blob/master/src/System.Management.Automation/engine/NativeCommandProcessor.cs#L1063

The provided API takes a single string for arguments and then it's re-parsed into an array of arguments to do the execution. The rules of this re-parsing are not controlled by PowerShell. It's a Win32 API (and fortunately, it consistent in dotnet core and unix rules). Particularly, this contract describes the \ and " behavior.

Although, PowerShell may try to be smarter and provide a nicer experience, the current behavior is consistent with cmd and bash: you can copy native executable line from them and use it in powershell and it works the same.

@be5invis If you know a way to enhance the expirience in non-breaking way, please line up the details. For the breaking changes, we would need to use RFC process, as described in https://github.com/PowerShell/PowerShell/blob/master/docs/dev-process/breaking-change-contract.md

TSlivede commented 7 years ago

This applies to Windows, but when running commands on Linux or Unix, its strange that one needs to double escape quotes.

On Linux processes don't have a single commandline but instead an array of arguments. Therefore arguments in powershell should be the same as those, that are passed to the executable, instead of merging all arguments and then resplitting.

Even on windows, the current behavior is inconsistent: If an argument contains no spaces, it is passed unchanged. If an argument contains spaces, if it will be surrounded by quotes, to keep it together through CommandLineToArgvW call. => Argument is changed to meet CommandLineToArgvW requirement. But if argument contains quotes, those are not escaped. => Argument is not changed, although CommandLineToArgvW requires this.

I think arguments should either never be changed, or always be changed to meet CommandLineToArgvW requirements, but not in half of the cases.

Regarding breaking-the-contract: As I couldn't find any official documentation about double escaping, I'd consider this as category "Bucket 2: Reasonable Grey Area", so there are chances to change this, or am I wrong?

be5invis commented 7 years ago

@vors This is extremely annoying if your argument is an variable or something else: you have to manually escape it before sending it into a native app. An "auto-escaping" operator may help. like ^"a"" -> "a\""`

vors commented 7 years ago

I think @TSlivede put it right with the inconsistency in the behavior.

I think arguments should either never be changed, or always be changed to meet CommandLineToArgvW requirements, but not in half of the cases.

I'm not sure about the bucket, but even the "clearly breaking change" bucket could potentially be changed. We want to make PowerShell better, but backward compatibility is one of our highest priorities. That's why it's not so easy. We have a great community and I'm confident that we can find consensus.

Would anybody want to start an RFC process?

lzybkr commented 7 years ago

It would be worth investigating the use of P/Invoke instead of .Net to start a process if that avoids the need for PowerShell to add quotes to arguments.

vors commented 7 years ago

@lzybkr as far as I can tell, PInvoke would not help. And this is where unix and windows APIs are different:

https://msdn.microsoft.com/en-us/library/20y988d2.aspx (treats spaces as separators) https://linux.die.net/man/3/execvp (doesn't treat spaces as separators)

lzybkr commented 7 years ago

I wasn't suggesting changing the Windows implementation.

vors commented 7 years ago

I'd try to avoid having platform-specific behavior here. It will hurt scripts portability. I think we can consider changing windows behavior in a non-breaking way. I.e. with preference variable. And then we can have different defaults or something like that.

lzybkr commented 7 years ago

We're talking about calling external commands - somewhat platform dependent anyway.

TSlivede commented 7 years ago

Well, i think it can't be really platform independent, as Windows and Linux just have different ways to call executables. In Linux a process gets an argument array while on Windows a process just gets a single commandline (one string). (compare the more basic CreateProcess -> commandline (https://msdn.microsoft.com/library/windows/desktop/ms682425) and execve -> command array (https://linux.die.net/man/2/execve) )

As Powershell adds those quotes when arguments have spaces in them, it seems to me, that powershell tries\ to pass the arguments in a way, that CommandLineToArgvW splits the commandline to the arguments that were originally given in powershell. (This way a typical c-program gets the same arguments in its argv array as a powershell function gets as $args.) This would perfectly match to just passing the arguments to the linux systemcall (as suggested via p/invoke).

\ (and fails, as it doesn't escape quotes)

PS: What is necessary to start an RFC process?

lzybkr commented 7 years ago

Exactly - PowerShell tries to make sure CommandLineToArgvW produces the correct command and after reparsing what PowerShell has already parsed.

This has been a longstanding pain point on Windows, I see on reason to bring that difficulty over to *nix.

To me, this feels like an implementation detail, not really needing an RFC. If we changed behavior in Windows PowerShell, it might warrant an RFC, but even then, the right change might be considered a (possibly risky) bug fix.

TSlivede commented 7 years ago

Yes, I also think, that changing it on Linux to use a direct system call would make everyone feel more happy.

I still think it should also be changed on windows, (Maybe by adding a preference variable for those who don't want to change their scripts) because it's just wrong now - it is a bug. If this was corrected, a direct syscall on linux wouldn't even be necessary, because any argument would reach the next process unchanged.

But as there are executables, that split the commandline in a way, incompatible to CommandLineToArgvW, I like @be5invis's idea of an operator for arguments - but I wouldn't create an auto-escape operator (should be default for all arguments), but instead add an operator to not escape an argument (add no quotes, don't escape anything).

rkeithhill commented 7 years ago

This issue just came up for us today when someone tried the following command in PowerShell and was dissing PowerShell when it didn't work but CMD did:

wmic useraccount where name='username' get sid

From PSCX echoargs, wmic.exe sees this:

94> echoargs wmic useraccount where name='tso_bldadm' get sid
Arg 0 is <wmic>
Arg 1 is <useraccount>
Arg 2 is <where>
Arg 3 is <name=tso_bldadm>
Arg 4 is <get>
Arg 5 is <sid>

Command line:
"C:\Users\hillr\Documents\WindowsPowerShell\Modules\Pscx\3.2.2\Apps\EchoArgs.exe" wmic useraccount where name=tso_bldadm get sid

So what API does CMD.exe use to invoke the process / form the command line? For that matter, what does --% do to make this command work?

be5invis commented 7 years ago

@rkeithhill CreateProcessW. direct call. really.

cspotcode commented 7 years ago

Why is Powershell behaving differently in these two situations? Specifically, it is inconsistently wrapping args containing spaces in double-quotes.

# Desired argv[1] is 4 characters: A, space, double-quote, B
$ .\echoargs.exe 'A \"B'
<"C:\test\echoargs.exe" "A \"B">
<A "B>
# Correct!

# Desired argv value is 4 characters: A, double-quote, space, B
$ .\echoargs.exe 'A\" B'
<"C:\test\echoargs.exe" A\" B>
<A"> <B>
# Wrong...

There seems to be no rhyme or reason. In the first situation, it wraps my arg with double-quotes, but in the second situation it doesn't. I need to know exactly when it will and won't wrap in double-quotes so that I can manually wrap (or not) in my script.

.\echoargs.exe is created by compiling the following with cl echoargs.c

// echoargs.c
#include <windows.h>
#include <stdio.h>
int wmain(int argc, WCHAR** argv) {
    wprintf(L"<%ls>\n", GetCommandLineW());
    for(int i = 1; i < argc; i++) {
        wprintf(L">%s< ", argv[i]);
    }
    wprintf(L"\n");
}

EDIT: Here's my $PSVersionTable:

Name                           Value
----                           -----
PSVersion                      5.1.15063.296
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.15063.296
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
TSlivede commented 7 years ago

The behaviour regarding quotes changed multiple times, therefore I'd suggest to use something like this:

Edit: Updated form below Old version: ``` # call helper function Run-Native($command) { $env:commandlineargumentstring=($args | %{'"'+ ($_ -replace '(\\*)"','$1$1\"' -replace '(\\*)$','$1$1') + '"'}) -join ' '; & $command --% %commandlineargumentstring% } # some test cases Run-Native .\echoargs.exe 'A "B' 'A" B' Run-Native .\echoargs.exe 'A "B' Run-Native .\echoargs.exe 'A" B' Run-Native .\echoargs.exe 'A\" B\\" \' ``` Output: ``` <"C:\test\echoargs.exe" "A \"B" "A\" B"> <"C:\test\echoargs.exe" "A \"B"> <"C:\test\echoargs.exe" "A\" B"> <"C:\test\echoargs.exe" "A\\\" B\\\\\" \\"> ``` The first `-replace` doubles backslashes in front of quotes and adds one additional backslash, to escape the qoute. The second `-replace` doubles backslashes at the end of the argument, such that the closing quote is not escaped.

This uses --% (PS v3 and above), which is AFAIK the only reliable way to pass quotes to native executables.


Edit:

Updated version of Run-Native, now called Invoke-NativeCommand (as suggested)

function Invoke-NativeCommand() {
    $command, [string[]] $argsForExe = $args
    if($argsForExe.Length -eq 0){
        & $command
    } else {
        $env:commandlineargumentstring=($argsForExe | %{
            if($_ -match '^[\w\d\-:/\\=]+$'){
                $_ #don't quote nonempty arguments consisting of only letters, numbers, or one of -:/\=
            } else {
                $_ <# double backslashes in front of quotes and escape quotes with backslash #> `
                    -replace '(\\*)"','$1$1\"' `
                   <# opening quote after xxx= or after /xxx: or at beginning otherwise #> `
                    -replace '^([\w\d]+=(?=.)|[/-][\w\d]+[:=](?=.)|^)','$1"' `
                   <# double backslashes in front of closing quote #> `
                    -replace '(\\*)$','$1$1' `
                   <# add closing quote #> `
                    -replace '$','"'
            }
        }) -join ' ';
        & $command --% %commandlineargumentstring%
    }
}

(with some inspiration from iep)

cspotcode commented 7 years ago

Thanks, I didn't know about --%. is there any way to do that without leaking the environment variable to the native process? (and to any processes it might invoke)

Is there a PowerShell module that implements a Run-Native Cmdlet for everyone to use? This sounds like something that should be on the Powershell Gallery. If it were good enough, it could be the basis for an RFC.

TSlivede commented 7 years ago

"leaking" sounds like you are concerned about security. Notice however that the commandline is visible to child processes anyway. (For example: gwmi win32_process |select name,handle,commandline|Format-Table on Windows and ps -f on Linux)

If you still want to avoid an environment variable, you may be able to construct something using invoke-expression.

Regarding the RFC: I don't think such a commandlet should be necessary, instead this should be the default behavior:

https://github.com/PowerShell/PowerShell-RFC/issues/90

cspotcode commented 7 years ago

I agree that PowerShell's default behavior should be fixed. I had been pessimistically assuming that it would never change for backwards compatibility reasons, which is why I suggested writing a module. However, I really like the way your RFC allows the old escaping behavior to be re-enabled via a preference variable.

mklement0 commented 6 years ago

Let me summarize the discussion, with just the right dose of opinion:


By the future, I mean:


Implementing the future:

TSlivede commented 6 years ago

As https://github.com/PowerShell/PowerShell/issues/4358 was closed as duplicate of this, here a short summary of that problem:

If an argument of an external executable with a trailing backslash contains a space, it is currently naively quoted (add quote before and after the argument). Any executable, that follows the usual rules interprets that like this:
From @mklement0's comment:

The 2nd " in ".\test 2\", due to being preceded by \ is interpreted as an escaped ", causing the remainder of the string - despite a then-missing closing " to be interpreted as part of the same argument.

Example:
(from @akervinen's comment)

PS X:\scratch> .\ps-args-test.exe '.\test 2\'
Received argument: .\test 2"

The Problem occurs very often, because PSReadLine adds a trailing backslash on auto-completion for directories.

SteveL-MSFT commented 6 years ago

Since corefx seems open to producing the api we need, I'm deferring this to 6.1.0. For 6.0.0, I'll see if we can fix #4358

SteveL-MSFT commented 6 years ago

@TSlivede I took your function, renamed it to Invoke-NativeCommand (as Run isn't a valid verb) and added an alias ^ and published it as a module on PowerShellGallery:

install-module NativeCommand -scope currentuser
^ ./echoargs 'A "B' 'A" B'
mklement0 commented 6 years ago

@SteveL-MSFT:

It's nice to have a stopgap, but a less cumbersome one would be - while we wait for a CoreFX solution - to implement the well-defined official quoting / argument-parsing rules as detailed in @TSlivede's RFC proposal ourselves preliminarily - which doesn't sound too hard to do.

If we only fix the \" problem, argument passing is still fundamentally broken, even in simple scenarios such as the following:

PS> bash -c 'echo "hi there"'
hi    # !! Bash sees the following tokens:  '-c', 'echo hi', 'there'

I think at this point there's sufficient agreement on what the behavior should be so we don't need a full RFC process, do we?

The only outstanding decision is how to deal with backward-compatibility issues in Windows.

be5invis commented 6 years ago

@mklement0 @SteveL-MSFT Are we already broke compatibility?

joeyaiello commented 6 years ago

The only outstanding decision is how to deal with backward-compatibility issues in Windows.

Yeah, but that's the hard part, right?

@be5invis what do you mean by "already broke compatibility"?

Plus, if CoreFX is on the verge of a fix in their layer, I'd rather not create a stopgap in our layer before they do.

And as someone said above in the thread, this is annoying, but it's also pretty well-documented in the community. I'm not sure we should break it twice in the next two releases.

mklement0 commented 6 years ago

@joeyaiello:

Isn't the fix for #4358 already a breaking change for those who've worked around the issue by doubling the final \; e.g., "c:\tmp 1\\"? In other words: if you limit the changes to this fix, two breaking changes are guaranteed: this one now, and another later after switching to the future CoreFx API; and while that could also happen if a complete stopgap were to be implemented now, it is unlikely, given what we know about this coming change.

Conversely, it may hamper adoption on Unix if common quoting scenarios such as
bash -c 'echo "hi there"' don't work properly.

I do realize that fixing this is a much larger breaking change, however.

SteveL-MSFT commented 6 years ago

@PowerShell/powershell-committee discussed this and agreed that minimally, using --% should have the same behavior as bash in that the quotes are escaped so that the native command receives them. What is still open for debate is if this should be the default behavior w/o using --%

mklement0 commented 6 years ago

Note:

Arguably, we should target /bin/sh as well - which, however, means that some Bash features - notably brace expansion, certain automatic variables, ... - won't be available


Use of --%

I'll use command echoargs --% 'echo "hi there"' as an example below.

the same behavior as bash in that the quotes are escaped so that the native command receives them.

The way to do in the future, once the CoreFX API has been extended would be to perform no escaping at all, and instead do the following:

In effect, the command line is passed through as-is to the shell executable, which can then perform its parsing.

I understand that, in the current absence of an array-based way to pass literal arguments, we need to combine -c and echoargs 'echo "hi there"' into a single string with escaping, regrettably solely for the benefit of the CoreFX API, which, when it comes time to create the actual process, then reverses this step and splits the single string back into literal tokens - and ensuring that this reversal always results in the original list of literal tokens is the challenging part.

Again: The only reason to involve escaping here at all is due to the current CoreFX limitation. To work with this limitation, the following single, escaped string must therefore be assigned to the .Arguments property of a ProcessStartInfo instance, with the escaping performed as specified by Parsing C++ Command-Line Arguments:

What is still open for debate is if this should be the default behavior w/o using --%

The default behavior on Unix should be very different:

Applied to the example without --%: echoargs 'echo "hi there"':

Again, in the absence of .ArgumentList that is not an option yet, but in the interim the same MS C++-compliant auxiliary escaping as described above could be employed.

TSlivede commented 6 years ago

@SteveL-MSFT As I already mentioned at Make the stop-parsing symbol (--%) work on Unix (#3733) I'd strongly advise against changing the behavior of --%.

If some special functionality for /bin/sh -c is needed please use a different symbol and leave --% the way it is!

mklement0 commented 6 years ago

@TSlivede:

If something --%-like is implemented on Unix - and with native globbing and a generally more command-line-savvy crowd on Unix I perceive less of a need for it - then choosing a different symbol - such as --$ - probably makes sense (sorry, I'd lost track of all aspects of this lengthy, multi-issue debate).

Different symbols would also serve as visually conspicuous reminders that non-portable platform-specific behavior is being invoked.

That leaves the question what PowerShell should do when it comes across --% on Unix and --$ on Windows.

SteveL-MSFT commented 6 years ago

I'm fine leaving --% as-is. Introducing something like --$ which calls out to /bin/sh and I guess cmd.exe on Windows may be a good way to solve this.

iSazonov commented 6 years ago

No chance of creating a cmdlet for these behaviors?

SteveL-MSFT commented 6 years ago

@iSazonov are you suggesting something like Invoke-Native? Not sure I'm a fan of that.