Closed be5invis closed 1 year ago
Why do you think that "\"a\"" is the expected behavior? My understanding of PowerShell escapes says that the actual behavior is the correct and expected behavior. "`"a`"" is a pair of quotes surrounding an escaped pair of quotes surrounding an a, so PowerShell interprets the outer unescaped pair as "this is a string argument" and so drops them, then interprets the escaped pair as escaped quotes and so keeps them, leaving you with "a". At no point was a \ added to the string.
The fact that Bash uses \ as an escape character is irrelevant. In PowerShell, the escape character is a backtick. See PowerShell escape characters.
If you want to pass literally "\"a\"", I believe you would use:
> echo `"\`"a\`"`"
"\"a\""
@andschwa
Yes, escapes work fine for internal cmdlets, but things get weird when communicating with native binaries, especially on Windows.
When running native.exe "`"a`"", ARGV[1] should be "a" (three characters) instead of a (one character).
Currently, to make native.exe correctly receive an ARGV with two quotes and an a character, you have to use this weird call:
native.exe "\`"a\`""
Ah, I see. Re-opening.
Out of a strong curiosity, what happens if you try a build using #1639?
@andschwa The same. You HAVE to double-escape to satisfy both PowerShell and CommandLineToArgvW. This line:
native.exe "`"a`""
results in a StartProcess equivalent to cmd
native.exe ""a""
@be5invis @douglaswth is this resolved via https://github.com/PowerShell/PowerShell/pull/2182?
No, we still need to add a backslash before a backtick-escaped double quote. This does not solve the double-escaping problem. (That is, we have to escape a double quote for both PowerShell and CommandLineToArgvW.)
Since "`"a`"" is equal to '"a"', do you suggest that native.exe '"a"' should result in "\"a\""?
This seems like a feature request that if implemented could break a large number of already existing PowerShell scripts that use the required double escaping, so extreme care would be required with any solution.
@vors Yes. @douglaswth The double-escaping is really silly: why do we need the “inner” escapes made in the DOS era?
@vors @douglaswth This is the C code used to show GetCommandLineW and CommandLineToArgvW results:
#include <stdio.h>
#include <wchar.h>
#include <Windows.h>

int main() {
    /* Raw command line exactly as this process received it. */
    LPWSTR cmdline = GetCommandLineW();
    wprintf(L"Command Line : %s\n", cmdline);

    /* Split the command line into individual arguments. */
    int nArgs;
    LPWSTR *szArglist = CommandLineToArgvW(cmdline, &nArgs);
    if (NULL == szArglist) {
        wprintf(L"CommandLineToArgvW failed\n");
        return 0;
    } else {
        for (int i = 0; i < nArgs; i++) {
            wprintf(L"argv[%d]: %s\n", i, szArglist[i]);
        }
    }
    /* The argument list is allocated as one block; free it once. */
    LocalFree(szArglist);
}
Here is the result
$ ./a "a b"
Command Line : "Z:\playground\ps-cmdline\a.exe" "a b"
argv[0]: Z:\playground\ps-cmdline\a.exe
argv[1]: a b
$ ./a 'a b'
Command Line : "Z:\playground\ps-cmdline\a.exe" "a b"
argv[0]: Z:\playground\ps-cmdline\a.exe
argv[1]: a b
$ ./a 'a"b'
Command Line : "Z:\playground\ps-cmdline\a.exe" a"b
argv[0]: Z:\playground\ps-cmdline\a.exe
argv[1]: ab
$ ./a 'a"b"c'
Command Line : "Z:\playground\ps-cmdline\a.exe" a"b"c
argv[0]: Z:\playground\ps-cmdline\a.exe
argv[1]: abc
$ ./a 'a\"b\"c'
Command Line : "Z:\playground\ps-cmdline\a.exe" a\"b\"c
argv[0]: Z:\playground\ps-cmdline\a.exe
argv[1]: a"b"c
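For anyone who wants to follow the results above without a Windows box, here is a rough Python sketch of the splitting rules applied to the argument portion of the command line. This is a simplification and an assumption on my part, not the actual Win32 implementation: the name split_command_line is made up, the special treatment of argv[0] is skipped, and the doubled-quote ("") special case is ignored.

```python
def split_command_line(tail):
    """Split the argument part of a Windows command line (simplified sketch)."""
    args, current = [], []
    in_quotes = False
    have_arg = False  # lets an empty "" still count as an argument
    i = 0
    while i < len(tail):
        ch = tail[i]
        if ch == "\\":
            # Count the run of backslashes and look at what follows it.
            start = i
            while i < len(tail) and tail[i] == "\\":
                i += 1
            count = i - start
            if i < len(tail) and tail[i] == '"':
                # 2n backslashes + quote: n backslashes, quote is a delimiter.
                # 2n+1 backslashes + quote: n backslashes and a literal quote.
                current.append("\\" * (count // 2))
                if count % 2 == 1:
                    current.append('"')
                else:
                    in_quotes = not in_quotes
                i += 1
            else:
                # Backslashes not followed by a quote are literal.
                current.append("\\" * count)
            have_arg = True
        elif ch == '"':
            # An unescaped quote only toggles quoting; it is not kept.
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            # Unquoted whitespace ends the current argument.
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


# The argument tails from the runs above (everything after the exe path).
for tail in ['"a b"', 'a"b', 'a"b"c', 'a\\"b\\"c']:
    print(tail, "->", split_command_line(tail))
```

With these four inputs the sketch yields the same argv[1] values as the runs above, which is why only the backslash-escaped form keeps the quotes.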
@be5invis I do not disagree with you about the double escaping being annoying, but I am merely saying that a change to this would need to be backward compatible with what existing PowerShell scripts use.
How many of them are there? I do not think there are script writers who know about such double-quoting. It is a bug, not a feature, and it is not documented.
PowerShell has been around for 9 years so there are very likely a good number of scripts out there. I found plenty of information about the need for double escaping from StackOverflow and other sources when I ran into the need for it so I don't know if I agree with your claims about nobody knowing about the need for it or that it is not documented.
For additional context, I'd like to talk a little bit about the implementation. PowerShell calls a .NET API to spawn a new process, which calls a Win32 API (on Windows).
Here, PS creates the ProcessStartInfo that it uses: https://github.com/PowerShell/PowerShell/blob/master/src/System.Management.Automation/engine/NativeCommandProcessor.cs#L1063
The provided API takes a single string for arguments and then it's re-parsed into an array of arguments to do the execution.
The rules of this re-parsing are not controlled by PowerShell. It's a Win32 API (and fortunately, it is consistent in dotnet core and Unix rules).
Particularly, this contract describes the \ and " behavior.
Although, PowerShell may try to be smarter and provide a nicer experience, the current behavior is consistent with cmd and bash: you can copy native executable line from them and use it in powershell and it works the same.
@be5invis If you know a way to enhance the experience in a non-breaking way, please line up the details. For the breaking changes, we would need to use the RFC process, as described in https://github.com/PowerShell/PowerShell/blob/master/docs/dev-process/breaking-change-contract.md
This applies to Windows, but when running commands on Linux or Unix, it's strange that one needs to double escape quotes.
On Linux, processes don't have a single commandline but instead an array of arguments. Therefore arguments in PowerShell should be the same as those that are passed to the executable, instead of merging all arguments and then resplitting.
Even on Windows, the current behavior is inconsistent:
If an argument contains no spaces, it is passed unchanged.
If an argument contains spaces, it will be surrounded by quotes, to keep it together through the CommandLineToArgvW call. => Argument is changed to meet the CommandLineToArgvW requirement.
But if an argument contains quotes, those are not escaped. => Argument is not changed, although CommandLineToArgvW requires this.
I think arguments should either never be changed, or always be changed to meet CommandLineToArgvW requirements, but not in half of the cases.
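To make the "always be changed" option concrete, here is a minimal Python sketch of quoting a single argument so that the MS C runtime / CommandLineToArgvW rules recover it unchanged. The function name quote_windows_arg is hypothetical and this is not what PowerShell does today; it only illustrates the rule that backslashes are doubled when they precede a quote (including the closing one) and that embedded quotes get a backslash.

```python
def quote_windows_arg(arg):
    """Quote one argument per the MS C command-line rules (sketch)."""
    # Arguments without whitespace or quotes can be passed unchanged;
    # the empty string still needs "" to survive as an argument.
    if arg and not any(c in arg for c in ' \t"'):
        return arg
    out = ['"']
    backslashes = 0  # length of the pending run of backslashes
    for ch in arg:
        if ch == "\\":
            backslashes += 1
        elif ch == '"':
            # Double the pending backslashes, then escape the quote itself.
            out.append("\\" * (backslashes * 2) + '\\"')
            backslashes = 0
        else:
            # Backslashes before an ordinary character stay literal.
            out.append("\\" * backslashes + ch)
            backslashes = 0
    # Backslashes right before the closing quote must be doubled too,
    # otherwise the closing quote would be read as an escaped quote.
    out.append("\\" * (backslashes * 2) + '"')
    return "".join(out)


for arg in ['a', 'a b', '"a"', '.\\test 2\\']:
    print(arg, "->", quote_windows_arg(arg))
```

Applied uniformly, this would treat spaces, embedded quotes and trailing backslashes with one rule instead of handling only the space case.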
Regarding breaking-the-contract: As I couldn't find any official documentation about double escaping, I'd consider this as category "Bucket 2: Reasonable Grey Area", so there are chances to change this, or am I wrong?
@vors This is extremely annoying if your argument is a variable or something else: you have to manually escape it before sending it into a native app.
An "auto-escaping" operator may help, like ^"a`"" -> "a\`"".
I think @TSlivede put it right with the inconsistency in the behavior.
I think arguments should either never be changed, or always be changed to meet CommandLineToArgvW requirements, but not in half of the cases.
I'm not sure about the bucket, but even the "clearly breaking change" bucket could potentially be changed. We want to make PowerShell better, but backward compatibility is one of our highest priorities. That's why it's not so easy. We have a great community and I'm confident that we can find consensus.
Would anybody want to start an RFC process?
It would be worth investigating the use of P/Invoke instead of .Net to start a process if that avoids the need for PowerShell to add quotes to arguments.
@lzybkr as far as I can tell, PInvoke would not help. And this is where unix and windows APIs are different:
https://msdn.microsoft.com/en-us/library/20y988d2.aspx (treats spaces as separators) https://linux.die.net/man/3/execvp (doesn't treat spaces as separators)
I wasn't suggesting changing the Windows implementation.
I'd try to avoid having platform-specific behavior here. It will hurt scripts portability. I think we can consider changing windows behavior in a non-breaking way. I.e. with preference variable. And then we can have different defaults or something like that.
We're talking about calling external commands - somewhat platform dependent anyway.
Well, I think it can't be really platform independent, as Windows and Linux just have different ways to call executables. In Linux a process gets an argument array, while on Windows a process just gets a single commandline (one string).
(compare the more basic CreateProcess -> commandline (https://msdn.microsoft.com/library/windows/desktop/ms682425) and execve -> command array (https://linux.die.net/man/2/execve))
As PowerShell adds those quotes when arguments have spaces in them, it seems to me that PowerShell tries* to pass the arguments in a way that CommandLineToArgvW splits the commandline to the arguments that were originally given in PowerShell. (This way a typical C program gets the same arguments in its argv array as a PowerShell function gets as $args.)
This would perfectly match just passing the arguments to the Linux system call (as suggested via P/Invoke).
* (and fails, as it doesn't escape quotes)
PS: What is necessary to start an RFC process?
Exactly - PowerShell tries to make sure CommandLineToArgvW produces the correct command after reparsing what PowerShell has already parsed.
This has been a longstanding pain point on Windows, I see no reason to bring that difficulty over to *nix.
To me, this feels like an implementation detail, not really needing an RFC. If we changed behavior in Windows PowerShell, it might warrant an RFC, but even then, the right change might be considered a (possibly risky) bug fix.
Yes, I also think that changing it on Linux to use a direct system call would make everyone happier.
I still think it should also be changed on windows, (Maybe by adding a preference variable for those who don't want to change their scripts) because it's just wrong now - it is a bug. If this was corrected, a direct syscall on linux wouldn't even be necessary, because any argument would reach the next process unchanged.
But as there are executables that split the commandline in a way incompatible with CommandLineToArgvW, I like @be5invis's idea of an operator for arguments - but I wouldn't create an auto-escape operator (should be default for all arguments), but instead add an operator to not escape an argument (add no quotes, don't escape anything).
This issue just came up for us today when someone tried the following command in PowerShell and was dissing PowerShell when it didn't work but CMD did:
wmic useraccount where name='username' get sid
From PSCX echoargs, wmic.exe sees this:
94> echoargs wmic useraccount where name='tso_bldadm' get sid
Arg 0 is <wmic>
Arg 1 is <useraccount>
Arg 2 is <where>
Arg 3 is <name=tso_bldadm>
Arg 4 is <get>
Arg 5 is <sid>
Command line:
"C:\Users\hillr\Documents\WindowsPowerShell\Modules\Pscx\3.2.2\Apps\EchoArgs.exe" wmic useraccount where name=tso_bldadm get sid
So what API does CMD.exe use to invoke the process / form the command line? For that matter, what does --% do to make this command work?
@rkeithhill CreateProcessW. Direct call. Really.
Why is Powershell behaving differently in these two situations? Specifically, it is inconsistently wrapping args containing spaces in double-quotes.
# Desired argv[1] is 4 characters: A, space, double-quote, B
$ .\echoargs.exe 'A \"B'
<"C:\test\echoargs.exe" "A \"B">
<A "B>
# Correct!
# Desired argv value is 4 characters: A, double-quote, space, B
$ .\echoargs.exe 'A\" B'
<"C:\test\echoargs.exe" A\" B>
<A"> <B>
# Wrong...
There seems to be no rhyme or reason. In the first situation, it wraps my arg with double-quotes, but in the second situation it doesn't. I need to know exactly when it will and won't wrap in double-quotes so that I can manually wrap (or not) in my script.
.\echoargs.exe is created by compiling the following with cl echoargs.c:
// echoargs.c
#include <windows.h>
#include <stdio.h>

int wmain(int argc, WCHAR** argv) {
    // Print the raw command line, then each parsed argument wrapped in <...>.
    wprintf(L"<%ls>\n", GetCommandLineW());
    for (int i = 1; i < argc; i++) {
        wprintf(L"<%s> ", argv[i]);
    }
    wprintf(L"\n");
    return 0;
}
EDIT: Here's my $PSVersionTable:
Name Value
---- -----
PSVersion 5.1.15063.296
PSEdition Desktop
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...}
BuildVersion 10.0.15063.296
CLRVersion 4.0.30319.42000
WSManStackVersion 3.0
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
The behaviour regarding quotes changed multiple times, therefore I'd suggest to use something like this:
This uses --% (PS v3 and above), which is AFAIK the only reliable way to pass quotes to native executables.
Updated version of Run-Native, now called Invoke-NativeCommand (as suggested):
function Invoke-NativeCommand() {
    $command, [string[]] $argsForExe = $args
    if($argsForExe.Length -eq 0){
        & $command
    } else {
        $env:commandlineargumentstring=($argsForExe | %{
            if($_ -match '^[\w\d\-:/\\=]+$'){
                $_ #don't quote nonempty arguments consisting of only letters, numbers, or one of -:/\=
            } else {
                $_ <# double backslashes in front of quotes and escape quotes with backslash #> `
                    -replace '(\\*)"','$1$1\"' `
                    <# opening quote after xxx= or after /xxx: or at beginning otherwise #> `
                    -replace '^([\w\d]+=(?=.)|[/-][\w\d]+[:=](?=.)|^)','$1"' `
                    <# double backslashes in front of closing quote #> `
                    -replace '(\\*)$','$1$1' `
                    <# add closing quote #> `
                    -replace '$','"'
            }
        }) -join ' ';
        & $command --% %commandlineargumentstring%
    }
}
(with some inspiration from iep)
"" as escaped " - will therefore still not work for embedded quotes in .bat or msiexec arguments
Thanks, I didn't know about --%. Is there any way to do that without leaking the environment variable to the native process? (and to any processes it might invoke)
Is there a PowerShell module that implements a Run-Native Cmdlet for everyone to use? This sounds like something that should be on the PowerShell Gallery. If it were good enough, it could be the basis for an RFC.
"leaking" sounds like you are concerned about security. Notice however that the commandline is visible to child processes anyway. (For example: gwmi win32_process |select name,handle,commandline|Format-Table on Windows and ps -f on Linux)
If you still want to avoid an environment variable, you may be able to construct something using invoke-expression.
Regarding the RFC: I don't think such a commandlet should be necessary, instead this should be the default behavior:
I agree that PowerShell's default behavior should be fixed. I had been pessimistically assuming that it would never change for backwards compatibility reasons, which is why I suggested writing a module. However, I really like the way your RFC allows the old escaping behavior to be re-enabled via a preference variable.
Let me summarize the discussion, with just the right dose of opinion:
It's clear that we have a backward-compatibility issue, so the old behavior must continue to remain available.
@TSlivede's RFC proposal accounts for that while commendably pointing the way to the future.
Unfortunately, his proposal languishes as a PR as of this writing, and it hasn't even been accepted as an RFC draft yet.
By the future, I mean:
PowerShell is a shell in its own right that will hopefully soon shed its cmd.exe-related baggage, so the only considerations that matter when it comes to calling external utilities (executables that are (typically) console/terminal applications) are:
Arguments to pass should be specified by the rules of PowerShell's argument-mode parsing only.
Whatever literals result from that process must be passed as-is to the target executable, as individual arguments.
In other words: As a user, all you should ever need to focus on is what the result of PowerShell's parsing will be, and to be able to rely on that result getting passed as-is, with PowerShell taking care of any behind-the-scenes encoding - if necessary.
Implementing the future:
On Windows:
For historical reasons, Windows does not permit passing arguments as an array of literals to the target executable; instead, a single string encoding all arguments using pseudo shell syntax is needed. What's worse, it is ultimately up to the individual target executable to interpret that single string and split it into arguments.
The best PowerShell can do is to form that single string - behind the scenes, after having performed its own splitting of the command line into individual arguments - in a predictable, standardized manner.
@TSlivede's RFC proposal proposes just that, by suggesting that PowerShell synthesize the pseudo shell command line in a manner that will cause the Windows C/C++ runtime to recover the input arguments as-is when performing its parsing:
Given that it's ultimately up to each target executable to interpret the command line, there is no guarantee that this will work in all cases, but said rules are the most sensible choice, because most existing utilities use these conventions.
The only notable exceptions are batch files, which could receive special treatment, as the RFC proposal suggests.
On Unix platforms:
Strictly speaking, the issues that plague Windows argument parsing need never arise, because the platform-native calls for creating new processes accept arguments as arrays of literals - whatever arguments PowerShell ends up with after performing its own parsing should just be passed on as-is.
To quote @lzybkr: " I see no reason to bring that difficulty over to *nix."
Sadly, due to the current limitations of .NET Core (CoreFX), these issues do come into play, because the CoreFX API needlessly forces the anarchy of Windows argument passing onto the Unix world too, by requiring use of a pseudo command line even on Unix.
I've created this CoreFX issue to ask for that problem to be remedied.
In the meantime, given that CoreFX splits the pseudo command line back into arguments based on the C/C++ rules cited above, @TSlivede's proposal should work on Unix platforms too.
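As an illustration of what "arguments as arrays of literals" means on Unix, here is a small Python sketch (Python is used only as a stand-in for a process launcher that accepts an argument array; it assumes a Unix-like system with sh on the PATH). Each list element arrives as exactly one argv entry of the child, so no escaping beyond the caller's own string syntax is involved.

```python
import subprocess

# Each list element becomes exactly one argv entry of the child process.
# On Unix no command line string is assembled and nothing is re-split,
# so the embedded double quotes reach sh untouched.
result = subprocess.run(
    ["sh", "-c", 'echo "hi there"'],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout, end="")
```

Here sh receives the script text echo "hi there" as a single argument and echo therefore gets hi there as one token, which is the behavior an array-based launch API would give PowerShell for free.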
As https://github.com/PowerShell/PowerShell/issues/4358 was closed as a duplicate of this, here is a short summary of that problem:
If an argument of an external executable with a trailing backslash contains a space, it is currently naively quoted (add quote before and after the argument). Any executable, that follows the usual rules interprets that like this:
From @mklement0's comment:
The 2nd " in ".\test 2\", due to being preceded by \, is interpreted as an escaped ", causing the remainder of the string - despite a then-missing closing " - to be interpreted as part of the same argument.
Example:
(from @akervinen's comment)
PS X:\scratch> .\ps-args-test.exe '.\test 2\'
Received argument:.\test 2"
The problem occurs very often, because PSReadLine adds a trailing backslash on auto-completion for directories.
Since corefx seems open to producing the api we need, I'm deferring this to 6.1.0. For 6.0.0, I'll see if we can fix #4358
@TSlivede I took your function, renamed it to Invoke-NativeCommand (as Run isn't a valid verb), added an alias ^, and published it as a module on PowerShellGallery:
install-module NativeCommand -scope currentuser
^ ./echoargs 'A "B' 'A" B'
@SteveL-MSFT:
It's nice to have a stopgap, but a less cumbersome one would be - while we wait for a CoreFX solution - to implement the well-defined official quoting / argument-parsing rules as detailed in @TSlivede's RFC proposal ourselves preliminarily - which doesn't sound too hard to do.
If we only fix the \" problem, argument passing is still fundamentally broken, even in simple scenarios such as the following:
PS> bash -c 'echo "hi there"'
hi # !! Bash sees the following tokens: '-c', 'echo hi', 'there'
I think at this point there's sufficient agreement on what the behavior should be so we don't need a full RFC process, do we?
The only outstanding decision is how to deal with backward-compatibility issues in Windows.
@mklement0 @SteveL-MSFT Have we already broken compatibility?
The only outstanding decision is how to deal with backward-compatibility issues in Windows.
Yeah, but that's the hard part, right?
@be5invis what do you mean by "already broke compatibility"?
Plus, if CoreFX is on the verge of a fix in their layer, I'd rather not create a stopgap in our layer before they do.
And as someone said above in the thread, this is annoying, but it's also pretty well-documented in the community. I'm not sure we should break it twice in the next two releases.
@joeyaiello:
Isn't the fix for #4358 already a breaking change for those who've worked around the issue by doubling the final \; e.g., "c:\tmp 1\\"? In other words: if you limit the changes to this fix, two breaking changes are guaranteed: this one now, and another later after switching to the future CoreFx API; and while that could also happen if a complete stopgap were to be implemented now, it is unlikely, given what we know about this coming change.
Conversely, it may hamper adoption on Unix if common quoting scenarios such as bash -c 'echo "hi there"' don't work properly.
I do realize that fixing this is a much larger breaking change, however.
@PowerShell/powershell-committee discussed this and agreed that minimally, using --% should have the same behavior as bash in that the quotes are escaped so that the native command receives them. What is still open for debate is if this should be the default behavior w/o using --%
Note:
I'm assuming that a call to an actual shell executable is necessary when using --% on Unix, as opposed to trying to emulate the shell behavior, which is what happens on Windows. Emulating is not hard on Windows, but would be much harder on Unix, given the many more features that would need emulating.
Using an actual shell then raises the question of what shell to use: while bash is ubiquitous, its default behavior is not POSIX-compliant nor is it required by POSIX to be present, so for portability other scripting languages call out to /bin/sh, the shell executable decreed by POSIX (which can be Bash running in compatibility mode (e.g., on macOS), but certainly doesn't have to be (e.g., Dash on Ubuntu)).
Arguably, we should target /bin/sh as well - which, however, means that some Bash features - notably brace expansion, certain automatic variables, ... - won't be available.
--%
I'll use command echoargs --% 'echo "hi there"' as an example below.
the same behavior as bash in that the quotes are escaped so that the native command receives them.
The way to do this in the future, once the CoreFX API has been extended, would be to perform no escaping at all, and instead do the following:
/bin/sh as the executable, (effectively) assigned to ProcessStartInfo.FileName.
ProcessStartInfo.ArgumentList:
-c as the 1st argument
echoargs 'echo "hi there"' as the 2nd argument - i.e., the original command line used literally, exactly as specified, except that --% was removed.
In effect, the command line is passed through as-is to the shell executable, which can then perform its parsing.
I understand that, in the current absence of an array-based way to pass literal arguments, we need to combine -c and echoargs 'echo "hi there"' into a single string with escaping, regrettably solely for the benefit of the CoreFX API, which, when it comes time to create the actual process, then reverses this step and splits the single string back into literal tokens - and ensuring that this reversal always results in the original list of literal tokens is the challenging part.
Again: The only reason to involve escaping here at all is due to the current CoreFX limitation.
To work with this limitation, the following single, escaped string must therefore be assigned to the .Arguments property of a ProcessStartInfo instance, with the escaping performed as specified by Parsing C++ Command-Line Arguments:
/bin/sh as the executable, (effectively) assigned to ProcessStartInfo.FileName.
The following single, escaped string as the value of ProcessStartInfo.Arguments:
-c "echoargs 'echo \"hi there\"'"
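The escaped string above can be cross-checked with a short Python sketch. It relies on subprocess.list2cmdline, an undocumented helper in CPython's subprocess module that joins an argument list using the MS C runtime quoting rules; using it as a stand-in for what PowerShell would need to do is my assumption, and nothing here is PowerShell or CoreFX code.

```python
import subprocess

# The two literal arguments that should reach /bin/sh.
argument_list = ["-c", "echoargs 'echo \"hi there\"'"]

# list2cmdline applies the MS C runtime rules: quote arguments that
# contain whitespace and backslash-escape embedded double quotes.
arguments = subprocess.list2cmdline(argument_list)
print(arguments)
```

The printed string is the single escaped value shown above, which suggests the interim escaping step is mechanical once the list of literal arguments is known.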
What is still open for debate is if this should be the default behavior w/o using --%
The default behavior on Unix should be very different:
No escaping considerations other than PowerShell's own should ever come into play (except on Windows, where that cannot be avoided, sadly; but there the MS C++ rules are the way to go, to be applied behind the scenes; failing that, --% provides an escape hatch).
Whatever arguments PowerShell ends up with, after its own parsing, must be passed as an array of literals, via the upcoming ProcessStartInfo.ArgumentList property.
Applied to the example without --%: echoargs 'echo "hi there"':
PowerShell performs its usual parsing and ends up with the following 2 arguments:
echoargs
echo "hi there" (single quotes - which only had syntactical function to PowerShell, removed)
ProcessStartInfo is then populated as follows, with the upcoming CoreFX extension in place:
echoargs as the (effective) .FileName property value
echo "hi there" as the only element to add to the Collection<string> instance exposed by .ArgumentList.
Again, in the absence of .ArgumentList that is not an option yet, but in the interim the same MS C++-compliant auxiliary escaping as described above could be employed.
@SteveL-MSFT
As I already mentioned at Make the stop-parsing symbol (--%) work on Unix (#3733), I'd strongly advise against changing the behavior of --%.
If some special functionality for /bin/sh -c is needed, please use a different symbol and leave --% the way it is!
@TSlivede:
If something --%-like is implemented on Unix - and with native globbing and a generally more command-line-savvy crowd on Unix I perceive less of a need for it - then choosing a different symbol - such as --$ - probably makes sense (sorry, I'd lost track of all aspects of this lengthy, multi-issue debate).
Different symbols would also serve as visually conspicuous reminders that non-portable platform-specific behavior is being invoked.
That leaves the question of what PowerShell should do when it comes across --% on Unix and --$ on Windows.
I'm fine leaving --% as-is. Introducing something like --$ which calls out to /bin/sh and I guess cmd.exe on Windows may be a good way to solve this.
No chance of creating a cmdlet for these behaviors?
@iSazonov are you suggesting something like Invoke-Native? Not sure I'm a fan of that.
Steps to reproduce
native.exe which acquires ARGV
native.exe "`"a`""
Expected behavior
ARGV[1] == "a"
Actual behavior
ARGV[1] == a
Environment data
Windows 10 x64