denoland / deno

A modern runtime for JavaScript and TypeScript.
https://deno.com
MIT License

Give access to raw command line arguments on Windows #9871

Open rivy opened 3 years ago

rivy commented 3 years ago

For Windows, CLI applications need access to the original command line to supply basic services that the Windows shells (CMD/PowerShell) do not, such as wildcard/glob expansion. For example:

In *nix (bash-shell):

$ # POSIX with bash shell
$ ls -l *.mkd
-rwxrwxrwx 1 toor toor 1.2K Jul 28  2020 kb-ToDO.mkd*
-rwxrwxrwx 1 toor toor 1.1K Jul 19  2020 kb-info.mkd*

$ deno eval "console.log(Deno.args)" "*.mkd" *.mkd
[ "*.mkd", "kb-ToDO.mkd", "kb-info.mkd" ]

The Windows versions:

>:: CMD shell
>dir /b *.mkd
kb-info.mkd
kb-ToDO.mkd

>deno eval -T "console.log(Deno.args)" "*.mkd" *.mkd
[ "*.mkd", "*.mkd" ]

>:: PowerShell
>powershell
...

PS> deno eval "console.log(Deno.args)" "*.mkd" *.mkd
[ "*.mkd", "*.mkd" ]

You'll note that the information required to perform reasonable wildcard/glob expansion is lost in the Deno.args array. Specifically here, the quotes are removed, so there is no application-detectable difference between the two arguments. For *nix/bash, it doesn't matter, as the shell does the expansion for the application and supplies those tailored arguments, but Windows applications are expected to do that work themselves. The only way to do that is to have access to the original, unparsed command line (such as via GetCommandLineW).
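To make the point concrete, here is a minimal TypeScript sketch of what a script could do with the raw text if it had it. It follows a simplified version of the Microsoft C runtime splitting rules (whitespace separates arguments, double quotes group, backslashes matter only before a quote) and also records whether each argument contained quotes. The `tokenizeWindowsCommandLine` name and the `RawArg` shape are hypothetical. The sketch treats the program name like any other argument and ignores the `""` rule inside quoted regions, both of which the real runtime handles differently.

```typescript
// Hypothetical sketch: split a raw Windows command line (e.g. the string that
// GetCommandLineW returns) using simplified MSVCRT rules, while remembering
// which arguments contained double quotes. This is not a Deno API.
interface RawArg {
  text: string; // argument after quote and backslash processing
  quoted: boolean; // true if any part of the argument was inside double quotes
}

function tokenizeWindowsCommandLine(line: string): RawArg[] {
  const args: RawArg[] = [];
  let i = 0;
  while (i < line.length) {
    // Skip the whitespace between arguments.
    while (i < line.length && (line[i] === " " || line[i] === "\t")) i++;
    if (i >= line.length) break;

    let text = "";
    let inQuotes = false;
    let quoted = false;
    while (i < line.length) {
      const c = line[i];
      if (c === "\\") {
        // Count the run of backslashes; they are only special before a quote.
        let count = 0;
        while (i < line.length && line[i] === "\\") {
          count++;
          i++;
        }
        if (line[i] === '"') {
          // 2n backslashes + quote: n backslashes, and the quote toggles quoting.
          // 2n+1 backslashes + quote: n backslashes and a literal quote.
          text += "\\".repeat(Math.floor(count / 2));
          if (count % 2 === 1) {
            text += '"';
            i++;
          }
        } else {
          text += "\\".repeat(count);
        }
      } else if (c === '"') {
        inQuotes = !inQuotes;
        quoted = true;
        i++;
      } else if ((c === " " || c === "\t") && !inQuotes) {
        break; // end of this argument
      } else {
        text += c;
        i++;
      }
    }
    args.push({ text, quoted });
  }
  return args;
}
```

Applied to the CMD example above, `"*.mkd"` and `*.mkd` both yield the text `*.mkd`, but only the first comes back with `quoted: true`. That flag is exactly the information `Deno.args` cannot carry.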

rivy commented 3 years ago

From reading other posts (thread #3892), I gather that the project doesn't currently want to supply access to OS-specific APIs. I don't know whether this is relevant to any other platform; it really only adds potential parity with POSIX systems for Windows applications.

This capability could very simply be added as something like Deno.commandLine, which would be undefined on platforms that don't supply a raw command line (currently, every non-Windows platform). It's really the only way to correctly add wildcard/globbing to Windows CLI applications.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

rivy commented 3 years ago

Not stale.

lucacasonato commented 3 years ago

@rivy What is the Node / Go behaviour here?

cc @piscisaureus

piscisaureus commented 3 years ago

I don’t think processes have access to the original command line (except cmd on windows, but cmd doesn’t do shell globbing)

piscisaureus commented 3 years ago

Re-reading this, seems that @rivy's feature request is implied here:

Specifically here, the quotes are removed, so there is no application-detectable difference between the two arguments

On unix, the quotes are interpreted by the shell; the application won't have access to them. On windows, it is technically possible to use GetCommandLineW to "see" the quotes. However, you really shouldn't! No applications do this except for some legacy DOS command line tools and cmd.exe itself.

So in order to protect yourself and the rest of the world, Deno won't provide access to the raw command line ;)

PS: the thing to do on windows is to always expand glob patterns, quoted or not. * and ? can never appear in valid file names.
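A minimal sketch of the "always expand" approach, under stated assumptions: only `*` and `?` are treated as wildcards, matching is case-insensitive (as on typical Windows file systems), the caller supplies the names from a single directory, and a pattern with no matches is passed through literally (bash's default). The function names are hypothetical.

```typescript
// Hypothetical sketch of "always expand": every argument containing * or ?
// is matched against a directory listing, whether or not it was quoted.
function globToRegExp(pattern: string): RegExp {
  let source = "";
  for (const c of pattern) {
    if (c === "*") source += "[^\\\\/]*"; // any run of characters except path separators
    else if (c === "?") source += "[^\\\\/]"; // exactly one non-separator character
    else source += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  }
  // Case-insensitive, as typical Windows file systems are (an assumption).
  return new RegExp("^" + source + "$", "i");
}

function expandAlways(arg: string, entries: string[]): string[] {
  if (!/[*?]/.test(arg)) return [arg];
  const re = globToRegExp(arg);
  const matches = entries.filter((name) => re.test(name));
  // No match: pass the pattern through unchanged (assumed bash-like default).
  return matches.length > 0 ? matches : [arg];
}
```

With this rule, `"*.mkd"` and `*.mkd` expand identically. It also means an argument such as `*bold*` becomes either a literal or a list of file names depending only on what happens to be in the directory, which is the objection raised in reply.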

rivy commented 3 years ago

@lucacasonato, Go has direct access via sys/GetCommandLine(...). Node.js seems to use a process similar to Deno's, but has access to GetCommandLine() via FFI (an admittedly crufty method of access).

@piscisaureus, TL;DR: this is needed for reasonable cross-platform support, and I've prototyped a full-featured alternate solution to show its utility.

... long post ...

Although what you said about *nix/POSIX is generally true, the rest is just false.

First, for *nix/POSIX, I know that the command line is not even available, hence my suggestion that Deno.commandLine() be (or return) undefined for those platforms. *nix/POSIX platforms already have strong shell support, and attempting to reparse/re-expand the command line is not a fruitful endeavor.

Windows, however, has a long tradition of minimal shell support, forcing applications to go their own route to correctly parse and expand the command line. Most applications, when they do a reasonable and reliable job of it, reparse the command line from GetCommandLineA/W. And there are a digital ton of modern command line utilities which reparse the command line just to support correct basic globbing (see most rust utilities, [eg, bat, coreutils, ... (many more) ...] which use wild [all of which I've made significant contributions to...; I'm not naive to this problem space]).

Given that you ;) emoji'ed me, I'll take the next comment in good humor. But, all joking aside, you are not protecting me/world by not providing access to the raw command line on platforms which don't have strong shell parsing and globbing. That "protection", in fact, just makes it more difficult for developers to create utilities which are strongly cross-platform.

And, addressing your PS aside... No, you shouldn't always expand glob patterns on the Windows platform. It's a more nuanced problem. For example, just the simple command string deno run -A echo.ts "*bold*" will never be correctly/reliably interpreted on a Windows platform if you simply use the arguments as parsed by CMD (and currently Deno) and apply blind glob expansion. The output will depend on the file-system context, which is crazy and unfixable without information about the raw text.
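A sketch of the alternative argued for here, assuming the raw command line has already been split into arguments that remember whether they were quoted (a hypothetical `RawArg` shape with `text` and `quoted` fields). Quoted arguments pass through verbatim and only unquoted ones are expanded, so `"*bold*"` stays literal whatever the directory contains. It treats an argument as quoted if any part of it was quoted, which is coarser than bash's per-character rules; names are hypothetical.

```typescript
// Hypothetical sketch: expand wildcards only in arguments that were not quoted.
interface RawArg {
  text: string;
  quoted: boolean;
}

function wildcardToRegExp(pattern: string): RegExp {
  let source = "";
  for (const c of pattern) {
    if (c === "*") source += "[^\\\\/]*"; // any run of non-separator characters
    else if (c === "?") source += "[^\\\\/]"; // one non-separator character
    else source += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  }
  return new RegExp("^" + source + "$", "i");
}

function expandUnquoted(args: RawArg[], entries: string[]): string[] {
  const out: string[] = [];
  for (const arg of args) {
    if (arg.quoted || !/[*?]/.test(arg.text)) {
      out.push(arg.text); // quoted or wildcard-free: keep exactly as typed
      continue;
    }
    const re = wildcardToRegExp(arg.text);
    const matches = entries.filter((name) => re.test(name));
    out.push(...(matches.length > 0 ? matches : [arg.text]));
  }
  return out;
}
```

For a directory holding `a-bold-plan.mkd` and `b.mkd`, the arguments `"*bold*"` and `*.mkd` become `*bold*`, `a-bold-plan.mkd`, `b.mkd`. The quoted pattern is no longer at the mercy of the file system.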

Deno seems to have made strong efforts for cross-platform portability and to support Windows/CMD. This step (or some other variation) can help make that effort much more robust and work toward a goal of "a command line execution string means the same thing regardless of platform". I can make big strides towards parity between the *nix/POSIX and Windows platforms when I'm able to access that initial Windows command line.

As proof-of-concept, while waiting on replies to this, #9873, and #9874, I've gone ahead and prototyped a process library which works with enhanced script runners and/or an enhanced shim to supply improved quoting and bash-like command line expansion for Windows platforms. I leveraged the well-regarded braces and picomatch NPMjs libraries to supply full-featured brace expansion, fully implemented advanced glob expansion (notably with path-separator independence), sane double and single quoting, and support for ANSI-C quoted strings (eg, $'\n'). The prototype is somewhat raw and almost certainly has some rough corner cases, but it's already very capable.
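For readers unfamiliar with brace expansion: each `{a,b}` group turns one word into one word per alternative, and groups can nest. The prototype uses the `braces` npm package; what follows is an independent, simplified TypeScript sketch of comma-list expansion only (no `{1..3}` ranges, no escapes or quoting), with a hypothetical `expandBraces` name.

```typescript
// Hypothetical sketch of bash-style comma brace expansion (not the `braces` package).
// Groups without a top-level comma, or without a closing brace, stay literal.
function expandBraces(pattern: string): string[] {
  for (let open = 0; open < pattern.length; open++) {
    if (pattern[open] !== "{") continue;

    // Find the matching "}" and the commas that sit directly inside this group.
    let depth = 0;
    let close = -1;
    const commas: number[] = [];
    for (let j = open; j < pattern.length; j++) {
      const c = pattern[j];
      if (c === "{") depth++;
      else if (c === "}") {
        depth--;
        if (depth === 0) {
          close = j;
          break;
        }
      } else if (c === "," && depth === 1) commas.push(j);
    }
    if (close === -1 || commas.length === 0) continue; // not an expandable group

    // One result per alternative; recurse to expand any remaining groups.
    const prefix = pattern.slice(0, open);
    const suffix = pattern.slice(close + 1);
    const bounds = [open, ...commas, close];
    const results: string[] = [];
    for (let k = 0; k < bounds.length - 1; k++) {
      const alternative = pattern.slice(bounds[k] + 1, bounds[k + 1]);
      results.push(...expandBraces(prefix + alternative + suffix));
    }
    return results;
  }
  return [pattern];
}
```

Applied to the next example, `../{.,}?([a-m]-)[n-z].globs` becomes the two words `../.?([a-m]-)[n-z].globs` and `../?([a-m]-)[n-z].globs`, which a bash-like expander would then glob-match.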

This allows for the exact same command line expressions on Windows or *nix (bash/POSIX-compatible shell) platforms; for example, when installed with the enhanced shim, `dxr SCRIPT_URL 'single-quoted argument' "double-quoted argument" ../{.,}?([a-m]-)[n-z].globs $'endsWithNewlineThenExt\n.ext'` will expand to the same set of arguments on both Windows and *nix/POSIX platforms.

See dxr and dxi from the dxx repo.

deno install -Af https://deno.land/x/dxx@v0.0.2/src/dxi.ts
dxi -Af https://deno.land/x/dxx@v0.0.2/src/dxr.ts
dxr https://deno.land/x/dxx@v0.0.2/eg/args.ts --debug --lines "*" {,.}* $'\e[31mANSI-C string\e[m' 'single quotes'

I also have some ideas about adding in command line variable expansion (environment variables and sub-shell expansions) but that will entail a more complicated parsing step and some development time.

And I plan to back-port all of the bash-like parsing/expansion to wild whenever I get the time (or can convince @kornelski to do it for me :smile:).

P.S. The enhanced shim fixes the "Terminate batch job (Y/N)?" issue as well.

piscisaureus commented 3 years ago

That "protection", in fact, just makes it more difficult for developers to create utilities which are strongly cross-platform.

Sorry, I'm not buying this at all. What you're trying to do is not useful; I'm sure that with access to the raw command line you can write a program that distinguishes between "*foo*" and *foo*, but you won't be able to actually invoke your program with these arguments and maintain the distinction, except if you're calling it directly from the cmd shell. But try invoking it from powershell, or python, or node, or java, or cygwin/mingw/wsl bash. In all of these environments it will be either super difficult or outright impossible to pass arguments like that to your program.

In the meantime, all windows software treats "*foo*" and *foo* as equivalent, because that's been the convention for the past 30 years. git add "*.md" does the same as git add *.md. dir "c:\*" and dir c:\* and dir "c:"\* and dir "c:""\*" all do exactly the same thing.

Single quotes (') are also a losing proposition: yourprogram.exe 'hello^fun"characters' will be mogrified; whether you or I agree with it, and whether we could've designed a system that deals with it properly (sure!) really doesn't matter.

rivy commented 3 years ago

This seems to be devolving from discussion, but I'll address your points...

Sorry, I'm not buying this at all. What you're trying to do is not useful; I'm sure that with access to the raw command line you can write a program that distinguishes between "*foo*" and *foo*, but you won't be able to actually invoke your program with these arguments and maintain the distinction, except if you're calling it directly from the cmd shell. But try invoking it from powershell, or python, or node, or java, or cygwin/mingw/wsl bash. In all of these environments it will be either super difficult or outright impossible to pass arguments like that to your program.

Simply, and provably, not true.

deno install -Af https://deno.land/x/dxx@v0.0.3/src/dxi.ts
dxi -Af https://deno.land/x/dxx@v0.0.3/eg/args.ts

args '*' *
node -e "const {exec} = require('child_process'); exec('args \'*\' *', (e, out, err) => console.log(out));"
perl -e "system(q{args '*' *})"
powershell -c args --% '*' *
python -c "import subprocess; subprocess.run('args \'*\' *', shell=True)"
C:> wsl bash --login
$ deno -V
deno 1.8.1
$ deno run -A https://deno.land/x/dxx@v0.0.2/eg/args.ts '*' *
Download https://deno.land/x/dxx@v0.0.2/eg/args.ts
Check https://deno.land/x/dxx@v0.0.2/eg/args.ts
* CHANGELOG.mkd LICENSE README.md eg src tests tools tsconfig.json

All of these invocations are parsed correctly and have the same output.

And I think the utility is obvious. Command line tools which are called and work the same between platforms? And how many times have you wanted to pass some unusual string construction to a Windows command tool? This makes most constructions simple, even passing control characters.

I haven't had the time to test MSYS or Cygwin (mostly just to install deno), but I believe they should function in the same manner as wsl/bash. I'm sure there will be corner cases and minor caveats, but this method will operate correctly for the vast majority of use cases.

In the meantime, all windows software treats "*foo*" and *foo* as equivalent, because that's been the convention for the past 30 years. git add "*.md" does the same as git add *.md. dir "c:\*" and dir c:\* and dir "c:"\* and dir "c:""\*" all do exactly the same thing.

This is hyperbole; "all windows software ..." is untrue. I've given specific counter examples.

Single quotes (') are also a losing proposition: yourprogram.exe 'hello^fun"characters' will be mogrified; whether you or I agree with it, and whether we could've designed a system that deals with it properly (sure!) really doesn't matter.

No, single quotes are not an issue. But this is partially true in ways unrelated to single quotes. Assuming this is run from the CMD shell, the only thing irretrievably 'mogrified' in your example is the ^ character. And, yes, there are always shell-based caveats and problematic characters, as exemplified by the myriad shellQuote functions that protect text when sending it on to a specific shell. Certain constructions are always going to be more portable than others between shells. Here, args 'hello'$'\x5e''fun"characters' would be a portable version of that construction (and will work when invoked by CMD, wsl/bash, node, perl, ...).
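The portable spelling above relies on three bash rules: single-quoted text is literal, `$'...'` decodes C-style escapes (so `$'\x5e'` is `^`), and adjacent quoted segments join into a single word. Below is a simplified, hypothetical TypeScript sketch of those rules, not the dxx implementation. It skips variable, command, brace, and glob expansion, and inside double quotes it only honors `\"` and `\\`.

```typescript
// Hypothetical, simplified sketch of bash-like word splitting (not the dxx code).
const ANSI_C_ESCAPES: Record<string, string> = {
  n: "\n", t: "\t", r: "\r", a: "\x07", b: "\b", e: "\x1b", E: "\x1b",
  f: "\f", v: "\v", "\\": "\\", "'": "'", '"': '"',
};

// Decode the body of a $'...' string; supports the escapes above plus \xHH.
function decodeAnsiC(body: string): string {
  let out = "";
  for (let i = 0; i < body.length; i++) {
    const c = body[i];
    if (c !== "\\" || i + 1 >= body.length) {
      out += c;
      continue;
    }
    const next = body[i + 1];
    if (Object.prototype.hasOwnProperty.call(ANSI_C_ESCAPES, next)) {
      out += ANSI_C_ESCAPES[next];
      i++;
      continue;
    }
    const hex = next === "x" ? /^[0-9a-fA-F]{1,2}/.exec(body.slice(i + 2)) : null;
    if (hex) {
      out += String.fromCharCode(parseInt(hex[0], 16));
      i += 1 + hex[0].length; // skip "x" and the hex digits
    } else {
      out += c; // unknown escape: keep the backslash, like bash
    }
  }
  return out;
}

function splitBashLikeWords(line: string): string[] {
  const words: string[] = [];
  const isSpace = (ch: string) => ch === " " || ch === "\t" || ch === "\n";
  let i = 0;
  while (i < line.length) {
    while (i < line.length && isSpace(line[i])) i++;
    if (i >= line.length) break;
    let word = "";
    // Adjacent quoted and unquoted segments are concatenated into one word.
    while (i < line.length && !isSpace(line[i])) {
      const c = line[i];
      if (c === "'") {
        const end = line.indexOf("'", i + 1); // single quotes: fully literal
        if (end === -1) throw new Error("unterminated '...'");
        word += line.slice(i + 1, end);
        i = end + 1;
      } else if (c === "$" && line[i + 1] === "'") {
        let j = i + 2; // find the closing quote, skipping escaped characters
        while (j < line.length && line[j] !== "'") j += line[j] === "\\" ? 2 : 1;
        if (j >= line.length) throw new Error("unterminated $'...'");
        word += decodeAnsiC(line.slice(i + 2, j));
        i = j + 1;
      } else if (c === '"') {
        let j = i + 1;
        while (j < line.length && line[j] !== '"') {
          if (line[j] === "\\" && (line[j + 1] === '"' || line[j + 1] === "\\")) {
            word += line[j + 1];
            j += 2;
          } else {
            word += line[j];
            j++;
          }
        }
        if (j >= line.length) throw new Error('unterminated "..."');
        i = j + 1;
      } else if (c === "\\" && i + 1 < line.length) {
        word += line[i + 1]; // an unquoted backslash escapes the next character
        i += 2;
      } else {
        word += c;
        i++;
      }
    }
    words.push(word);
  }
  return words;
}
```

With these rules, `args 'hello'$'\x5e''fun"characters'` produces the two words `args` and `hello^fun"characters`.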

This can be done, as I have here, without Deno support. But it involves more code contortions, especially for sub-processes. I believe it would be simpler if Deno could just provide the raw text that it parses for Deno.args. There's nothing nefarious or dangerous. Providing the line text just makes things simpler for users and can lead to much improved portability.

piscisaureus commented 3 years ago

So I did a quick test: [getcmd.c source code]

cmd.exe
~~~~~~~
C:\>d:\getcmd\getcmd '*' "*" * END
d:\getcmd\getcmd  '*' "*" * END

powershell
~~~~~~~~~~
PS C:\> d:\getcmd\getcmd '*' "*" * END
"d:\getcmd\getcmd.exe" * * * END

bash (wsl)
~~~~~~~~~~
piscisaureus@guru:/mnt/c$ ~/d/getcmd/getcmd.exe '*' "*" * END
getcmd.exe * * $RECYCLE.BIN $WinREAgent "Documents and Settings" Intel LocalStorage PerfLogs "Program Files" "Program Files (x86)" ProgramData Recovery "System Volume Information" Users Windows hiberfil.sys pagefile.sys swapfile.sys END

So the actual command line that getcmd.exe sees is different each time. Unsurprising of course: these three different shells (all commonly used) each have their own rules for quoting and escaping. I can't imagine how getcmd.exe would be able to recover the original "intent", as it doesn't know which shell(?) cooked its command line.

It seems that msys/ does something similar, and people love it: https://github.com/msys2/msys2-runtime/issues/36 https://github.com/msys2/MSYS2-packages/issues/522 https://github.com/git-for-windows/git/issues/1220 https://github.com/curl/curl/pull/1813 https://github.com/magit/magit/issues/2246 https://github.com/magit/magit/issues/2711 https://github.com/git-for-windows/git/issues/561 https://github.com/git-for-windows/git/issues/1019

rivy commented 3 years ago

So, PowerShell is currently a special child going through many growing pains, one of which is command line argument passing (see https://github.com/PowerShell/PowerShell/issues/13089 and https://github.com/PowerShell/PowerShell/issues/15143). That's one of the reasons why, to a large extent, most applications will use cmd as the shell to run sub-processes.

But, generally, using the --% (as I did above...) will stop argument handling and leave it to the executable. So, I'd document that... if you're using PowerShell and want portable behavior, use something like getcmd.exe --% '*' "*" * END. Additionally, the standard CMD environment variable COMSPEC is cleared under PowerShell, so that could be used to signal prior command line processing.

For *nix/POSIX shells, you should be using an executable built for that system (which would then know to leave the command line alone), eg, use *nix/POSIX executables under wsl/bash. That's what the rust executables that I mentioned are designed to do. And that's why the command deno run -A https://deno.land/x/dxx@v0.0.2/eg/args.ts '*' * works correctly. It's using the deno for ubuntu/bash which tells the script that it's not 'windows' (ie, Deno.build.os !== 'windows'), so the script leaves the args as pass-throughs. That's why I recommended that deno should pass undefined as a value for Deno.commandLine to scripts running under non-Windows platforms with strong shell support, indicating that the command line is pre-processed by the shell into Deno.args.

What does Deno.build.os return for MSYS/Cygwin? Is there a platform-specific build for them or do you recommend installing a Windows-executable? Even if it does return "windows", command line processing could be bypassed based on a signal that the shell is more capable (such as the SHELL environment variable, which is usually /bin/bash or /usr/bin/bash). Though I'm not sure that planning for executing "through" another platform would really be common enough to make bullet-proof fallbacks.

To be clear, I'm not asking for Deno to process the command line, just asking for it to be supplied so that a script can do as all current regular Windows executables can do ... process the raw command line if they want to (preferably through a well tested library).

The prototype just shows that a lot can be done to make the scripts more useful and flexible at the command line for users.

Wouldn't you like to be able to use bash-like expansion, globs, etc from the Windows command line? I, personally, frequently miss it when I switch from bash back to CMD.

rivy commented 3 years ago

Talking beyond the ask here, but, after thinking about your points, I was able to test my prototype further under MSYS and WSL.

Based on that testing and the discussion, I made some modifications to add shell detection (via SHELL) in addition to using Deno.build.os.
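The detection heuristic described here can be sketched as a small decision function. It is modeled on the description in this thread, not copied from dxx; `chooseArgumentStrategy` and its return values are hypothetical, and `rawCommandLine` stands in for the proposed (not existing) `Deno.commandLine`.

```typescript
// Hypothetical sketch of the OS + SHELL heuristic described above (not the dxx source).
type ArgumentStrategy = "use-argv" | "reparse-raw";

function chooseArgumentStrategy(
  os: string, // e.g. the value of Deno.build.os
  env: Record<string, string | undefined>, // e.g. the result of Deno.env.toObject()
  rawCommandLine: string | undefined, // stand-in for the proposed Deno.commandLine
): ArgumentStrategy {
  // Non-Windows: a POSIX shell has already quoted, split, and expanded argv.
  if (os !== "windows") return "use-argv";
  // A SHELL variable suggests a bash-like caller (MSYS, or WSL with SHELL
  // forwarded through WSLENV) that has already done the expansion.
  if (env["SHELL"]) return "use-argv";
  // Without the raw text there is nothing to reparse.
  if (rawCommandLine === undefined) return "use-argv";
  return "reparse-raw";
}
```

In Deno, the first two arguments would come from `Deno.build.os` and `Deno.env.toObject()` (the latter needs `--allow-env`). The third has no real source today, which is the point of this issue.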

args now works correctly when used under CMD, WSL/bash, or MSYS (which calls the Win32 deno.exe as a passthru). (No Cygwin currently installed to test upon.)

deno install -Af https://deno.land/x/dxx@v0.0.4/src/dxi.ts
dxi -Af https://deno.land/x/dxx@v0.0.4/eg/args.ts
args --debug '*' "*" * END

Direct execution of the bash/sh shell shim script works as long as deno is installed in WSL. And WSL (passthru) execution of deno.exe ... works if you push the shell variable out to the Win32 process on invocation:

WSLENV=$WSLENV:SHELL/w deno.exe run -A https://deno.land/x/dxx@v0.0.4/eg/args.ts '*' "*" * END

I'm not sure how (or whether there's a way) to detect the passthru execution of deno.exe without that environment variable signal, but the "normal", *nix-specific WSL/bash execution works just fine.

piscisaureus commented 3 years ago

https://github.com/rust-lang/rust/blob/7f9ab0300cd66f6f616e03ea90b2d71af474bf28/library/std/src/os/windows/process.rs#L113-L127

piscisaureus commented 3 years ago

See also: https://github.com/rust-lang/rust/blob/7f9ab0300cd66f6f616e03ea90b2d71af474bf28/library/std/src/os/windows/process.rs#L113-L127

rivy commented 3 years ago

See also: https://github.com/rust-lang/rust/blob/7f9ab0300cd66f6f616e03ea90b2d71af474bf28/library/std/src/os/windows/process.rs#L113-L127

I'm not sure what you're trying to say here... But if you're implying that rust only supports blindly quoting arguments for process execution, that's no longer the case (see https://github.com/rust-lang/rust/blob/955b9c0d4cd9176b53f518e01cbe175545c69947/library/std/src/os/windows/process.rs#L130-L136). The discussion about the problem and need for change is at https://github.com/rust-lang/rust/issues/29494 and the commit adding the change is at https://github.com/rust-lang/rust/commit/d868da7796bfb96e87e09afc4e8338911b5f99b3.

I'm not sure that the referenced code (issue, discussion, commit) is directly relevant, but it does show that there is significant nuance and complexity to the problem of reading, creating, and using command line arguments for Windows.

lucacasonato commented 3 years ago

Duplicate of #8852?

rivy commented 3 years ago

Duplicate of #8852?

No, not really. #8852 is about generating command lines for execution which is really the reverse of this problem (accessing the original command line).

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

rivy commented 2 years ago

Not stale.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

rivy commented 2 years ago

Not stale.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

rivy commented 2 years ago

Not stale.

FKPSC commented 2 years ago

I spent a long time trying to get a command that looks like start "app\\Cool app.exe" to run through Deno.run. It seems to potentially be related to this?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

rivy commented 2 years ago

Not stale.

Artoria2e5 commented 7 months ago

@piscisaureus In the meantime, all windows software treats "*foo*" and *foo* as equivalent, because that's been the convention for the past 30 years.

This is not true; check David Deley's document, the closest thing to an authoritative reference on Windows command-line parsing: https://daviddeley.com/autohotkey/parameters/parameters.htm#WINCRULES. Specifically:

Deley does not address globbing, but I recommend reading the part quoted in https://github.com/remkop/picocli/issues/1761#issuecomment-1200555006. There used to be a trove of Java complaints around the time they updated their MS C++ runtime & switched the setting. That was less than 30 years ago.

@rivy This allows for the exact same command line expressions on Windows or *nix (bash/POSIX-compatible shell) platforms; for example, when installed with the enhanced shim, dxr SCRIPT_URL 'single-quoted argument' "double-quoted argument" ../{.,}?([a-m]-)[n-z].globs $'endsWithNewlineThenExt\n.ext' will expand to the same set of arguments on both Windows and *nix/POSIX platforms.

This is also not totally wise. Among Windows programs, very few things give you access to the raw command-line as the main interface. Other things, from .NET Arguments, Python spawn, Rust (whatever it's called), and Cygwin's wrapped stuff, all present an interface of an argument-array (argv) to be used; they have to do something to quote them into a command-line. Assuming nothing goes wrong, they use a quoting method that works for the majority of applications, one that gets reversed as-is by the MS C++ runtime into argv.

The point is: whatever extensions you bring in, they MUST NOT break double-quoted, msvcrt-style arguments. You still have the liberty of messing with unquoted things.
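A common form of the argv-to-command-line quoting mentioned above is the inverse of the MS C runtime's splitting rules. Leave an argument bare if it is non-empty and has no whitespace or quotes. Otherwise wrap it in double quotes, escape each embedded quote with a backslash, and double any backslashes that end up directly before a quote. A minimal TypeScript sketch, with the hypothetical name `quoteWindowsArg`:

```typescript
// Hypothetical sketch: quote one argv element so that MSVCRT-style parsing
// recovers it exactly. Join the results with spaces to build a command line.
function quoteWindowsArg(arg: string): string {
  // Non-empty arguments without whitespace or quotes need no quoting at all.
  if (arg !== "" && !/[ \t\n\v"]/.test(arg)) return arg;

  let out = '"';
  let pendingBackslashes = 0;
  for (const c of arg) {
    if (c === "\\") {
      pendingBackslashes++; // only meaningful if a quote follows
      continue;
    }
    if (c === '"') {
      // Double the pending backslashes, then escape the quote itself.
      out += "\\".repeat(pendingBackslashes * 2 + 1) + '"';
    } else {
      out += "\\".repeat(pendingBackslashes) + c;
    }
    pendingBackslashes = 0;
  }
  // Trailing backslashes sit in front of the closing quote, so double them.
  out += "\\".repeat(pendingBackslashes * 2) + '"';
  return out;
}
```

Under these rules an argument such as `*` needs no quoting, so an argv-based caller (Python, .NET, Rust, and so on) hands it over bare, while `C:\my dir\` becomes `"C:\my dir\\"`.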