Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Win32 command line parsing explained #326

Closed p5pRT closed 20 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#1151 (status was 'resolved')

Searchable as RT1151$

p5pRT commented 24 years ago

From JB@Danware.dk

Created by jb@danware.dk

The README.win32 file in the perl distribution seems to indicate that the porter does not fully understand the rules of command line parsing on this platform, so here is my explanation for your benefit.

The Win32 command shells *do not* parse the command line into arguments. With two exceptions noted below, the command line is passed unedited and unparsed as a single NUL-terminated string to perl.exe (or any other program). Quotes, spaces, wildcards, etc. are simply forwarded to the application without any checking or enforcement of rules such as quote balancing.

This string is returned from the GetCommandLine(void) system call.

Thus the notion of UNIX-like argc/argv passed to main() is an effect of the C runtime library and may be overridden by simply providing your own implementation. If you can squeeze a call to your own implementation into perl's main(), you don't even have to figure out how to get rid of the C compiler's parser; just ignore its results and provide your own.

Also note that most C compilers come with source code for their command line parser.
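For illustration, here is a minimal Python sketch of the argc/argv-style splitting that the Microsoft C runtime is documented to apply to the raw command line string. The function name `split_command_line` is hypothetical. The sketch leaves out the special treatment of the program name (argv[0]) and the doubled-quote-inside-quotes rule, and it is not the CRT's actual source.

```python
from typing import List


def split_command_line(raw: str) -> List[str]:
    """Split a raw command line roughly the way the MS C runtime does.

    Simplified model (assumption): argv[0] is parsed like any other
    argument, and "" inside a quoted string is not treated specially.
    """
    args: List[str] = []
    current: List[str] = []
    in_quotes = False
    have_arg = False  # lets an empty quoted string ("") produce an argument
    i, n = 0, len(raw)
    while i < n:
        ch = raw[i]
        if ch == "\\":
            # Count the run of backslashes and look at what follows it.
            j = i
            while j < n and raw[j] == "\\":
                j += 1
            count = j - i
            if j < n and raw[j] == '"':
                # 2n backslashes + quote   -> n backslashes, quote delimits
                # 2n+1 backslashes + quote -> n backslashes, literal quote
                current.append("\\" * (count // 2))
                if count % 2 == 1:
                    current.append('"')
                    i = j + 1
                else:
                    i = j  # the quote itself is handled on the next pass
            else:
                # Backslashes not followed by a quote are literal.
                current.append("\\" * count)
                i = j
            have_arg = True
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current = []
                have_arg = False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


if __name__ == "__main__":
    print(split_command_line(r'perl -e "print \"hi\"" *.txt'))
```

Note that the wildcard `*.txt` comes through unexpanded: in this model, as in the description above, nothing between the shell and main() expands it.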

The two exceptions are:

1. Redirection: The command shell parses for redirections and pipes and deletes the associated characters from the command line.
2. Choice of executable: The command shell interprets the first (possibly quoted) argument as the program to be located in the path, but does not delete it from the command line.

Neither of these two exceptions applies to the CreateProcess system call.

Hope this helps you make an even better Win32 port of perl.

Perl Info

```
Site configuration information for perl 5.00503:

Summary of my perl5 (5.0 patchlevel 5 subversion 03) configuration:
  Platform:
    osname=MSWin32, osvers=4.0, archname=MSWin32-x86
    uname=''
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cl.exe', optimize='-Od -MD -DNDEBUG', gccversion=
    cppflags='-DWIN32'
    ccflags ='-Od -MD -DNDEBUG -DWIN32 -D_CONSOLE -DNO_STRICT '
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    alignbytes=8, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -release -machine:x86'
    libpth=\lib
    libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl.lib
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -release -machine:x86'

Locally applied patches:

@INC for perl 5.00503:
    D:\BAT\perl\lib/MSWin32-x86
    D:\BAT\perl\lib
    D:\BAT\perl\site\lib
    .

Environment for perl 5.00503:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=D:\BAT\perl\bin\MSWin32-x86;D:\BAT\PERL\bin;F:\NETOP.SRC\SHARE;D:\BAT;D:\XDT\OFFICE\OFFICE;C:\WINNT40;C:\WINNT40\SYSTEM32;e:\DT\MSDEV\BIN;e:\DT\MSDEV\BIN\WINNT
    PERL_BADLANG (unset)
    SHELL (unset)

DANWARE HAS MOVED!, NEW ADDRESS/PHONE BELOW
Jakob Bøhm Jensen       e-mail:jb@danware.dk
M.Sc.Eng.               http://www.danware.com
Danware Data A/S        phone: +45 45 90 25 25
Kongevejen 62           fax: +45 45 90 25 26
DK-3460 Birkerod
Information in this e-mail does not constitute a binding commitment
on behalf of me or Danware Data A/S.
```
p5pRT commented 23 years ago

From @vanstyn

While trying to clear out some ancient cruft from the bug database, I found this message from Jacob Bo/hm: http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-08/msg00098.html

I have no idea whether his assertions about win32 command-line parsing are correct, but the information does not appear to have changed in 5.6.0 (looking at the paragraph that starts 'The crucial thing to understand about the "cmd" shell'). Can someone that knows clarify?

Hugo

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

> I have no idea whether his assertions about win32 command-line parsing are correct

They look mostly correct to me, but I've never tried to figure out what happens to %environment% references, and there are still rules that cmd.exe uses for things like quoting and escaping because it has to parse the line to find the redirection characters (thus leaving exciting areas for the application to parse things differently than the shell :-).

p5pRT commented 23 years ago

From @gsar

On Thu, 13 Jul 2000 17:13:39 BST, Hugo wrote:

> While trying to clear out some ancient cruft from the bug database, I found this message from Jacob Bo/hm: http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-08/msg00098.html
>
> I have no idea whether his assertions about win32 command-line parsing are correct,

They're correct, but he's talking about it from the POV of the low-level Win32 API. Perl uses the CRT's argv/argc abstraction that is built (precariously) on top of that low-level support, so describing the lower level is not much good to the end user.

> but the information does not appear to have changed in 5.6.0 (looking at the paragraph that starts 'The crucial thing to understand about the "cmd" shell'). Can someone that knows clarify?

I don't see anything that needs changing there.

However, we do need to fix various amounts of brokenness that prevent system(@args) from working right on Windows. Search archives for "spawnvp" if you're interested in pursuing this.

Sarathy gsar@ActiveState.com
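To make the list-form problem concrete: because a Win32 process receives a single command line string, a list of arguments has to be quoted and joined so that the C runtime's parser splits it back into the same list. Below is a minimal Python sketch of that quoting step. The names `quote_arg` and `build_command_line` are hypothetical, the sketch does not protect shell metacharacters (so it models a direct CreateProcess-style launch, not a launch through cmd.exe), and it is not a description of how perl's system() is implemented.

```python
from typing import List


def quote_arg(arg: str) -> str:
    """Quote one argument so an MS C runtime style parser returns it intact.

    Assumption: the target program uses the backslash/double-quote rules
    of the Microsoft C runtime; programs with their own parser may differ.
    """
    # Arguments without whitespace or quotes can be passed through as-is.
    if arg and not any(c in arg for c in ' \t"'):
        return arg
    out = ['"']
    backslashes = 0
    for ch in arg:
        if ch == "\\":
            backslashes += 1
        elif ch == '"':
            # Double the pending backslashes, then escape the quote itself.
            out.append("\\" * (2 * backslashes + 1))
            out.append('"')
            backslashes = 0
        else:
            # Backslashes not followed by a quote stay literal.
            out.append("\\" * backslashes)
            out.append(ch)
            backslashes = 0
    # Trailing backslashes precede the closing quote, so double them.
    out.append("\\" * (2 * backslashes))
    out.append('"')
    return "".join(out)


def build_command_line(args: List[str]) -> str:
    """Join already-separated arguments into one command line string."""
    return " ".join(quote_arg(a) for a in args)


if __name__ == "__main__":
    print(build_command_line(["perl", "-e", 'print "hi"']))
```

Python's own `subprocess` module performs a similar conversion on Windows when it is given an argument list.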

p5pRT commented 23 years ago

From @vanstyn

In <200007131720.KAA12643@molotok.activestate.com>, Gurusamy Sarathy writes:
:On Thu, 13 Jul 2000 17:13:39 BST, Hugo wrote:
:>While trying to clear out some ancient cruft from the bug database,
:>I found this message from Jacob Bo/hm:
:> http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-08/msg00098.html
:>
:>I have no idea whether his assertions about win32 command-line parsing
:>are correct,
:
:They're correct, but he's talking about it from the POV of the low-level
:Win32 API. Perl uses the CRT's argv/argc abstraction that is built
:(precariously) on top that low level support, so describing the lower
:level is not much good to the end user.
:
:>but the information does not appear to have changed in
:>5.6.0 (looking at the paragraph that starts 'The crucial thing to
:>understand about the "cmd" shell'). Can someone that knows clarify?
:
:I don't see anything that needs changing there.

Ok, I'll mark the bugid closed.

Hugo

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

From: Gurusamy Sarathy [mailto:gsar@ActiveState.com]

> They're correct, but he's talking about it from the POV of the low-level Win32 API. Perl uses the CRT's argv/argc abstraction that is built (precariously) on top that low level support, so describing the lower level is not much good to the end user.
>
>> but the information does not appear to have changed in 5.6.0 (looking at the paragraph that starts 'The crucial thing to understand about the "cmd" shell'). Can someone that knows clarify?
>
> I don't see anything that needs changing there.

I could argue that the user needs to know that there are two forces at work: the command shell (which may be CMD, but may also be something like 4NT, or even something obscure like zsh) and the CRT argc/argv implementation that Perl uses.

A particular gotcha with the 4NT shell is that if you fail to double % characters, even when quoted, they get eaten by the environment variable expansion process. This is not a problem with CMD.

I don't think it's README.win32's place to document the idiosyncrasies of all the various Windows shells which could be used (any more than we do for Unix), but it probably *is* worth explaining that the shell and the CRT are both involved.

Does the attached patch explain things any better?

[BTW, note that I changed the comment about quoting redirection characters - based on my experiments, it *does* work, at least in CMD.EXE]
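As a small illustration of the workaround, a Python helper that doubles every % in an argument before it is typed at (or handed to) a 4NT prompt might look like the sketch below. The name `double_percents` is hypothetical, and the premise that doubling is sufficient under 4NT is taken from the description above rather than verified independently.

```python
def double_percents(arg: str) -> str:
    """Double each % so that 4NT's variable expansion leaves one behind.

    Assumption (from the report above): 4NT collapses %% to a single %
    and otherwise treats % as the start of a variable reference, even
    inside double quotes.
    """
    return arg.replace("%", "%%")


if __name__ == "__main__":
    # A one-liner that uses a Perl hash, prepared for a 4NT command line.
    print(double_percents('%h = (a => 1); print keys %h'))
```

Under CMD this doubling should not be applied blindly, since the text above notes that the problem does not occur there.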

Paul.

---- Patch for perlwin32.pod ----

Inline Patch

```diff
--- perlwin32.pod.orig	Thu Jun 29 10:48:02 2000
+++ perlwin32.pod	Fri Jul 14 10:43:30 2000
@@ -283,29 +283,38 @@
 shells found in UNIX environments, you will be less than pleased with
 what Windows offers by way of a command shell.
 
-The crucial thing to understand about the "cmd" shell (which is
-the default on Windows NT) is that it does not do any wildcard
-expansions of command-line arguments (so wildcards need not be
-quoted). It also provides only rudimentary quoting. The only
-(useful) quote character is the double quote ("). It can be used to
-protect spaces in arguments and other special characters. The
-Windows NT documentation has almost no description of how the
-quoting rules are implemented, but here are some general observations
-based on experiments: The shell breaks arguments at spaces and
-passes them to programs in argc/argv. Doublequotes can be used
-to prevent arguments with spaces in them from being split up.
-You can put a double quote in an argument by escaping it with
-a backslash and enclosing the whole argument within double quotes.
-The backslash and the pair of double quotes surrounding the
-argument will be stripped by the shell.
-
-The file redirection characters "<", ">", and "|" cannot be quoted
-by double quotes (there are probably more such). Single quotes
-will protect those three file redirection characters, but the
-single quotes don't get stripped by the shell (just to make this
-type of quoting completely useless). The caret "^" has also
-been observed to behave as a quoting character (and doesn't get
-stripped by the shell also).
+The crucial thing to understand about the Windows environment is that
+the command line you type in is processed twice before Perl sees it.
+First, your command shell (usually CMD.EXE on Windows NT, and
+COMMAND.COM on Windows 9x) preprocesses the command line, to handle
+redirection, environment variable expansion, and location of the
+executable to run. Then, the perl executable splits the remaining
+command line into individual arguments, using the C runtime library
+upon which Perl was built.
+
+It is particularly important to note that neither the shell nor the C
+runtime do any wildcard expansions of command-line arguments (so
+wildcards need not be quoted). Also, the quoting behaviours of the
+shell and the C runtime are rudimentary at best (and may, if you are
+using a non-standard shell, be inconsistent). The only (useful) quote
+character is the double quote ("). It can be used to protect spaces in
+arguments and other special characters. The Windows NT documentation
+has almost no description of how the quoting rules are implemented, but
+here are some general observations based on experiments: The C runtime
+breaks arguments at spaces and passes them to programs in argc/argv.
+Doublequotes can be used to prevent arguments with spaces in them from
+being split up. You can put a double quote in an argument by escaping
+it with a backslash and enclosing the whole argument within double
+quotes. The backslash and the pair of double quotes surrounding the
+argument will be stripped by the C runtime.
+
+The file redirection characters "<", ">", and "|" can be quoted by
+double quotes (although there are suggestions that this may not always
+be true). Single quotes are not treated as quotes by the shell or the C
+runtime. The caret "^" has also been observed to behave as a quoting
+character, but this appears to be a shell feature, and the caret is not
+stripped from the command line, so Perl still sees it (and the C runtime
+phase does not treat the caret as a quote character).
 
 Here are some examples of usage of the "cmd" shell:
 
@@ -344,6 +353,13 @@
 
 Discovering the usefulness of the "command.com" shell on
 Windows 9x is left as an exercise to the reader :)
+
+One particularly pernicious problem with the 4NT command shell for
+Windows NT is that it (nearly) always treats a % character as indicating
+that environment variable expansion is needed. Under this shell, it is
+therefore important to always double any % characters which you want
+Perl to see (for example, for hash variables), even when they are
+quoted.
 
 =item Building Extensions
 
```
p5pRT commented 23 years ago

From @jhi

> Does the attached patch explain things any better?
>
> [BTW, note that I changed the comment about quoting redirection characters - based on my experiments, it *does* work, at least in CMD.EXE]

Applied, thanks.