Open p5pRT opened 10 years ago
While investigating why re/subst.t and other subst*.t tests (and other .t files that use watchdog() from test.pl) take 300 seconds on Win32 when their watchdog process times out, TonyC brought up the question of why system() made a cmd.exe process at all, when perl is not a shell builtin and should have been launched directly by the parent perl process (the perl that called watchdog()). I will investigate that in this ticket.
The 300 seconds happen because the kill() in the END block is done on cmd.exe's PID, not the watchdog perl's PID. Killing the cmd.exe leaves the watchdog perl process around, which blocks the harness script until the timeout, as high as 300 seconds, expires. Why the harness blocks on the watchdog process IDK, and that is for another ticket, if appropriate.
This ticket isn't directly about the watchdog() bug, it is about system()'s behavior. Fixing system() will probably fix the watchdog bug. There are other ways too, like investigating why the harness process hangs on the watchdog, and switching test.pl to a process group kill() instead of a single process kill(). Process group kill on Win32 was broken in 5.17 and is another ticket at https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121230.
For testing my script is
------------------------------------------------------
system( 1, 'C:\\perl519\\src\\t\\perl.exe "-I../lib" -e "sleep(300);warn qq/# Test process timed out - terminating\\n/;kill(KILL, 12748);" ');
------------------------------------------------------
I am not sure why the "\\" are there even though it's a single-quoted string. Data::Dumper printed it that way. PID 12748 doesn't exist on my system but it doesn't matter.
The callstack on blead generally looks like
-------------------------------------------------------
perl519.dll!do_spawnvp_handles(int mode=1, const char * cmdname=0x0090c5ec, const char * const * argv=0x008f5614, const int * handles=0x00000000) Line 3705 C
perl519.dll!win32_spawnvp(int mode=1, const char * cmdname=0x0090c5ec, const char * const * argv=0x008f5614) Line 3698 + 0x13 C
perl519.dll!Perl_do_aspawn(interpreter * my_perl=0x003645ec, sv * really=0x00000000, sv * * mark=0x0036aec8, sv * * sp=0x0036aec4) Line 644 + 0x5b C
perl519.dll!Perl_pp_system(interpreter * my_perl=0x003645ec) Line 4225 + 0x13 C
perl519.dll!Perl_runops_debug(interpreter * my_perl=0x003645ec) Line 2420 + 0xd C
perl519.dll!S_run_body(interpreter * my_perl=0x003645ec, long oldscope=1) Line 2446 + 0xd C
perl519.dll!perl_run(interpreter * my_perl=0x003645ec) Line 2365 C
perl519.dll!RunPerl(int argc=2, char * * argv=0x00362478, char * * env=0x003629e0) Line 270 + 0x9 C++
perl.exe!main(int argc=2, char * * argv=0x00362478, char * * env=0x00362d58) Line 23 + 0x12 C
perl.exe!mainCRTStartup() Line 398 + 0xe C
kernel32.dll!_BaseProcessStart@4() + 0x23
---------------------------------------------------------
win32_spawnvp is a wrapper that has no logic of its own nowadays so I will ignore that it exists. On the 1st try in do_spawnvp_handles, CreateProcess is given "C:\perl519\src\t\perl.exe -I../lib -e sleep(300);warn qq/# Test process timed out - terminating\n/;kill(KILL, 3824);" (no outer quotes) as cname/LPCTSTR lpApplicationName (also notice all `"`s were removed from inside the path by perl), and "C:\perl519\src\t\perl.exe "-I../lib" -e "sleep(300);warn qq/# Test process timed out - terminating\n/;kill(KILL, 3824);"" (no outer quotes) as cmd/LPTSTR lpCommandLine.
lpApplicationName's MS docs are
------------------------------------------------------
lpApplicationName [in] Pointer to a null-terminated string that specifies the module to execute. The specified module can be a Windows-based application. It can be some other type of module (for example, MS-DOS or OS/2) if the appropriate subsystem is available on the local computer.
The string can specify the full path and file name of the module to execute or it can specify a partial name. In the case of a partial name, the function uses the current drive and current directory to complete the specification. The function will not use the search path. If the file name does not contain an extension, .exe is assumed. Therefore, if the file name extension is .com, this parameter must include the .com extension.
The lpApplicationName parameter can be NULL. In that case, the module name must be the first white space-delimited token in the lpCommandLine string. If you are using a long file name that contains a space, use quoted strings to indicate where the file name ends and the arguments begin; otherwise, the file name is ambiguous. For example, consider the string "c:\program files\sub dir\program name". This string can be interpreted in a number of ways. The system tries to interpret the possibilities in the following order:
c:\program.exe files\sub dir\program name
c:\program files\sub.exe dir\program name
c:\program files\sub dir\program.exe name
c:\program files\sub dir\program name.exe
If the executable module is a 16-bit application, lpApplicationName should be NULL, and the string pointed to by lpCommandLine should specify the executable module as well as its arguments.
To run a batch file, you must start the command interpreter; set lpApplicationName to cmd.exe and set lpCommandLine to the name of the batch file.
------------------------------------------------------
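As an illustration of the interpretation order quoted above, here is a minimal Perl sketch that lists the module names tried for an unquoted command line containing spaces. The helper name `candidate_modules` is hypothetical; it only models the documented order and does not call CreateProcess or touch the file system.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl or the Win32 API): list the module
# paths that the quoted documentation says are tried for an unquoted
# command line, extending the candidate by one space-separated token at a time.
sub candidate_modules {
    my ($cmdline) = @_;
    my @tokens = split / /, $cmdline;
    my @candidates;
    for my $i (0 .. $#tokens) {
        my $module = join ' ', @tokens[0 .. $i];
        # Per the docs, ".exe" is assumed when the name has no extension.
        $module .= '.exe' unless $module =~ /\.[^.\\\/ ]+\z/;
        push @candidates, $module;
    }
    return @candidates;
}

print "$_\n" for candidate_modules('c:\program files\sub dir\program name');
```

Run against the string from the docs, this prints the four module names in the same order as the list above, which shows why a whole command line passed as lpApplicationName cannot be expected to resolve to perl.exe.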
Obviously the C string "C:\perl519\src\t\perl.exe -I../lib -e sleep(300);warn qq/# Test process timed out - terminating\n/;kill(KILL, 3824);" is not a file. The 1st CreateProcess fails with GLR == 3/ERROR_PATH_NOT_FOUND. This is a reasonable error. Then do_spawnvp_handles() calls qualified_path() on this bogus file path. GetFileAttributes is called on file "C:\perl519\src\t\perl.exe -I../lib -e sleep(300);warn qq/# Test process timed out - terminating\n/;kill(KILL, 3824); .exe", notice the new ".exe" at the end. GetFileAttributes fails with GLR == 3/ERROR_PATH_NOT_FOUND. Then GetFileAttributes is called with "C:\perl519\src\t\perl.exe -I../lib -e sleep(300);warn qq/# Test process timed out - terminating\n/;kill(KILL, 3824);". Another bogus file. qualified_path() fails by returning NULL at this point. Then in do_spawnvp_handles, the following executes
-------------------------------
errno = ENOENT;
ret = -1;
goto RETVAL;
-------------------------------
And control returns to Perl_do_aspawn in win32.c. do_aspawn then adds cmd.exe to the argv array and calls do_spawnvp_handles again with "cmd.exe" as cmdname instead of a bogus command line string. This works. But the child proc is now wrapped in a cmd.exe process. system()'s docs say
----------------------------------------
Does exactly the same thing as exec LIST, except that a fork is done first and the parent process waits for the child process to exit. Note that argument processing varies depending on the number of arguments. If there is more than one argument in LIST, or if LIST is an array with more than one value, starts the program given by the first element of the list with arguments given by the rest of the list. If there is only one scalar argument, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to execvp, which is more efficient.
----------------------------------------
The 2 parts that I don't fully understand are "the argument is checked for shell metacharacters" and "if there are no shell metacharacters in the argument". We have a complete command line. The file path name might begin with a " or not. The file path name might have a space in it, and use " to group it into a file name. IDK what this code does on Unix. There is a Perl_do_exec3 which defines "check for shell metacharacters" but I don't think it is used on Windows. IDK what system() should be doing; TonyC says it should not be calling cmd.exe/the shell. The docs say a 1 arg list to system() is always a shell. Win32 Perl currently, and uselessly, tries to launch the process without the shell first, which needs to be stopped. Either Win32 Perl always uses shell/cmd.exe for 1 item system(), or Perl knows how to pull out the file path name from the string and pass that to CreateProcess. IDK which it should be.
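The try-then-retry flow traced above can be modelled in a few lines of Perl. This is a simplified sketch, not the code in win32.c: `spawn_with_fallback` and the `$try_spawn` callback are hypothetical stand-ins for do_aspawn and do_spawnvp_handles, and the `/x/d/c` flags are an assumption about the default shell invocation.

```perl
use strict;
use warnings;

# Simplified model of the retry described above. $try_spawn is a
# hypothetical stand-in for do_spawnvp_handles(): it returns true on
# success and false on failure.
sub spawn_with_fallback {
    my ($try_spawn, @argv) = @_;

    # First attempt: treat the first element as the program to run.
    return { via_shell => 0, argv => [@argv] } if $try_spawn->(@argv);

    # Fallback: prepend the command shell and try again.
    # The "/x/d/c" flags are an assumption, not taken from the ticket.
    my @shell_argv = ('cmd.exe', '/x/d/c', @argv);
    return { via_shell => 1, argv => \@shell_argv } if $try_spawn->(@shell_argv);

    return;    # both attempts failed
}

# Stub spawner: it can "find" a bare program name, but not a whole
# command line that was passed in as if it were a file name.
my $stub = sub { $_[0] !~ / / };

my $result = spawn_with_fallback($stub, 'perl.exe -e "sleep(300)"');
print $result->{via_shell} ? "wrapped in cmd.exe\n" : "launched directly\n";
```

With the stub spawner the single-string command fails the first attempt and ends up wrapped in cmd.exe, which is the process tree that makes the watchdog's kill() hit the wrong PID.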
On Thu, Feb 20, 2014 at 12:46 AM, bulk88 <perlbug-followup@perl.org> wrote:
The 2 parts that I dont fully understand is "the argument is checked for shell metacharacters" and "if there are no shell metacharacters in the argument".
It means "if we can easily execute the shell command without invoking a shell".
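To make the "checked for shell metacharacters" rule concrete, here is a rough Perl sketch of the decision for the one-argument form on Unix. The helper `needs_shell` is hypothetical and its character set is an assumption modelled loosely on Unix shell syntax; it is not copied from Perl_do_exec3.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a one-argument command needs a shell.
# The character set below is an assumption for illustration only.
sub needs_shell {
    my ($cmd) = @_;
    return $cmd =~ /[\$&*(){}\[\]'";\\|?<>~`\n]/ ? 1 : 0;
}

for my $cmd ('ls -l /tmp', 'ls *.txt | wc -l') {
    if (needs_shell($cmd)) {
        # Metacharacters present: hand the whole string to the shell.
        print "shell:  /bin/sh -c '$cmd'\n";
    }
    else {
        # No metacharacters: split into words and exec directly.
        my @words = split ' ', $cmd;
        print "direct: execvp($words[0], ...)\n";
    }
}
```

Under this model a command line such as the one in this ticket, which contains quotes, parentheses and semicolons, would always be sent to the shell on Unix.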
The RT System itself - Status changed from 'new' to 'open'
ideas from IRC
[16:14] <@xdg> bulk88, I'd suggest avoiding the quoted single quote, too
[16:14] <@xdg> s/wasn't/not/ and see what happens
[16:15] \
-- bulk88 ~ bulk88 at hotmail.com
bulk88 via RT writes:
[16:17] <@bulk88> "Does exactly the same thing as exec LIST, except that a fork is done first and the parent process waits for the child process to exit. Note that argument processing varies depending on the number of arguments. If there is more than one argument in LIST, or if LIST is an array with more than one value, starts the program given by the first element of the list with arguments given by the rest of the list. If there is only one scalar argument, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to execvp, which is more efficient."
[16:18] <@xdg> bulk88, as I said, that's wrong. And even perlport doesn't clarify sufficiently
[16:19] <@xdg> My opinion is that it's probably a bug, but no one understands windows shell quoting in Perl well enough to diagnose, fix or document properly. :-(
If there's consensus on system's current behaviour on Windows being wrong, or if a plan is developed for what it should be doing, please can you let this IPC::System::Simple ticket know: https://github.com/pjf/ipc-system-simple/issues/9
IPC::System::Simple aims to be compatible with built-in system. That bug report is because sometimes it isn't on Windows: something to do with quoting and arg-splitting.
If built-in system is going to change, then IPC::System::Simple needs to change to match that.
(Also, it may be worth looking at IPC::System::Simple's current behaviour, just in case that happens to be the desired behaviour for built-in system, though I suspect not.)
Cheers
Smylers -- http://twitter.com/Smylers2
Sorry, I don't have time to look at this in detail right now, but wanted to point out that there are some extensive tests for the system() quoting behavior on Windows in t/win32/system_tests.
I would recommend starting by extending these tests to cover the currently broken cases, to see how it all fits together before designing a new quoting system from scratch.
Cheers, -Jan
Sorry if this has already been mentioned. I had a skim read of the above and didn't see it, so I thought I'd record it here for future reference:
The cmd.exe that unexpectedly gets launched to launch the perl.exe (instead of perl.exe getting launched directly) sometimes doesn't happen when using system("$perl ...") rather than system(1, "$perl ..."). (The discussion so far has been regarding the latter form, specifically a case in t/test.pl's watchdog().)
The example I've just stumbled across is that this:
perl -le "system(1, qq[$^X -e sleep(10)])"
launches a cmd.exe which launches a perl.exe which sleeps for 10 seconds. This can be clearly seen in the tree view of Process Explorer: the cmd.exe->perl.exe pair is separated from the cmd.exe in which I ran the above command, because the system(1, ...) form doesn't wait for the command to complete, so the perl.exe that I ran from my cmd.exe immediately exits, leaving just the cmd.exe->perl.exe that it launched, now separated from my cmd.exe. You can see this more clearly by running:
perl -le "system(1, qq[$^X -e sleep(10)]); sleep(5)"
in which the perl.exe that I ran from my cmd.exe sticks around for 5 seconds with the cmd.exe->perl.exe underneath it before exiting.
Whereas this:
perl -le "system(qq[$^X -e sleep(10)])"
does NOT launch the unexpected cmd.exe, which can again be clearly seen in Process Explorer: the perl.exe that I ran from my cmd.exe only has another perl.exe underneath it.
I don't know if that's a useful observation or not; I just thought I'd mention it since it surprised me.
Another thing I noticed is that shortly after this ticket appeared, there was another discussion on p5p (see the thread beginning here: http://www.nntp.perl.org/group/perl.perl5.porters/2014/04/msg214390.html) which wound up in commit http://perl5.git.perl.org/perl.git/commit/94d4006a6d, which noted that, "On Windows, only the system PROGRAM LIST syntax will reliably avoid using the shell; system LIST, even with more than one element, will fall back to the shell if the first spawn fails."
That suggests to me that changing
perl -le "system(1, qq[$^X -e sleep(10)])"
to
perl -le "system({$^X} 1, qq[$^X -e sleep(10)])"
should work around the problem.
Indeed, it does, but seems to reveal some other new problem?! -- the perl.exe that I ran from my cmd.exe does indeed now run the other perl.exe directly, without an intermediate cmd.exe, but the second perl.exe (with the -e sleep(10) arguments) hangs indefinitely instead of exiting after 10 seconds.
I wondered if the indirect object part was causing some confusion with the special leading "1", but this:
perl -le "system({$^X} qq[$^X -e sleep(10)])"
(which works around the unexpected cmd.exe problem itself by omitting that leading "1", of course, as noted earlier) also still hangs indefinitely.
In this case, the system LIST form avoids the unwanted cmd.exe anyway, i.e. this behaves as expected:
perl -le "system($^X, '-e', 'sleep(10)')"
and with this multi-element LIST form, the indirect object syntax no longer hangs indefinitely either, i.e. this also behaves as expected:
perl -le "system({$^X} $^X, '-e', 'sleep(10)')"
In fact, both of these forms also work as expected with the leading "1" too, i.e.:
perl -le "system(1, $^X, '-e', 'sleep(10)')"
perl -le "system({$^X} 1, $^X, '-e', 'sleep(10)')"
So to summarize all this:
perl -le "system(1, qq[$^X -e sleep(10)])"          uses cmd.exe
perl -le "system(qq[$^X -e sleep(10)])"             ok
perl -le "system({$^X} 1, qq[$^X -e sleep(10)])"    no cmd.exe but hangs
perl -le "system({$^X} qq[$^X -e sleep(10)])"       no cmd.exe but hangs
perl -le "system(1, $^X, '-e', 'sleep(10)')"        ok
perl -le "system($^X, '-e', 'sleep(10)')"           ok
perl -le "system({$^X} 1, $^X, '-e', 'sleep(10)')"  ok
perl -le "system({$^X} $^X, '-e', 'sleep(10)')"     ok
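For scripts rather than one-liners, the last row of the summary (indirect object plus a multi-element list) might be written as in the following sketch. The child command `exit 3` is only an illustrative placeholder so that the exit status is easy to check; the variable names are not from the ticket.

```perl
use strict;
use warnings;

# Program and arguments as separate list elements, so no command-line
# string has to be parsed by perl or by a shell.
my @cmd = ($^X, '-e', 'exit 3');

# "system PROGRAM LIST" form: the block names the program to run,
# the list supplies argv (including argv[0]).
my $status = system { $cmd[0] } @cmd;

die "failed to launch $cmd[0]: $!" if $status == -1;
printf "child exit code: %d\n", $status >> 8;
```

This is the synchronous form; per the table above, the same list with a leading 1 is reported to work for the asynchronous Win32 spawn as well.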
@steve-m-hay @bulk88 this case has been idle for 5 years. I assume nothing has changed on this problem. Can you suggest what we might want to do to move this forward?
Migrated from rt.perl.org#121283 (status was 'open')
Searchable as RT121283$