Closed lumpfish closed 3 years ago
At some point code was added to https://github.com/AdoptOpenJDK/stf/blob/master/stf.core/scripts/stf/Commands.pm to kill Windows processes using pskill. Later the code was modified to use Stop-Process rather than pskill if it is available (#11).
What was the original reasons for adding pskill and Stop-Process? Did we not always have the terminate() function which uses the Win32 API directly in our disposal?
Just trying to understand if going back to using terminate() function would have any side effects.
Since we know why Stop-Process was added the question is really 'why was pskill added?'. I don't know the answer to that, except to surmise that maybe the writer of Commands.pm did not realise that the terminate() function was also implemented for Windows (Commands.pm was added after the ProcessMgmt library). I have created https://github.com/AdoptOpenJDK/stf/pull/94 with Stop-Process / pskill removed and I'm currently running sanity.system across all releases to test. @Mesbah-Alam - please feel free to run any local testing too. Remember the change only affects Windows.
Just trying to understand if going back to using terminate() function would have any side effects.
That's the motivation for running the sanity.system tests across all the releases. We do know that using Stop-Process has the side effect reported by https://github.com/eclipse/openj9/issues/11021.
Test runs of sanity,system using https://github.com/AdoptOpenJDK/stf/pull/94 (all processes killed successfully where necessary): sanity.system
jdk8 hotspot: win_x86-64: https://ci.adoptopenjdk.net/job/Grinder/5057/console win_x86-32: https://ci.adoptopenjdk.net/job/Grinder/5056/console openj9: win_x86-64: https://ci.adoptopenjdk.net/job/Grinder/5052/console win_x86-64_xl: https://ci.adoptopenjdk.net/job/Grinder/5053/console win_x86-32: https://ci.adoptopenjdk.net/job/Grinder/5054/console
jdk11 hotspot: win_x86-64: https://ci.adoptopenjdk.net/job/Grinder/5059/console win_x86-32: https://ci.adoptopenjdk.net/job/Grinder/5060/console openj9: win_x86-64: https://ci.adoptopenjdk.net/job/Grinder/5062/console win_x86-64_xl: https://ci.adoptopenjdk.net/job/Grinder/5063/console
jdk15 hotspot: win_x86-64: https://ci.adoptopenjdk.net/job/Grinder/5072/console win_x86-32: https://ci.adoptopenjdk.net/job/Grinder/5073/console openj9: win_x86-64: https://ci.adoptopenjdk.net/job/Grinder/5069/console win_x86-64_xl: https://ci.adoptopenjdk.net/job/Grinder/5070/console
This occurred on an AdoptOpenJDK build yesterday: https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_sanity.system_x86-64_windows_xl/408/consoleFull, so it's affecting tests at AdoptOpenJDK as well as Eclipse/openj9:
14:59:04 STF 14:59:02.697 - Killing processes: LT1
14:59:05 STF 14:59:04.459 - Using powershell to kill LT1
14:59:05 STF 14:59:04.459 - Running command: cmd /c echo | powershell Stop-Process 40052
14:59:05 STF 14:59:04.459 - Redirecting stderr to C:/Users/jenkins/workspace/Test_openjdk11_j9_sanity.system_x86-64_windows_xl/openjdk-tests/TKG/test_output_1607350846523/TestJlmRemoteMemoryNoAuth_0/20201207-145214-TestJlmRemoteMemoryNoAuth/results\kill_powershell.stderr
14:59:05 STF 14:59:04.460 - Redirecting stdout to C:/Users/jenkins/workspace/Test_openjdk11_j9_sanity.system_x86-64_windows_xl/openjdk-tests/TKG/test_output_1607350846523/TestJlmRemoteMemoryNoAuth_0/20201207-145214-TestJlmRemoteMemoryNoAuth/results\kill_powershell.stdout
14:59:07 Could not terminate process PID=40052: Access is denied.
14:59:07 **FAILED** at step 4 (Stop LT1 process). Expected return value=0 Actual=1 at C:/Users/jenkins/workspace/Test_openjdk11_j9_sanity.system_x86-64_windows_xl/openjdk-tests/\TKG\test_output_1607350846523\TestJlmRemoteMemoryNoAuth_0/20201207-145214-TestJlmRemoteMemoryNoAuth/execute.pl line 177.
14:59:07 STF 14:59:06.945 - **FAILED** execute script failed. Expected return value=0 Actual=1
@Mesbah-Alam - are you OK for me to go ahead and merge the PR?
@lumpfish The PR was merged. Thank you!
Tests fail with Could not terminate process PID=nnnn: Access is denied
https://github.com/eclipse/openj9/issues/11021 describes tests which fail because tests fail from time to time with
Could not terminate process PID=nnnn: Access is denied.
.At some point code was added to https://github.com/AdoptOpenJDK/stf/blob/master/stf.core/scripts/stf/Commands.pm to kill Windows processes using pskill. Later the code was modified to use Stop-Process rather than pskill if it is available (https://github.com/AdoptOpenJDK/stf/pull/11). However, in investigating https://github.com/eclipse/openj9/issues/11021 it looks like neither StopProcess nor pskill are necessary, and using them leads to the error reported here.
The stf::ProcessMgnt perl library contains functions to start, stop, kill and poll processes for both Windows and unix based operating systems. https://github.com/AdoptOpenJDK/stf/blob/master/stf.core/scripts/stf/ProcessMgmt/Windows.pm contains a terminate() function which uses the Win32 API directly to terminate a process. In the event that the process killed by the StopProcess code does not terminate within two seconds of the StopProcess command being issued, the terminate() function is called (via the code path which is common for both Windows and unix). It is the Win32::TerminateProcess function which outputs the
Could not terminate process PID=nnnn: Access is denied
message, not the Stop-Process function, presumably because the process is already being terminated and is no longer in a state which can be termianted.This is a failing test with some additional debugging code added:
Case 1:
Case 2:
Case 3:
One way to fix this would be to wait longer for the process to be termianted as a result of the Stop-Process command. But if the Stop-Process / pskill code is removed so that the terminate() function in the ProcessMgmt module is used instead (which is after all the point of the ProcessMgmt library) then the kill_process function works as expected anyway. So that (removing the StopProcess / pskill code) would be the simpler and more elegant solution.