abhilekhsingh / gc3pie

Automatically exported from code.google.com/p/gc3pie

kill() broken in the shellcmd backend #458

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The `kill()` method in the ShellCmd backend is currently broken. As
the following log snippet shows, it only tries to kill the single
process that it spawned (i.e., `/usr/bin/time ...`), leaving all of
its children up and running::

        gc3.gc3libs: DEBUG: Connecting to host '130.60.24.184' as user 'gc3-user' via SSH (timeout 7s)...
        gc3.gc3libs: DEBUG: SshTransport running `kill 7232`...
        gc3.gc3libs: DEBUG: Executed command 'kill 7232' on host '130.60.24.184'; exit code: 0
        gc3.gc3libs: DEBUG: Deleting resource file for pid 7232

However, the actual process tree looks like this::

        gc3-user 15528  0.0  0.0   4300   348 ?        S    Jul09   0:00 /usr/bin/time -o /tmp/gc3libs.bqWIO5/.gc3pie_shellcmd/resource_usage.txt -f WallTime=%es?KernelTime=%Ss?UserTime=%Us?CPUUsage=%P?MaxResidentMemory=%MkB?AverageResidentMemory=%tkB?AverageTotalMemory=%KkB?AverageUnsharedMemory=%DkB?AverageUnsharedStack=%pkB?AverageSharedMemory=%XkB?PageSize=%ZB?MajorPageFaults=%F?MinorPageFaults=%R?Swaps=%W?ForcedSwitches=%c?WaitSwitches=%w?Inputs=%I?Outputs=%O?SocketReceived=%r?SocketSent=%s?Signals=%k?ReturnCode=%x /bin/sh -c  "./gcelljunction_wrapper.sh" "-x" "./tricellular_junctions_new" "--" "38"
        gc3-user 15529  0.0  0.0   4404   584 ?        S    Jul09   0:00  \_ /bin/sh -c  "./gcelljunction_wrapper.sh" "-x" "./tricellular_junctions_new" "--" "38"
        gc3-user 15530  0.0  0.0  11044  1412 ?        S    Jul09   0:00      \_ /bin/bash ./gcelljunction_wrapper.sh -x ./tricellular_junctions_new -- 38
        gc3-user 15535 53.7  2.1 1434392 175608 ?      Sl   Jul09 30838:45          \_ ./tricellular_junctions_new 38

This also has a very bad unintended consequence: since GC3Pie
deletes the resource file, the VM slot is marked as free to run
other jobs, which it isn't. This is how we ended up having two
processes running in Tinri's VMs.
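Given the process tree above, signalling only the top-level `/usr/bin/time` process cannot stop the job. As an illustration of one possible technique (walking the process tree, which is different from the working-directory match discussed next, and not GC3Pie's actual code), here is a minimal Linux-only sketch that lists a process together with all of its descendants by reading each process's parent PID from `/proc/<pid>/stat`. The function names `child_pids` and `descendants` are hypothetical.

```shell
#!/bin/bash
# Sketch only (Linux): find a process and all of its descendants via /proc.
# `child_pids` and `descendants` are hypothetical helpers, not GC3Pie code.

# Print the PIDs of the direct children of process $1.
child_pids() {
    local parent=$1 f stat rest ppid pid
    for f in /proc/[0-9]*/stat; do
        # the process may have exited since the glob was expanded
        { read -r stat < "$f"; } 2>/dev/null || continue
        # /proc/<pid>/stat is "pid (comm) state ppid ..."; comm may contain
        # spaces or parentheses, so drop everything up to the last ") "
        rest=${stat##*') '}
        ppid=${rest#* }        # drop the state field
        ppid=${ppid%% *}       # keep only the ppid field
        if [ "$ppid" = "$parent" ]; then
            pid=${f#/proc/}
            echo "${pid%/stat}"
        fi
    done
}

# Print the descendants of $1 depth-first (children before their parent),
# followed by $1 itself.
descendants() {
    local pid
    for pid in $(child_pids "$1"); do
        descendants "$pid"
    done
    echo "$1"
}
```

On the execution host, `kill $(descendants 15528)` would then signal `tricellular_junctions_new`, both shells and `/usr/bin/time` in the tree shown above. The list is only a snapshot: a process forked after it is taken would be missed.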

A good way to do the killing on Linux would be to kill all processes
whose working directory is the job's temporary directory::

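    # Print "/proc/<pid>/cwd <link target>" for every process on the remote
    # host, keeping only those whose working directory is $cwd.
    ec2ssh -n "gc3-user@$ip" "find /proc/*/cwd -maxdepth 0 -printf '%p %l\n' 2>/dev/null | grep -E ' ${cwd}\$'" 2>/dev/null | \
        (while read -r proc wd; do
           # "/proc/<pid>/cwd" -> "<pid>"
           pid=$(echo "$proc" | cut -d/ -f3)
           #echo "DEBUG: pid=$pid"
           #ec2ssh -n "gc3-user@$ip" "ps --no-headers u '$pid'" 2>/dev/null
           ec2ssh -n "gc3-user@$ip" "kill $pid" 2>/dev/null
        done)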
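To try the same idea without a remote host, here is a minimal local variant (Linux only, meant to run on the execution host itself rather than through `ec2ssh`). Instead of a regular-expression match it compares each `/proc/<pid>/cwd` link target with the directory as a plain string, and it only prints PIDs so that they can all be passed to a single `kill`. The helper name `pids_with_cwd` is hypothetical.

```shell
#!/bin/bash
# Sketch only (Linux): print the PIDs of all processes whose current working
# directory is exactly $1. `pids_with_cwd` is a hypothetical helper name.
pids_with_cwd() {
    local dir link target pid
    # canonical absolute path, since /proc/<pid>/cwd links resolve symlinks
    dir=$(cd "$1" && pwd -P) || return 1
    for link in /proc/[0-9]*/cwd; do
        # readlink fails for other users' processes or ones that have exited
        target=$(readlink "$link" 2>/dev/null) || continue
        if [ "$target" = "$dir" ]; then
            pid=${link#/proc/}
            echo "${pid%/cwd}"
        fi
    done
}
```

For example, `kill $(pids_with_cwd "$jobdir")`, where `$jobdir` stands for the job's temporary directory. Like the loop above, this misses any process that has changed its working directory away from the job directory.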

I am not sure whether `/proc` on Mac OS X provides the same
information, though.

Original issue reported on code.google.com by riccardo.murri@gmail.com on 18 Aug 2014 at 12:48

GoogleCodeExporter commented 9 years ago
Apparently, there is no `/proc` on Mac OS X. The StackOverflow thread
http://stackoverflow.com/a/8331292/459543 suggests parsing the output
of `lsof -d cwd` instead.
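Following that suggestion, here is a hedged sketch that filters lsof's machine-readable field output, `lsof -d cwd -Fpn`. With `-F`, each output line starts with a one-character field tag: `p` for a process ID and `n` for a file name (here, the working directory). I have not checked this on Mac OS X, and the helper name `pids_with_cwd_lsof` is hypothetical. Reading from standard input keeps the parsing testable without lsof itself.

```shell
#!/bin/bash
# Sketch only: read `lsof -d cwd -Fpn` output on stdin and print the PIDs of
# processes whose cwd is exactly $1. `pids_with_cwd_lsof` is hypothetical.
pids_with_cwd_lsof() {
    local dir=$1 line pid=
    while IFS= read -r line; do
        case $line in
            p*) pid=${line#p} ;;         # "p<PID>" starts a new process set
            n*)                          # "n<NAME>" is the cwd path here
                if [ -n "$pid" ] && [ "${line#n}" = "$dir" ]; then
                    echo "$pid"
                fi
                ;;
            *) ;;                        # other fields (e.g. "f<FD>") ignored
        esac
    done
}
```

Usage would be along the lines of `lsof -d cwd -Fpn 2>/dev/null | pids_with_cwd_lsof "$jobdir"`, with `$jobdir` again standing for the job's temporary directory.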

Original comment by riccardo.murri@gmail.com on 25 Jun 2015 at 8:27