The `kill()` method in the ShellCmd backend is currently broken. As
the following log snippet shows, it only tries to kill the single
process that it spawned (i.e., `/usr/bin/time ...`), while leaving all
of its children up and running::
gc3.gc3libs: DEBUG: Connecting to host '130.60.24.184' as user 'gc3-user' via SSH (timeout 7s)...
gc3.gc3libs: DEBUG: SshTransport running `kill 7232`...
gc3.gc3libs: DEBUG: Executed command 'kill 7232' on host '130.60.24.184'; exit code: 0
gc3.gc3libs: DEBUG: Deleting resource file for pid 7232
However, the process tree looks rather like this::
gc3-user 15528 0.0 0.0 4300 348 ? S Jul09 0:00 /usr/bin/time -o /tmp/gc3libs.bqWIO5/.gc3pie_shellcmd/resource_usage.txt -f WallTime=%es?KernelTime=%Ss?UserTime=%Us?CPUUsage=%P?MaxResidentMemory=%MkB?AverageResidentMemory=%tkB?AverageTotalMemory=%KkB?AverageUnsharedMemory=%DkB?AverageUnsharedStack=%pkB?AverageSharedMemory=%XkB?PageSize=%ZB?MajorPageFaults=%F?MinorPageFaults=%R?Swaps=%W?ForcedSwitches=%c?WaitSwitches=%w?Inputs=%I?Outputs=%O?SocketReceived=%r?SocketSent=%s?Signals=%k?ReturnCode=%x /bin/sh -c "./gcelljunction_wrapper.sh" "-x" "./tricellular_junctions_new" "--" "38"
gc3-user 15529 0.0 0.0 4404 584 ? S Jul09 0:00 \_ /bin/sh -c "./gcelljunction_wrapper.sh" "-x" "./tricellular_junctions_new" "--" "38"
gc3-user 15530 0.0 0.0 11044 1412 ? S Jul09 0:00 \_ /bin/bash ./gcelljunction_wrapper.sh -x ./tricellular_junctions_new -- 38
gc3-user 15535 53.7 2.1 1434392 175608 ? Sl Jul09 30838:45 \_ ./tricellular_junctions_new 38
This has a serious unintended consequence: since GC3Pie deletes the
resource file, the VM slot is marked as free to run other jobs even
though it is still busy. This is how we ended up with two processes
running in Tinri's VMs.
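
One way to avoid freeing the slot too early is to record the whole process tree before sending any signal, because children are re-parented once their parent exits and can no longer be found through it. The following is only a minimal sketch, assuming `pgrep` is installed on the execution host; `descendants` and `kill_tree` are hypothetical names, not existing GC3Pie functions.

```shell
# Hypothetical helper: print the PIDs of all descendants of process $1,
# depth-first. Assumes `pgrep` is installed on the host.
descendants() {
    for child in $(pgrep -P "$1"); do
        echo "$child"
        descendants "$child"
    done
}

# Hypothetical helper: send SIGTERM to process $1 and its descendants.
# The list is built first, because orphaned children are re-parented
# as soon as their parent dies and can no longer be found through it.
kill_tree() {
    victims="$1 $(descendants "$1")"
    # $victims is deliberately unquoted: one argument per PID
    kill $victims 2>/dev/null
}
```

For the process tree shown above, `kill_tree 15528` would signal 15528, 15529, 15530 and 15535. The resource file should then be deleted only once none of those PIDs exists any more.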
A better way to kill the job on Linux would be to kill all processes
whose working directory is the job's temporary directory::
# List "/proc/PID/cwd TARGET" pairs on the remote host and keep only the
# processes whose working directory is ${cwd}
ec2ssh -n "gc3-user@$ip" "find /proc/*/cwd -maxdepth 0 -printf '%p %l\n' 2>/dev/null | grep -E '${cwd}\$'" 2>/dev/null | \
(while read -r proc wd; do
    # /proc/PID/cwd -> PID
    pid=$(echo "$proc" | cut -d/ -f3)
    #echo "DEBUG: pid=$pid"
    #ec2ssh -n "gc3-user@$ip" "ps --no-headers u '$pid'" 2>/dev/null
    ec2ssh -n "gc3-user@$ip" "kill $pid" 2>/dev/null
done)
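
The same idea can be run entirely on the target host, which needs a single SSH round trip instead of one per PID. This is a rough sketch under the assumption that the host is Linux with `/proc` mounted; `pids_with_cwd` and `kill_by_cwd` are hypothetical helper names introduced here for illustration.

```shell
# Hypothetical helper: print the PID of every process whose current
# working directory is exactly the directory given as $1.
# Linux only: relies on the /proc/PID/cwd symlinks.
pids_with_cwd() {
    dir=$1
    for link in /proc/[0-9]*/cwd; do
        # readlink fails for processes we are not allowed to inspect
        # (or that exited in the meantime); skip those
        target=$(readlink "$link" 2>/dev/null) || continue
        if [ "$target" = "$dir" ]; then
            pid=${link#/proc/}
            echo "${pid%/cwd}"
        fi
    done
}

# Hypothetical helper: send SIGTERM to all processes found above.
kill_by_cwd() {
    for pid in $(pids_with_cwd "$1"); do
        kill "$pid" 2>/dev/null || true
    done
}
```

Comparing the link target with `=` avoids the regular-expression match, so a directory name containing special characters cannot match unrelated paths. The caller must not itself be running inside the directory it passes, or it will be signalled as well.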
I am not sure if the MacOSX `/proc` provides the same information,
though.
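
As far as I know, Mac OS X does not provide a `/proc` filesystem at all, so the approach above would not carry over. A possible alternative, sketched here under the assumption that `lsof` is installed on the host, is to ask `lsof` for each process's `cwd` entry and parse its field output (`-F pn` prints lines prefixed with `p` for the PID and `n` for the path). The function name `pids_with_cwd_lsof` is hypothetical.

```shell
# Hypothetical helper: print the PID of every process whose current
# working directory is exactly the directory given as $1, using lsof
# instead of /proc. Assumes `lsof` is installed.
pids_with_cwd_lsof() {
    lsof -d cwd -F pn 2>/dev/null | awk -v dir="$1" '
        # "p<PID>" starts the record of a new process
        /^p/ { pid = substr($0, 2) }
        # "n<PATH>" carries the name of the cwd entry
        /^n/ { if (substr($0, 2) == dir) print pid }
    '
}
```

The output can be fed to `kill` in the same way as the `/proc`-based listing.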
Original issue reported on code.google.com by riccardo.murri@gmail.com on 18 Aug 2014 at 12:48