abhilekhsingh / gc3pie

Automatically exported from code.google.com/p/gc3pie

AppPot execution stalled under the SubProcess backend? #224

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Use the localhost/subprocess backend to execute an AppPot application;
the application executes correctly, but the "linux" process is never
reaped by its parent process, `empty`.  Here is a typical snapshot of the
process table on such occasions:

    rmurri    9334 22144  0 02:08 pts/5    00:00:00           python ./ggamess.py --apppot /home/rmurri/gc3/apppot/trunk/apppot0.disk.img.gamess tests/exam04.inp -s XXX -C10 -vvvvvv -r localhost
    rmurri    9339  9334  2 02:08 pts/5    00:00:04             /bin/sh /home/rmurri/gc3/apppot/trunk/apppot-start.sh --apppot apppot.img /home/user/gamess/localgms exam04.inp
    rmurri    9361  9339  0 02:08 pts/5    00:00:00               cat .apppot.stdout
    [...]
    rmurri    9359     1  0 02:08 ?        00:00:00   empty -f -i .apppot.stdin -o .apppot.stdout linux umid=apppot.xenia.9339 mem=512M hostfs=/ ubd0=apppot.img eth0=slirp,,slirp-fullbolt eth1=mcast,,239.255.82.77,8277,1 con=fd:0,fd:1 root=/dev/ubda1 TERM=screen apppot.uid=1000 apppot.gid=1000 apppot.jobdir=/tmp/gc3libs.OjVshQ.tmp.d --  '/home/user/gamess/localgms' 'exam04.inp'
    rmurri    9363  9359  5 02:08 ?        00:00:06     [linux] <defunct>

Attaching `strace` to the `empty` process 9359 shows it blocked in a
`select` call:

    $ sudo strace -p 9359
    select(9, [6 8], NULL, NULL, NULL

However, there is nothing suspicious about file descriptors 6 and 8
(except that fd 6 is attached to a FIFO whose other end no longer has
a writer attached ...):

    $ ls -l /proc/9359/fd
    totale 0
    lr-x------ 1 rmurri rmurri 64 2011-10-12 02:18 0 -> /dev/null
    l-wx------ 1 rmurri rmurri 64 2011-10-12 02:18 1 -> /tmp/gc3libs.OjVshQ.tmp.d/exam04.out
    lr-x------ 1 rmurri rmurri 64 2011-10-12 02:18 11 -> /dev/urandom
    l-wx------ 1 rmurri rmurri 64 2011-10-12 02:18 2 -> /tmp/gc3libs.OjVshQ.tmp.d/exam04.out
    l-wx------ 1 rmurri rmurri 64 2011-10-12 02:18 3 -> /home/rmurri/.gc3/debug.log
    l-wx------ 1 rmurri rmurri 64 2011-10-12 02:18 4 -> /dev/null
    lrwx------ 1 rmurri rmurri 64 2011-10-12 02:18 5 -> socket:[14760891]
    lrwx------ 1 rmurri rmurri 64 2011-10-12 02:18 6 -> /tmp/gc3libs.OjVshQ.tmp.d/.apppot.stdin
    lrwx------ 1 rmurri rmurri 64 2011-10-12 02:18 7 -> /tmp/gc3libs.OjVshQ.tmp.d/.apppot.stdout
    lrwx------ 1 rmurri rmurri 64 2011-10-12 02:18 8 -> /dev/ptmx
    lrwx------ 1 rmurri rmurri 64 2011-10-12 02:18 9 -> /dev/pts/6
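
One detail in the listing above may be relevant: fd 6 is shown as `lrwx`, i.e. the FIFO is open read-write. On Linux, a process that opens a FIFO with `O_RDWR` counts as a writer itself, so it never sees end-of-file when the other writers go away, and a `select` on that descriptor keeps blocking. The following is a minimal sketch of that behaviour, not code taken from `empty` or GC3Pie; the function name and the FIFO file name are made up for illustration, and the result is Linux-specific (POSIX leaves `O_RDWR` on a FIFO undefined).

```python
import os
import select
import tempfile


def fifo_readable_after_writer_closes(read_write, timeout=0.2):
    """Open a FIFO, let an external writer open and close it, and
    return True if select() then reports the FIFO as readable (EOF)."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.fifo")  # hypothetical name
    os.mkfifo(path)
    try:
        if read_write:
            # O_RDWR: this process is both reader and writer of the FIFO.
            fd = os.open(path, os.O_RDWR)
        else:
            # Read-only; O_NONBLOCK so open() does not wait for a writer.
            fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        # Stand-in for the external writer: it opens the FIFO and goes away.
        writer = os.open(path, os.O_WRONLY)
        os.close(writer)
        ready, _, _ = select.select([fd], [], [], timeout)
        os.close(fd)
        return bool(ready)
    finally:
        os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print("read-only :", fifo_readable_after_writer_closes(read_write=False))
    print("read-write:", fifo_readable_after_writer_closes(read_write=True))
```

With a read-only descriptor, `select` returns as soon as the last writer closes; with a read-write descriptor it only returns on timeout. Whether this is what actually keeps `empty` in `select` here is still to be verified.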

Killing the `empty` process with `kill -9` (it cannot be killed by
SIGTERM) lets GC3Pie continue and collect the (correct) output from the
AppPot application.
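
For reference, the manual workaround (try SIGTERM, fall back to SIGKILL) can be scripted. This is only a sketch under assumptions: `stop_process` and `demo` are hypothetical helpers, not part of `gc3libs`, and the demo child ignores SIGTERM purely as a stand-in, since why `empty` does not react to SIGTERM is not known.

```python
import signal
import subprocess
import sys


def stop_process(proc, grace=1.0):
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL.
    Returns the exit status (negative signal number if killed by a signal)."""
    proc.terminate()  # SIGTERM
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL, cannot be caught or ignored
        proc.wait()
    return proc.returncode


def demo():
    """Run a child that ignores SIGTERM and stop it with stop_process()."""
    child_code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", child_code],
                            stdout=subprocess.PIPE)
    proc.stdout.readline()  # wait until the child has installed its handler
    status = stop_process(proc, grace=0.5)
    proc.stdout.close()
    return status


if __name__ == "__main__":
    print("exit status:", demo())
```

A negative exit status equal to `-signal.SIGKILL` means the fallback was needed, which matches what is observed by hand with `kill -9`.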

We need to understand whether this is an issue with `empty`, with the
script `apppot-start.sh` or with the way `gc3libs.backends.subprocess`
invokes it.

Original issue reported on code.google.com by riccardo.murri@gmail.com on 12 Oct 2011 at 12:41

GoogleCodeExporter commented 9 years ago
Execution terminates correctly if using the "(sleep 365d) > .apppot.stdin &"
trick instead of `empty`.  So it seems that the problem is in `empty`,
or at least in our usage of it.

Original comment by riccardo.murri@gmail.com on 12 Oct 2011 at 12:49
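
The `sleep` trick works because the shell opens the FIFO for writing on behalf of `sleep`, so the reader always has at least one writer attached and does not get end-of-file while the real input comes and goes. A minimal sketch of the same mechanism follows; the function and file names are hypothetical, and a plain file descriptor stands in for the `sleep 365d` process.

```python
import os
import select
import tempfile


def placeholder_writer_demo(timeout=0.2):
    """Show that a FIFO reader sees EOF only after *all* writers,
    including an idle placeholder one, have closed their end."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "stdin.fifo")  # hypothetical name
    os.mkfifo(path)
    try:
        reader = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        placeholder = os.open(path, os.O_WRONLY)  # plays the role of `sleep`
        real_writer = os.open(path, os.O_WRONLY)
        os.write(real_writer, b"input\n")
        os.close(real_writer)           # the real writer is done
        data = os.read(reader, 1024)    # drain what it wrote
        # The placeholder still holds the write end: no EOF, select times out.
        held, _, _ = select.select([reader], [], [], timeout)
        os.close(placeholder)           # now the last writer is gone
        released, _, _ = select.select([reader], [], [], timeout)
        eof = os.read(reader, 1024)     # empty bytes means end-of-file
        os.close(reader)
        return data, bool(held), bool(released), eof
    finally:
        os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print(placeholder_writer_demo())
```

While the placeholder is open the reader simply waits; once it is closed, the reader becomes readable and gets an empty read, i.e. end-of-file.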