Closed GoogleCodeExporter closed 9 years ago
What is the alternative? We use it several places in the current code.
mike
Original comment by mrgh...@gmail.com
on 24 Mar 2011 at 4:59
| What is the alternative? We use it several places in the current code.
|
This ticket refers to the new "subprocess" backend (r1332), not to the
Python module of the same name.
I guess this is an indication that the backend is badly named, since
it does not even make use of Python's subprocess :-)
Original comment by riccardo.murri@gmail.com
on 24 Mar 2011 at 5:02
Original comment by riccardo.murri@gmail.com
on 24 Mar 2011 at 7:34
Proposal for fixing this "SubProcess" backend bug:
1. Do not execute the target executable directly, but instead use GNU
time to wrap its execution. GNU time can be told (via the
`--format` option) to also output the exit code of the program it
runs, so we just save it to a file and can reap it any time later.
That is, if the backend gets passed an `Application` object with
`executable='pippo'` and `arguments=['a','b','c']`, then it should
really invoke `os.execl(['/usr/bin/time', ..., 'pippo', 'a', 'b', 'c'])`
and *not* `os.execlp(['pippo', 'a', 'b', 'c'])` as it does presently.
Note:
* By default `/usr/bin/time` prints its statistics to STDERR, which
would mess up the application's own output. So time's `--output`
option should be used to redirect time's own output to a file.
I suggest that the file is named `.gc3pie-stat` (or something
similar) and located in the application execution directory.
* ARC also uses GNU time to wrap the execution of each program; the
format string they use in ARC 1.1 is:
'WallTime=%es\nKernelTime=%Ss\nUserTime=%Us\nCPUUsage=%P\nMaxResidentMemory=%MkB\nAverageResidentMemory=%tkB\nAverageTotalMemory=%KkB\nAverageUnsharedMemory=%DkB\nAverageUnsharedStack=%pkB\nAverageSharedMemory=%XkB\nPageSize=%ZB\nMajorPageFaults=%F\nMinorPageFaults=%R\nSwaps=%W\nForcedSwitches=%c\nWaitSwitches=%w\nInputs=%I\nOutputs=%O\nSocketReceived=%r\nSocketSent=%s\nSignals=%k\n'
I suggest we use the same format (so we can write a unified
parser) but *add* the "exit status" line. We *need* the exit
status saved to a file, because there's no parent process to reap
the exit status via the `wait()` call.
* GNU time is usually located at `/usr/bin/time`, but it would be
nice if the `SubProcess` constructor had this as a configurable
parameter, to be set from the `gc3pie.conf` configuration file.
(GNU time has a few buggy versions; it would be nice to be able
to point to a sane version on systems where the bugged one is the
default, e.g., recent RHEL or SLES.)
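A minimal Python sketch of point 1, to make the proposal concrete. The helper names `build_time_argv` and `parse_stat_file` and the `ReturnCode` key are hypothetical, not existing GC3Pie code; the format string is the ARC 1.1 one quoted above with one extra line that uses GNU time's `%x` conversion (exit status of the wrapped command).

```python
# Sketch only: helper names and the "ReturnCode" key are assumptions.
# ARC 1.1 format fields, plus an exit-status line (%x in GNU time).
TIME_FORMAT = (
    'WallTime=%es\nKernelTime=%Ss\nUserTime=%Us\nCPUUsage=%P\n'
    'MaxResidentMemory=%MkB\nAverageResidentMemory=%tkB\n'
    'AverageTotalMemory=%KkB\nAverageUnsharedMemory=%DkB\n'
    'AverageUnsharedStack=%pkB\nAverageSharedMemory=%XkB\n'
    'PageSize=%ZB\nMajorPageFaults=%F\nMinorPageFaults=%R\n'
    'Swaps=%W\nForcedSwitches=%c\nWaitSwitches=%w\n'
    'Inputs=%I\nOutputs=%O\nSocketReceived=%r\nSocketSent=%s\n'
    'Signals=%k\nReturnCode=%x\n'
)


def build_time_argv(executable, arguments,
                    time_cmd='/usr/bin/time', stat_file='.gc3pie-stat'):
    """Return the argv list that runs `executable` wrapped by GNU time."""
    return [
        time_cmd,
        '--output=' + stat_file,   # keep time's report out of the app's STDERR
        '--format=' + TIME_FORMAT,
        executable,
    ] + list(arguments)


def parse_stat_file(text):
    """Parse KEY=VALUE lines written by GNU time into a dict of strings.

    Lines without '=' are skipped, e.g. the extra message GNU time may
    print when the command exits with a non-zero status.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition('=')
        if sep:
            stats[key.strip()] = value.strip()
    return stats
```

The backend would then call something like `os.execv(argv[0], argv)` with `argv = build_time_argv('pippo', ['a', 'b', 'c'])`, and later read `.gc3pie-stat` from the execution directory to recover the exit status.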
2. Instead of using `os.wait()` to grab the child process status, read
it from `/proc/PID/stat`. When `/proc/PID/stat` no longer exists,
the process has terminated and one should go check the output
of the `time` command above and reap the exit code and signals from it.
There's a subtlety here: it is theoretically possible that we wait
so long between updates that the PID of the child process has been
re-used and we check `/proc/PID/stat` of a different process...
It *seems* that procfs uses ever-increasing inode numbers, so we
should probably store the inode number of the original
`/proc/PID/stat` file and check it -- if the inode number has
changed, then we're no longer looking at the same
`/proc/PID/stat`.
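A rough Python sketch of the check described in point 2. The function names and the `proc_root` parameter (there only so the logic can be exercised outside a real procfs) are hypothetical, and the sketch relies on the unverified assumption stated above, namely that a re-used PID gets a `/proc/PID/stat` with a different inode number.

```python
import os


def proc_stat_inode(pid, proc_root='/proc'):
    """Return the inode number of PROC_ROOT/PID/stat, or None if it is gone."""
    try:
        return os.stat(os.path.join(proc_root, str(pid), 'stat')).st_ino
    except OSError:
        # The file does not exist (or is unreadable): treat the process
        # as terminated.
        return None


def is_same_process(pid, recorded_inode, proc_root='/proc'):
    """True if PID still exists and its stat file has the recorded inode."""
    current = proc_stat_inode(pid, proc_root)
    return current is not None and current == recorded_inode
```

The backend would store `proc_stat_inode(pid)` right after starting the job; on each later update, a false result from `is_same_process(pid, stored_inode)` means the job is over and the file written by `time` should be read.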
Original comment by riccardo.murri@gmail.com
on 6 Feb 2012 at 11:45
Original comment by riccardo.murri@gmail.com
on 20 Jun 2012 at 6:35
Implemented in revisions 2384-2389.
We use psutil http://code.google.com/p/psutil/ to check the status of the job
(instead of reading the /proc/PID/stat file), and we use `time` as a wrapper for the
application.
A configuration option `time_cmd` for the `resource` section has been added to
specify the path to the `time` command (by default it's just `time`, which will
resolve to the first binary found in the PATH environment variable).
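As an illustration only, this is roughly how such an option could be read with Python's standard `configparser`. The section name `resource/localhost`, the sample file contents and the helper `get_time_cmd` are assumptions for the sketch, not the actual GC3Pie configuration code.

```python
import configparser

# Hypothetical configuration fragment; the section name is an assumption.
SAMPLE_CONF = """
[resource/localhost]
time_cmd = /usr/bin/time
"""


def get_time_cmd(conf_text, section):
    """Return the configured `time_cmd`, defaulting to plain `time`."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # `fallback` covers both a missing option and a missing section.
    return parser.get(section, 'time_cmd', fallback='time')
```

With the default value, the command is looked up through PATH when it is executed; `shutil.which('time')` shows which binary that would be on a given system.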
Each application's command is run through `time`, which will write its PID and
its output in a file inside the execution directory of the application. These files
are used by the backend to check the status of the `time` command and to check
the return code of the application.
Original comment by arcimbo...@gmail.com
on 21 Jun 2012 at 1:54
Original issue reported on code.google.com by
riccardo.murri@gmail.com
on 24 Mar 2011 at 4:53