gc3pie / gc3pie

Python libraries and tools for running applications on diverse Grids and clusters
http://gc3pie.readthedocs.org/
GNU Lesser General Public License v2.1

Status 'unknown' of all ggeotop jobs after short runtime #418

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Launch a simulation with ggeotop.

What is the expected output? What do you see instead?
Jobs get scheduled and run, but come back 'unknown' instead of 'OK'. Also,
there is no output visible in the directories.

What version of the product are you using? On what operating system?
ginfo 2.1.1 version (SVN $Revision: 3705 $) / Rocks 5.4.3 (Viper) / SGE

Please provide any additional information below.
gc3.gc3libs: DEBUG: About to update state of application: GeotopApplication.296 (currently: UNKNOWN)
gc3.gc3libs: DEBUG: Checking auth: SshAuth
gc3.gc3libs: DEBUG: Opening LocalTransport...
gc3.gc3libs: DEBUG: Checking remote job status with 'qstat | egrep  '^ *80319'' ...
gc3.gc3libs: DEBUG: Executed local command 'qstat | egrep  '^ *80319'', got exit status: 1
gc3.gc3libs: ERROR: Failed while running the `qstat`/`bjobs` command. exit code: 1, stderr: ''
gc3.gc3libs: DEBUG: The `qstat`/`bjobs` command returned no job information; trying with 'qacct -j 80319' instead ...
gc3.gc3libs: DEBUG: Executed local command 'qacct -j 80319', got exit status: 1
gc3.gc3libs: ERROR: Failed while running the `acct` command. exit code: 1, stderr: '/opt/gridengine/default/common/accounting: No such file or directory
'
Status of jobs in the 'GT01' session: (at 17:12:21, 10/17/13)
gc3.gc3libs: DEBUG: Engine.stats: Restricting to object of class 'GeotopTask'
         NEW  0/4   (0.0%)
     RUNNING  0/4   (0.0%)
     STOPPED  0/4   (0.0%)
   SUBMITTED  0/4   (0.0%)
  TERMINATED  0/4   (0.0%)
 TERMINATING  0/4   (0.0%)
     UNKNOWN  4/4  (100.0%)
       total  4/4  (100.0%)
ggeotop: Exiting upon user request (Ctrl+C)

Original issue reported on code.google.com by stephan....@carleton.ca on 17 Oct 2013 at 9:18

GoogleCodeExporter commented 9 years ago

Original comment by riccardo.murri@gmail.com on 17 Oct 2013 at 9:35

GoogleCodeExporter commented 9 years ago
Hi Stephan,

this seems to be the cause of the problem:

| gc3.gc3libs: DEBUG: Executed local command 'qacct -j 80319', got exit status: 1
| gc3.gc3libs: ERROR: Failed while running the `acct` command. exit code: 1, stderr: '/opt/gridengine/default/common/accounting: No such file or directory'

GC3Pie is trying to use the `qacct` command to get information about a
recently-finished job, but the command returns an error because the
accounting file `/opt/gridengine/default/common/accounting` does not exist.
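
The lookup order visible in the debug log (first `qstat`, then `qacct -j`, and UNKNOWN if both fail) can be sketched roughly as follows. This is an illustration only, not GC3Pie's actual code: `run_shell`, `lookup_job_state`, and the `IN_QUEUE`/`FINISHED` labels are hypothetical names made up for this example.

```python
import subprocess


def run_shell(command):
    """Run `command` through the shell; return (exit_code, stdout, stderr)."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def lookup_job_state(job_id, run=run_shell):
    """Hypothetical helper mirroring the fallback order seen in the log."""
    # Step 1: look for the job among the queued/running jobs.
    code, out, _ = run("qstat | egrep '^ *%s'" % job_id)
    if code == 0 and out.strip():
        return "IN_QUEUE"  # illustrative label, not a GC3Pie state name

    # Step 2: the job is no longer listed; ask the accounting records.
    code, _, _ = run("qacct -j %s" % job_id)
    if code == 0:
        return "FINISHED"  # illustrative label, not a GC3Pie state name

    # Step 3: neither command knows the job, so its state is undetermined.
    return "UNKNOWN"
```

With both commands exiting with status 1, as in the log above, this sketch ends in the `UNKNOWN` branch, which matches the 4/4 UNKNOWN jobs in the report.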

Please ask your cluster sysadmin what the local equivalent of the
standard SGE command `qacct -j 80319` is; then we can work out
how GC3Pie can be configured to use that instead of `qacct`.
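
Before talking to the sysadmin, it may help to check where an accounting file could live on the submit host. The sketch below is a rough diagnostic under stated assumptions: it takes the path from the error message and, if the `SGE_ROOT` (and optionally `SGE_CELL`) environment variables are set, also tries `$SGE_ROOT/$SGE_CELL/common/accounting`. The function names are hypothetical, and the site may keep its accounting data somewhere else entirely.

```python
import os

# Path copied from the error message in the log above.
REPORTED_PATH = "/opt/gridengine/default/common/accounting"


def accounting_file_candidates(environ=os.environ):
    """Return candidate accounting file paths, without duplicates."""
    paths = [REPORTED_PATH]
    sge_root = environ.get("SGE_ROOT")
    if sge_root:
        # Assumption: the cell name falls back to "default" when unset.
        cell = environ.get("SGE_CELL", "default")
        candidate = os.path.join(sge_root, cell, "common", "accounting")
        if candidate not in paths:
            paths.append(candidate)
    return paths


def report_accounting_files(paths, exists=os.path.exists):
    """Map each candidate path to True/False depending on whether it exists."""
    return {path: exists(path) for path in paths}


if __name__ == "__main__":
    for path, found in report_accounting_files(accounting_file_candidates()).items():
        print("%s: %s" % (path, "found" if found else "missing"))
```

If one of the candidates turns out to exist, `qacct -f <path> -j 80319` is worth trying by hand, since `qacct` accepts an alternative accounting file through `-f`.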

Original comment by riccardo.murri@gmail.com on 17 Oct 2013 at 9:35

GoogleCodeExporter commented 9 years ago
Is this issue still current/relevant?

Original comment by riccardo.murri@gmail.com on 19 Aug 2014 at 8:52