google-code-export / yabi

Automatically exported from code.google.com/p/yabi

Capture qsub errors (pbs) #96

Status: Closed (GoogleCodeExporter closed 9 years ago)

GoogleCodeExporter commented 9 years ago
Hi

When a PBS qsub job fails to launch, the qsub error message isn't reported in 
the admin interface. If this error could be captured, debugging would be much 
easier.

E.g. when attempting to request more than 1 node in a queue that does not allow 
it, qsub writes the following error to stderr:
"qsub: Job exceeds queue and/or server resource limits"

But on the syslog page, only an empty status message is shown 
(https://bioflow.hpcu.uq.edu.au/yabiadmin/admin/yabiengine/syslog/414/), e.g.
"Remote execution backend sent status message: id="

Normally, this would be the job id number (e.g. id=758646.pbsserver), captured 
from stdout.

Also, these jobs with empty status messages will keep 'running' in the 
front-end interface, presumably waiting for a response. If the error could be 
caught, it would be nice if they could fail outright.
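
A minimal sketch of the behaviour requested above, assuming the backend 
launches the submission command as a subprocess. This is not Yabi's actual 
backend code: `submit_job` and `QsubError` are hypothetical names used only 
for illustration. The idea is to read the job id from stdout, and to treat a 
non-zero exit status or an empty job id as a failure that carries the stderr 
text with it.

```python
import subprocess


class QsubError(Exception):
    """Hypothetical error type: submission failed or returned no job id."""


def submit_job(command):
    """Run a submission command and return the job id printed on stdout.

    `command` is an argument list, e.g. ["qsub", "job.sh"] (assumed form).
    """
    result = subprocess.run(command, capture_output=True, text=True)
    job_id = result.stdout.strip()
    if result.returncode != 0 or not job_id:
        # qsub writes its diagnostics to stderr, so surface them in the
        # error instead of passing an empty "id=" status along.
        raise QsubError(
            "qsub exited %d with message %s"
            % (result.returncode, result.stderr.strip())
        )
    return job_id
```

With something like this, a caller could mark the task as failed when 
`QsubError` is raised and log the message, rather than leaving the job 
'running' with an empty id.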

(NB: Alternatively, access to the exact qsub command and script (via the admin 
interface), as per issue 63, would at least allow local testing.)

Thanks,
Sarah.

Original issue reported on code.google.com by s.willi...@qfab.org on 14 Nov 2011 at 12:55

GoogleCodeExporter commented 9 years ago

Original comment by aahun...@gmail.com on 27 Feb 2012 at 2:15

GoogleCodeExporter commented 9 years ago
Fixed this on the ssh+pbspro backend in revision 109a0ebae7c6. Tested by 
setting the submission script to request an insane number of nodes. The task 
failed with an error in admin that includes:

Remote response was:
SSHQsub error: SSH exited 188 with message qsub: Job exceeds queue and/or 
server resource limits

Original comment by retrogra...@gmail.com on 1 Mar 2012 at 7:16