biouno / pbs-plugin

Jenkins PBS plug-in
http://biouno.org

Job hangs seeking job output #10

Open kinow opened 9 years ago

kinow commented 9 years ago

Currently, a job that submits a PBS Torque script can hang intermittently. The console output stops at "Seeking job end...":

Started by user anonymous
Building remotely on pbs-docker (pbs torque) in workspace /var/jenkins/workspace/test-pbs-1
Submitting PBS job...
Created working directory '/tmp/jenkinsPBS_8552947413070244012' with permissions 'rwxrwxrwx'
PBS script: /tmp/jenkinsPBS_8552947413070244012/script
PBS Job submitted: 2.localhost
Seeking job end...

PBS log:

Job Output

Job: 2.localhost

07/23/2015 08:17:22  A    queue=debug
07/23/2015 08:17:22  A    user=testuser group=testuser jobname=script queue=debug ctime=1437635842 qtime=1437635842 etime=1437635842 start=1437635842 owner=testuser@localhost exec_host=localhost/0 Resource_List.nice=19 Resource_List.walltime=00:01:00 
07/23/2015 08:17:28  A    user=testuser group=testuser jobname=script queue=debug ctime=1437635842 qtime=1437635842 etime=1437635842 start=1437635842 owner=testuser@localhost exec_host=localhost/0 Resource_List.nice=19 Resource_List.walltime=00:01:00 session=667 end=1437635848 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:05

Job Error

/var/spool/torque/server_logs/20150723: No such file or directory
/var/spool/torque/mom_logs/20150723: No such file or directory
/var/spool/torque/sched_logs/20150723: No such file or directory
/var/spool/torque/server_priv/accounting/20150722: No such file or directory
/var/spool/torque/server_logs/20150722: No such file or directory
/var/spool/torque/mom_logs/20150722: No such file or directory
/var/spool/torque/sched_logs/20150722: No such file or directory
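
Note that the accounting record at 08:17:28 in the log above already carries `end=` and `Exit_status=0`, so the scheduler considers the job finished even though the console is still waiting. As a rough illustration only (not the plug-in's code), here is a minimal Python sketch that pulls the exit status out of records in that layout. The names `parse_accounting_line` and `find_exit_status` are hypothetical, and the regular expression assumes exactly the date, time, `A`, key=value format shown above.

```python
import re

# Matches one accounting ("A") record in the layout shown in the PBS log:
# date, time, record type "A", then space-separated key=value fields.
RECORD = re.compile(r"^(\d{2}/\d{2}/\d{4}) (\d{2}:\d{2}:\d{2})\s+A\s+(.*)$")


def parse_accounting_line(line):
    """Return the key=value fields of an accounting record, or None."""
    match = RECORD.match(line.strip())
    if match is None:
        return None  # e.g. the "No such file or directory" lines
    fields = {}
    for token in match.group(3).split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def find_exit_status(log_lines):
    """Return Exit_status from the first end record, or None if none is present."""
    for line in log_lines:
        fields = parse_accounting_line(line)
        if fields and "end" in fields and "Exit_status" in fields:
            return int(fields["Exit_status"])
    return None
```

Fed the three accounting lines from the log above, `find_exit_status` would return `0`; fed only the first two (before the end record is written), it would return `None`.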

kinow commented 9 years ago

The plug-in used to behave differently here, and that behaviour was changed later. Reverting the change just because it would make this error go away is not the best choice in this case.

I'll investigate how Galaxy and other tools submit jobs and check the job status, and then document here how the issue will be fixed.
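
Whichever approach ends up being chosen, one common way to keep a status check from hanging forever is to bound the wait. The following is only a sketch of that general pattern in Python, not the plug-in's implementation and not the fix for this issue. `wait_for_job_end` and its `check_finished` callable are hypothetical names; `check_finished` is assumed to return the job's exit status once the job has ended and `None` while it is still running.

```python
import time


def wait_for_job_end(check_finished, timeout_s=3600.0, interval_s=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll check_finished() until it returns a status or the timeout expires.

    check_finished is a hypothetical callable: it returns the exit status
    when the job has ended, or None while the job is still running.
    Raises TimeoutError instead of waiting indefinitely.
    """
    deadline = clock() + timeout_s
    while True:
        status = check_finished()
        if status is not None:
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not report an end within %s s" % timeout_s)
        sleep(interval_s)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised in tests without real waiting; a caller would normally leave them at their defaults.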