Open mabahj opened 9 years ago
Hmm, the tricky part will be reproducing this issue. All I have for testing is a VirtualBox/Vagrant PBS Torque box. Do you know of a way to reproduce it in an environment similar to yours?
Well, you could set up SGE, which is free, but I can't demand anything here. Another option would be to add more logging output. I've enabled full logging in Jenkins, and the only PBS entry I see is this:
apr 17, 2015 8:45:42 AM FINE hudson.remoting.Channel
Received UserRequest:jenkins.plugins.pbs.tasks.Qsub@293c71
If you add some output to the log, it should be easier to see what happens.
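Until the plugin logs more, a small filter can at least isolate what Jenkins already records for it. This is a minimal sketch, assuming the log has been saved to a plain-text file (the name `jenkins.log` is hypothetical) and that each entry is a timestamp line followed by a message line, as in the excerpt above. `grep -B1` is not in POSIX but is available in GNU and BSD grep.

```shell
#!/bin/sh
# Print log lines that mention the PBS plugin package, each preceded by the
# line before it (assumed to be the timestamp/level line of that entry).
pbs_log_entries() {
    # $1: path to a saved Jenkins log file (hypothetical name and location)
    grep -B1 'jenkins\.plugins\.pbs' -- "$1"
}

# Example (uncomment once the log has been saved to a file):
# pbs_log_entries jenkins.log
```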
Job config:
<?xml version="1.0" encoding="UTF-8"?>
<project>
  <actions/>
  <description>https://groups.google.com/forum/#!topic/biouno-users/fWBUIOiWjUg
http://biouno.org/jenkins-plugins.html
https://github.com/biouno/pbs-plugin/releases</description>
  <keepDependencies>false</keepDependencies>
  <properties>
    <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.8.4">
      <maxConcurrentPerNode>0</maxConcurrentPerNode>
      <maxConcurrentTotal>0</maxConcurrentTotal>
      <categories>
        <string>slow_jobs</string>
      </categories>
      <throttleEnabled>false</throttleEnabled>
      <throttleOption>category</throttleOption>
    </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
  </properties>
  <scm class="hudson.scm.NullSCM"/>
  <assignedNode>SGE</assignedNode>
  <canRoam>false</canRoam>
  <disabled>false</disabled>
  <blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
  <blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
  <triggers/>
  <concurrentBuild>false</concurrentBuild>
  <builders>
    <jenkins.plugins.pbs.PBSBuilder plugin="pbs@0.2">
      <script>#!/bin/bash
echo "=========================================="
echo "Sleeping on grid computer $(hostname)"
sleep 60
echo "Done"
echo "=========================================="</script>
    </jenkins.plugins.pbs.PBSBuilder>
  </builders>
  <publishers/>
  <buildWrappers/>
</project>
Node config:
<?xml version="1.0" encoding="UTF-8"?>
<jenkins.plugins.pbs.slaves.PBSSlave plugin="pbs@0.2">
  <name>SGE</name>
  <description>Son of Grid</description>
  <remoteFS>/work/jenkins/jenkins_test_grid_slave</remoteFS>
  <numExecutors>2</numExecutors>
  <mode>EXCLUSIVE</mode>
  <retentionStrategy class="hudson.slaves.RetentionStrategy$Always"/>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.9">
    <host>myhost</host>
    <port>22</port>
    <credentialsId>xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx</credentialsId>
    <maxNumRetries>0</maxNumRetries>
    <retryWaitTime>0</retryWaitTime>
  </launcher>
  <label/>
  <nodeProperties>
    <hudson.slaves.EnvironmentVariablesNodeProperty>
      <envVars serialization="custom">
        <unserializable-parents/>
        <tree-map>
          <default>
            <comparator class="hudson.util.CaseInsensitiveComparator"/>
          </default>
          <int>8</int>
          <string>GridEngRoot</string>
          <string>/cad/gnu/sge_test</string>
          <string>PATH</string>
          <string>/usr/bin:/usr/sbin:/bin:/usr/bin/X11:/usr/local/etc/jre/current/bin:/pri/jenkins/bin:/cad/gnu/sge_test/bin:/cad/gnu/sge_test/bin/lx-amd64</string>
          <string>SGE_ARCH</string>
          <string>lx-amd64</string>
          <string>SGE_CELL</string>
          <string>default</string>
          <string>SGE_CLUSTER_NAME</string>
          <string>sim1</string>
          <string>SGE_EXECD_PORT</string>
          <string>6445</string>
          <string>SGE_QMASTER_PORT</string>
          <string>6444</string>
          <string>SGE_ROOT</string>
          <string>/cad/gnu/sge_test</string>
        </tree-map>
      </envVars>
    </hudson.slaves.EnvironmentVariablesNodeProperty>
  </nodeProperties>
  <userId>jenkins</userId>
</jenkins.plugins.pbs.slaves.PBSSlave>
Note to self: test this Docker image when debugging this issue: https://registry.hub.docker.com/u/agaveapi/torque/
The Docker image worked. I tried it with a job configuration that comes with the container, and will try your job configuration next. I'll probably comment here while working on #9, either with what's wrong or with how you can get your setup working.
I get a fatal error (console output below) when I try to submit jobs, and it comes with no error output. Setup: Jenkins 1.599, PBS Plug-in 0.2, master running on Windows 7, slaves on Linux, SGE grid. I can submit a job to SGE manually by copying and pasting the command line shown in the console output, and qsub accepts the command. But the job then fails because the generated script (/temp/jenkins/pbs/jenkinsPBS_2918185274526175465/script) does not have write permission.
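To confirm the permission problem on the slave, the generated script's mode bits can be inspected directly. The helpers below are a minimal sketch, not part of the plugin: the function names `script_mode` and `owner_can_write` are hypothetical, and the path in the usage comment is the one reported above, whose numeric suffix will differ on each run.

```shell
#!/bin/sh
# Print the 10-character mode string (e.g. -rw-r--r--) of a file.
script_mode() {
    ls -ld -- "$1" | cut -c1-10
}

# Succeed only if the owner write bit is set on the file.
owner_can_write() {
    case "$(script_mode "$1")" in
        ??w*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Example, using the path reported in this issue:
# script=/temp/jenkins/pbs/jenkinsPBS_2918185274526175465/script
# script_mode "$script"
# owner_can_write "$script" || echo "owner cannot write to $script"
```

Reading the mode string rather than testing with `[ -w ... ]` reports the bits as stored on disk, regardless of which user runs the check.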
Error message: