Closed: kcgthb closed this issue 6 years ago.
Hi @kcgthb,
I notice that the docs (https://osc.github.io/ood-documentation/master/installation/add-cluster-config.html#slurm) mention that you can remove the cluster name from the configuration file if you are not running in a multi-cluster configuration. However, I believe the ActiveJobs app uses this value to determine the host name to display. It's yet to be determined whether this is a bug in our documentation or in the ActiveJobs app, but for now, would you mind adding the cluster value to that YAML file and letting us know if that fixes the app?
```yaml
v2:
  metadata:
    title: "Cluster"
  job:
    adapter: "slurm"
    cluster: "cluster"
    bin: "/usr/bin"
    conf: "/etc/slurm/slurm.conf"
```
Thanks!
@kcgthb
Hah, you found the RPM. That means you are probably reading the documentation on the develop branch (yet to be released):
https://osc.github.io/ood-documentation/develop/installation/resource-manager/slurm.html
Your example cluster config looks good, but I do want to confirm that your bin: field points to the path where you have the Slurm client binaries installed (e.g., is sbatch at /usr/bin/sbatch?). Also, if you are working in a multi-cluster environment you may need to specify the cluster: field; otherwise ActiveJobs should work without it.
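If it helps, here's a quick sanity check along those lines (just a sketch; it assumes the Slurm client utilities are on your default PATH and that scontrol can reach the controller):

```sh
# Confirm where the Slurm client binaries live; this should match the bin: field
which sbatch squeue

# Confirm the cluster name Slurm itself reports; relevant if you set the cluster: field
scontrol show config | grep -i clustername
```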
An example of a cluster config from one of our partners that is successfully using Slurm:
```yaml
v2:
  metadata:
    title: "Cluster"
  login:
    host: "cluster.university.edu"
  job:
    adapter: "slurm"
    bin: "/opt/packages/slurm/default/bin"
```
As a side question, do you see the cluster name "Cluster" in the top-right dropdown menu "All Clusters"?
@brianmcmichael @nickjer Thanks! I appreciate the feedback and suggestion.
I did indeed follow the instructions from the develop branch, because the RPM installation was so appealing to me. :)
I tried with and without the "cluster:" line in the /etc/ood/config/clusters.d/mycluster.yaml configuration file, restarted httpd and touched /var/www/ood/apps/sys/dashboard/tmp/restart.txt between each try, but I still can't see any job listed.
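For context, those restart steps were roughly the following (assuming the stock systemd setup on EL7; the httpd service name may differ on your system):

```sh
# Restart the Apache front end (the service name may be e.g. httpd24-httpd on SCL installs)
sudo systemctl restart httpd

# Ask Passenger to restart the dashboard app on the next request
touch /var/www/ood/apps/sys/dashboard/tmp/restart.txt
```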
My Slurm utilities are indeed in /usr/bin (we install Slurm as an RPM):

```
$ which squeue
/usr/bin/squeue
```
And I don't see my cluster name in the top-right dropdown menu "All Clusters". Here's what it looks like:

[screenshot]
Aaargh, and then I just realized that I named my cluster config file mycluster.yaml, instead of mycluster.yml...
Using the proper extension made things work, all of a sudden. :smile:
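In other words, the whole fix boiled down to something like this (the paths are from my setup, so adjust accordingly):

```sh
# Rename the cluster config so OnDemand picks it up (.yml, not .yaml)
mv /etc/ood/config/clusters.d/mycluster.yaml /etc/ood/config/clusters.d/mycluster.yml

# Restart the dashboard app so it re-reads the cluster configs
touch /var/www/ood/apps/sys/dashboard/tmp/restart.txt
```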
Sorry for the noise, then; it looks like things are working great now.
Actually, thanks for pointing that out: it hadn't occurred to me to check both spellings of the extension. I will open up an issue in the relevant repo about that.
Hi there!
I'm pretty sure it's a configuration issue on my end, but I'm just discovering OoD: I'm a bit overwhelmed by all the moving parts and I don't really know where to look.
I followed the installation instructions and installed the RPM (ondemand-1.3.5-2.el7.x86_64), and I have a cluster file configured; the dashboard, file explorer, and shell apps are working beautifully. But the "Active Jobs" app doesn't, and the job list stays desperately empty.
I can successfully run Slurm commands as a user on the OnDemand node (we're running Slurm 17.11.5), but it looks like the activejobs app cannot. Do you have any suggestions on things I could check to understand where the problem is coming from?
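For reference, this is the sort of check I ran as a regular user on the OnDemand node (a sketch of the session, not an exact transcript):

```sh
# Basic Slurm client checks, run as a regular user on the OnDemand host
sinfo
squeue -u "$USER"
```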
Thanks!