-
Launching many nodes (e.g. 15 c4.8xlarge instances) causes the nodes to take a very long time to register with Slurm, which ultimately triggers downscaling. According to `sinfo`, about 1 node is registered with…
-
I've referenced this page: https://github.com/PySlurm/pyslurm/wiki/Testing-PySlurm-with-Docker
I guess this article only covers Slurm running on localhost.
I want to build a system where…
-
I have a Slurm cluster that also runs a local Prometheus server. I would like to test your exporter but cannot figure out which parameters to use.
-
I'm using goofys to expose test data on S3 to an EC2 instance.
Our load test script runs continuous jobs, with the S3 transfer rate being the limiting factor.
There are 5 concurrent transfers going at …
bedge updated 7 years ago
-
Job id 435266 on the test server ran for 13 hours without finishing. When I looked in the sandbox, it looked as if it had finished, but Slurm still had it in the running state. I ran `bpsh 1 scontrol …
-
EDIT: Issue was previously titled "Change default destination in Galaxy job_conf.xml"
Would anyone object if I changed the default job destination to "slurm_cluster" in the job_conf.xml file? Curre…
-
Hi,
I'm trying to export the backup data from the galaxy-appliance container into the galaxy_storage volume (a folder on the host), so that it can be restored properly **after having rebuilt** the galaxy-appl…
-
The fleet crashed because of issue #529, but I can't think of any reason why that output should be nondeterministic. I wonder if the MD5 was calculated before the file finished writing to disk.
The
…
-
Hi!
Thanks for this awesome Ansible playbook. It hugely streamlines the installation process.
I have an issue when I run it on my remote machine, possibly because of my Ubuntu 16.04 and nginx?
…
-
Branch: development
version: 5e83baf
environment: tmaster-controller01(VM)
Confusing error message in the log.
```
[2016-09-22T15:29:43.492] error: Node node001 appears to have a different slurm.co…