-
**Describe the bug**
When I use the most recent Megatron-LM fork, I get the following error:
```
make: Entering directory '/workspace/megatron-lm/megatron/core/datasets'
g++ -O3 -Wall -sha…
-
**Snakemake version** 8.16.0
**Describe the bug**
The default behavior of `cores` is different and limits the number of threads for submitted jobs to the number of cores on the host machine.…
-
I have been dealing with a particularly strange submitit error that I am having trouble understanding. Specifically, all jobs I launch through submitit die after 7-10 hours without error. However, thi…
-
dask without the progress bar works well on ips and local BCBL.
Still to test:
how to change config.yaml so that -N gives the cluster job a proper name.
how to change config.yaml to let SGE …
-
I am trying to run HarmBench for humanJailbreak on Llama2. I am using a p3.8xlarge instance, which has four Tesla V100-SXM2-16GB GPUs. Every Slurm job tries to download the model again and again, resulting in Cud…
-
From user:
We found that including a Slurm array directive in a Slurm batch script prevents any stats information from being recorded.
#SBATCH --array=0-1
We create…
-
You need to change line 108 of `src/miniwdl_slurm/__init__.py`, as you are using the wrong Slurm command-line flag.
Current value:
srun_args.extend(["--cpus-per-task", str(cpu)])
Change to:
srun_a…
-
I am running a regression test on Gaea and noticed that some test jobs fail due to a wall clock timeout error, but the rt scripts interpret the status of those jobs incorrectly. This is part of the…
-
Context: I've got a SLURM cluster. I want to run `code tunnel` inside of interactive jobs such that users can then use the compute node for remote development. (Note, I don't want to use SSH as variab…
-
I'm trying to update my Snakemake workflow to v8+.
My problem so far is the following: I submit my workflows as a job on the login node, from where Snakemake and the executor do their thing. Proble…