The Slurm_lapply function fails, reporting job ids of NA, when run on a federated SLURM cluster. In this case the parallel slurm jobs were successfully started, but the parent process failed to parse the output from the sbatch command.
On a federated SLURM cluster, when the cluster name is specified in the sbatch_opts (and passed to the sbatch command), the output from sbatch looks like:
Submitted batch job 8653762 on cluster name_of_cluster
The regular expression used to parse this output and capture the job id on lines 142 and 224 of sbatch.R is:
".+ (?=[[:digit:]]+$)"
The "$" in that expression prevents the pattern from matching the sbatch output since there are characters following the job id. I suspect just removing the $ will solve the problem. I tried recoding that line as:
jobid <- as.integer(regmatches(ans,regexpr("[[:digit:]]+",ans)))
and that worked as well.
Thank you, @bmilash! Would you be willing to submit a PR? Another question: I have a Docker image with slurm for testing, do you think we could create a test for this?
The Slurm_lapply function fails, reporting job ids of NA, when run on a federated SLURM cluster. In this case the parallel slurm jobs were successfully started, but the parent process failed to parse the output from the sbatch command. On a federated SLURM cluster, when the cluster name is specified in the sbatch_opts (and passed to the sbatch command), the output from sbatch looks like: Submitted batch job 8653762 on cluster name_of_cluster The regular expression used to parse this output and capture the job id on lines 142 and 224 of sbatch.R is: ".+ (?=[[:digit:]]+$)" The "$" in that expression prevents the pattern from matching the sbatch output since there are characters following the job id. I suspect just removing the $ will solve the problem. I tried recoding that line as: jobid <- as.integer(regmatches(ans,regexpr("[[:digit:]]+",ans))) and that worked as well.