USCbiostats / slurmR

slurmR: A Lightweight Wrapper for Slurm
https://uscbiostats.github.io/slurmR/

Slurm_collect throws 'Error in x$njobs : $ operator is invalid for atomic vectors' #40

Open tXiao95 opened 1 year ago

tXiao95 commented 1 year ago

I am testing the slurmR package on my school's HPC cluster. Everything works well when I use Slurm_lapply with plan="none" and then call sbatch to launch the job array. However, I get the following strange error when using Slurm_collect.

Warning: The call to -sacct- failed. This is probably due to not having slurm accounting up and running. For more information, checkout this discussion: https://github.com/USCbiostats/slurmR/issues29
Error in x$njobs : $ operator is invalid for atomic vectors

The code I run is

library(slurmR)
ans <- Slurm_lapply(1:10, sqrt, plan="none")
sbatch(ans)
result <- Slurm_collect(ans)

I understand that my cluster does not have Slurm accounting enabled, but the error seems to be unrelated to the warning. Moreover, when I enter debug mode, the x object has an njobs attribute, and retrieving it directly does not throw an error.
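For reference, the error message itself comes from base R: it is raised whenever `$` is applied to an atomic vector, such as a bare integer, instead of a list-like object. The minimal sketch below reproduces it without slurmR. The names `job_id` and `job_obj` are hypothetical stand-ins for illustration, not objects created by the package.

```r
# Hypothetical stand-ins (not slurmR objects):
job_id  <- 12345L                             # a bare job ID is an atomic vector
job_obj <- list(jobid = 12345L, njobs = 3L)   # a list-like object with an njobs element

# `$` works on the list-like object.
n <- job_obj$njobs

# `$` fails on the atomic vector; capture the message instead of stopping.
msg <- tryCatch(job_id$njobs, error = function(e) conditionMessage(e))
print(msg)
```

This would be consistent with x having njobs in one frame while a later call receives only the job ID, but that is an assumption, not something the output above confirms.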

samkimhis commented 11 months ago

I have the same error. When it occurred, the jobs were still executed, but the job object was not assigned, so I had to look in the tmp_path folder and read the job.rds file manually. Judging by the traceback (copied below), it happens when the input to status() is just a job ID (an atomic vector rather than a job object), which the current function definition allows. Can you update the function so that atomic vector input is handled properly?

> traceback()
11: sprintf("%i_%i", job_id, 1:x$njobs)
10: data.frame(JobID = sprintf("%i_%i", job_id, 1:x$njobs), State = NA_character_, 
        ExitCode = "0:0", stringsAsFactors = FALSE)
9: sacct_(x, brief = TRUE, parsable = TRUE, allocations = TRUE)
8: status.default(x)
7: status(x)
6: wait_slurm.integer(get_job_id(x), ...)
5: wait_slurm.slurm_job(x)
4: wait_slurm(x)
3: sbatch.slurm_job(ans, wait = plan$wait, submit = plan$submit)
2: sbatch(ans, wait = plan$wait, submit = plan$submit)
1: Slurm_lapply(rep(1e+06, 100), simpi, njobs = 3, mc.cores = 16, plan = "wait")
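One possible shape for such a fix is to check the input before touching `x$njobs`. The sketch below builds the same placeholder data frame as frame 10 of the traceback, but falls back to a single row when only a job ID is available. `placeholder_status` is a hypothetical helper written for illustration, not a slurmR function, and it assumes that a job object is list-like and that the job ID is an integer.

```r
# Hypothetical helper, not part of slurmR: build placeholder status rows
# without assuming that `x` is a list-like job object.
placeholder_status <- function(x, job_id) {
  # Only list-like inputs can carry an njobs element; a bare job ID cannot.
  njobs <- if (is.list(x) && !is.null(x$njobs)) x$njobs else NA_integer_

  if (is.na(njobs)) {
    # Array size unknown: fall back to a single row for the job ID itself.
    ids <- as.character(job_id)
  } else {
    # Array size known: one row per array task, as in the traceback.
    ids <- sprintf("%i_%i", job_id, seq_len(njobs))
  }

  data.frame(JobID = ids, State = NA_character_, ExitCode = "0:0",
             stringsAsFactors = FALSE)
}

# Usage with hypothetical inputs: a bare job ID, then a list with njobs.
from_id  <- placeholder_status(12345L, job_id = 12345L)
from_obj <- placeholder_status(list(njobs = 3L), job_id = 12345L)
```

Whether a single fallback row is the right behavior when the array size is unknown is a design choice left to the maintainers.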

gvegayon commented 7 months ago

I think this is a bug I have been trying to fix: https://github.com/USCbiostats/slurmR/issues/29. I don't have much time these days, so any PRs are welcome :)