Closed stemangiola closed 3 years ago
Stefano,
I would like to know more about that exit code 1 from the R script. Do you know what would cause such an error from armet_{ACC,BLCA}_lv_4_gender.rds?
One shot-in-the-dark possibility is that the rules for dev/armet_{ACC,BLCA}_lv_4_gender_regression.rds use some intermediate file that has not been declared. If the file is not there, the R scripts may try to generate it. Running a single rule at a time for each script, and many thereafter, then works because all the needed files are already present. Running many rules at the same time first makes them all try to generate the file, and they step on each other. These intermediate files may be automatically generated temporary files, which are harder to track.
Note that in the example with two rules that works you are using two different R scripts. What happens when the rules use the same script?
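To illustrate the stepping-on-each-other scenario, here is a minimal shell sketch of one way a shared intermediate file can be produced safely when several jobs start at once. The helper name make_intermediate, the file name shared.dat, and the file contents are all hypothetical; this is not taken from the pipeline in question, only a sketch of the write-then-rename pattern.

```shell
#!/bin/sh
# Hypothetical helper: publish a shared intermediate file so that concurrent
# jobs never see a half-written copy. Several jobs may still do the work,
# but each one publishes a complete file.
make_intermediate() {
    target=$1
    # Another job already produced the file: nothing to do.
    if [ -e "$target" ]; then
        return 0
    fi
    # Write to a private temporary file in the same directory ...
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    echo "expensive intermediate result" > "$tmp"
    # ... then publish it with a rename, which is atomic within one filesystem.
    mv -f "$tmp" "$target"
}

workdir=$(mktemp -d)
# Simulate several jobs racing for the same undeclared intermediate file.
for i in 1 2 3 4 5; do
    make_intermediate "$workdir/shared.dat" &
done
wait
cat "$workdir/shared.dat"
```

Declaring the intermediate file as the target of its own rule is still the cleaner fix, since the workflow can then schedule it once before the jobs that need it.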
Thanks for your reply,
I found out that the simple R script
library(rstan)
would fail if called ~30 times with all jobs starting simultaneously. Would you be able to replicate the error?
I am inquiring with them, but my question to you is: how can I see the error message? What option/pipe should I use?
Stefano,
Something like this may help:
dev/armet_ACC_lv_4_gender_regression.rds dev/armet_ACC_lv_4_gender_regression.stderr: dev/armet_ACC_lv_4_gender.rds
	Rscript dev/TCGA_makeflow_pipeline/infer_censored.R dev/armet_ACC_lv_4_gender.rds dev/armet_ACC_lv_4_gender_regression.rds > dev/armet_ACC_lv_4_gender_regression.stderr 2>&1
If the R script prints anything to the console, either normal output (stdout) or error output (stderr), it should appear in dev/armet_ACC_lv_4_gender_regression.stderr.
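As a small self-contained check of what that redirection does, the sketch below uses a stand-in function (failing_job, hypothetical) in place of the Rscript call. It prints to both streams and exits with code 1, and both streams end up in the same log file.

```shell
#!/bin/sh
# Stand-in for the Rscript call: writes to stdout and stderr, then fails.
failing_job() {
    echo "normal output"
    echo "error output" >&2
    return 1
}

log=$(mktemp)
status=0
# '> file' redirects stdout; '2>&1' then sends stderr to the same place.
# The order matters: '2>&1' must come after the stdout redirection.
failing_job > "$log" 2>&1 || status=$?

echo "exit code: $status"
cat "$log"
```

Because the log file is listed as a target of the rule, it is kept alongside the other outputs and can be inspected after a failed run.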
Stefano,
We can work on eliminating possible issues where tasks are stepping on each other. I see that rstan needs to be told the number of cores. Is that something you are doing inside your R scripts?
I see that you need something like:
options(mc.cores = 4)
Or if you want to get the value set from CORES in makeflow:
options(mc.cores = as.numeric(Sys.getenv("CORES", unset = "4")))
Simply calling library(rstan)
on my local machine >30 times does not produce an error. Do you see the error only in the cluster with slurm, or also somewhere else?
Hello, thanks to your error piping I found out that the callr library was failing. I have no idea why and could not fix it, so I am reinstalling the whole R library and will let you know ASAP.
Thanks!
Stefano, glad to hear you are making progress. Please keep us posted if you find a solution, or other issues.
Hello, as far as I understand, the R package future.batchtools, which I was trying out at the same time, destroys the temp directory. All sorts of problems then start, instantly killing makeflow.
Thanks for your assistance!
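For anyone hitting a similar temp-directory clash, one possible workaround is to give each job its own temporary directory, so that cleanup by one tool cannot remove files another job is using. The sketch below assumes the job honours the TMPDIR environment variable, which R does when it picks its session temp directory at startup. The wrapper name run_isolated is hypothetical, and `sh -c` stands in for the real Rscript command.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a private temporary directory
# and remove that directory once the command has finished.
run_isolated() {
    job_tmp=$(mktemp -d) || return 1
    status=0
    # TMPDIR is set only for this one command.
    TMPDIR="$job_tmp" "$@" || status=$?
    rm -rf "$job_tmp"
    return "$status"
}

# Stand-in for 'Rscript script.R ...': report the temp directory the job sees.
first=$(run_isolated sh -c 'echo "$TMPDIR"')
second=$(run_isolated sh -c 'echo "$TMPDIR"')

echo "job 1 used: $first"
echo "job 2 used: $second"
```

Whether this helps with future.batchtools specifically has not been verified here; it only keeps concurrent jobs from sharing one temp directory.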
Hello,
I have noticed a strange behaviour recently. If I use a large makeflow file (that was working before) and many jobs are launched, I get this error
(how can I know what is going wrong)
If I just pick a couple of commands from this long file
Then I don't have that error
How can I know what is going on?