Have you had any luck running on SLURM? I am also having trouble, so I can't offer any real solutions, but I have a couple of suggestions to try. Depending on your configuration, changing `srun` to `sbatch` may help. Also, you could try hard-coding your resources and partition in the `submit` call and see if that helps. Our SLURM scheduler does not have the `--wait` option enabled, so removing that may also help.
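For concreteness, here is a minimal sketch of what a hard-coded blocking submit string might look like, assuming a FALCON/pypeFLOW version that reads a `[job.defaults]` section with a `submit` template (check the configuration docs for your version); the partition name and resource values are placeholders to replace with whatever your cluster actually provides:

```ini
[job.defaults]
job_type = slurm
pwatcher_type = blocking
# "compute", 4096M, and 8 CPUs are placeholders; hard-code your own
# partition and resources here rather than relying on per-stage options.
# {JOB_NAME}, {JOB_STDOUT}, {JOB_STDERR}, {JOB_SCRIPT} are the template
# fields pypeFLOW substitutes at submit time (names may vary by version).
submit = srun --wait=0 -p compute -J {JOB_NAME} -o {JOB_STDOUT} -e {JOB_STDERR} --mem-per-cpu=4096M --cpus-per-task=8 {JOB_SCRIPT}
```

Note that for `srun`, `--wait` takes seconds and `--wait=0` means wait indefinitely, which gives the blocking behavior the watcher needs; `sbatch` only blocks if your Slurm build supports `sbatch --wait`.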
Please let us know if you are able to find a reliable way to submit blocking calls to Slurm; we currently have no way to test that. Note that this is a general Slurm problem, not pypeFLOW (and not Falcon). If you are completely unable to get a blocking call that works (and you should be testing in your shell, not via Falcon/pypeflow), you can try the old `pwatcher_type = fs_based`. That expects non-blocking calls, but it is far more complex because it watches the filesystem to learn when jobs are finished.
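A quick way to test blocking behavior directly in a login shell, before touching any Falcon config (the partition name and resources below are placeholders):

```bash
# srun blocks by default; --wait=0 waits indefinitely for all tasks.
srun --wait=0 -p compute --cpus-per-task=1 --mem=1G bash -c 'sleep 30; echo done'
echo "srun returned with exit code $?"

# sbatch normally returns as soon as the job is queued; --wait (where
# the Slurm version supports it) makes it block until the job completes.
sbatch --wait -p compute --mem=1G --wrap 'sleep 30; echo done'
echo "sbatch returned with exit code $?"
```

If neither call blocks reliably on your cluster, that is the situation where falling back to `pwatcher_type = fs_based` makes sense.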
Hi - I am trying to install and run FALCON on a Slurm system. Installation goes well and the test seems conclusive. I read the configuration document and attempted to use the blocking option for `pwatcher_type`. I then launch the assembly with an sbatch script containing a simple command, `fc_run fc_run_3.cfg`. However, there are two problems: (1) In the configuration page, you advise using a `-W` option (I assume for 'blocking', although I am not completely sure how it works), but in the latest Slurm version this option expects a time: `-W, --wait=<seconds>`. I put 2 seconds in my config, but I am not sure it is the right option anymore. (2) The Falcon run starts with building the database, but the script launched for this step uses only 1 processor and 4 GB of memory, which does not match my specifications. This memory is too limited and the job crashes at some point. Moreover, this job does not appear in the scheduler (when checking `squeue`), although it appears to run. I would really appreciate some help on all that. Thanks!

The end of the all.log file: