Example of input for --cluster-params

zihhuafang commented 4 years ago

Hi, I am not sure I understood from the previous issues what to pass to --cluster-params. I am working with the LSF scheduler. We use 'n' for the number of cores, 'mem' for memory, and 'W' for run time. I would submit a job like this: bsub -n 13 -W 04:00 -o job.log -R "rusage[mem=2000]" < script.sh

I passed --cluster-params "bsub -n {cluster.cpu} -W {cluster.rt} -R \"rusage[mem={cluster.mem}]\"" However, I got the following error message:

Preparing reference
Traceback (most recent call last):
  File "/cluster/work/pausch/fang/smrtsv2/smrtsv.py", line 405, in <module>
    cmd_return_code = cmd_args.func(cmd_args)
  File "/cluster/work/pausch/fang/smrtsv2/smrtsv.py", line 174, in run
    return_code = index(args)
  File "/cluster/work/pausch/fang/smrtsv2/smrtsv.py", line 31, in index
    cmd_log='reference/log'
  File "/cluster/work/pausch/fang/smrtsv2/smrtsvlib/smrtsvrunner.py", line 158, in run_snake_target
    cluster_params = cluster_params.format(**{'log': log})
KeyError: 'cluster'

Could you offer an example of what to put in as parameters in --cluster-params?

paudano commented 4 years ago

If you supply a cluster config JSON file with --cluster-config, "cluster" will be defined for each rule and used to fill in those wildcards. This is how resource requests are tuned per rule (some rules use more resources than others).

The JSON file we use is cluster.eichler.json (in the SMRT-SV repository), and it was developed for an SGE cluster. Copy cluster.eichler.json and modify it for your environment. Some syntax or resource requirements may need to be changed, but cluster.eichler.json should be a useful template.

The cluster JSON defines 4 keywords per rule: "mem", "cpu", "rt", and "params". For each rule that does not define one or more of these, the "default" values are filled in. In our cluster, "mem" is multiplied by "cpu" to get the total memory (e.g. if cpu = 4 and mem = 6G, then the total memory requested is 24G). "rt" is the runtime allowed for the job, and "params" are additional parameters, which we use to set "disk_free" when a job uses local temp storage on a node.
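
To make the default fill-in and the memory arithmetic concrete, here is a minimal Python sketch. It is not taken from cluster.eichler.json: the rule name "example_rule", the resource values, and the helper functions are hypothetical. It also assumes the defaults live under a "__default__" key, which is the Snakemake cluster-config convention; check the actual file for the key it uses.

```python
import re

# Hypothetical cluster config, shaped like a Snakemake cluster-config JSON.
# The rule name and all values are illustrative assumptions.
cluster_config = {
    "__default__": {"cpu": 1, "mem": "4G", "rt": "08:00:00", "params": ""},
    "example_rule": {"cpu": 4, "mem": "6G"},
}


def resolve(rule, config, default_key="__default__"):
    """Return the rule's resources, with defaults filling any missing keyword."""
    merged = dict(config.get(default_key, {}))
    merged.update(config.get(rule, {}))
    return merged


def total_mem_gb(resources):
    """Total memory in GB when "mem" is requested per core (mem x cpu)."""
    match = re.fullmatch(r"(\d+)G", resources["mem"])
    if match is None:
        raise ValueError("expected mem like '6G', got %r" % resources["mem"])
    return int(match.group(1)) * int(resources["cpu"])


resources = resolve("example_rule", cluster_config)
print(resources["rt"])          # inherited from the defaults
print(total_mem_gb(resources))  # 6G per core on 4 cores
```

Whether "mem" is interpreted per core or per job depends on the scheduler and its site configuration, so the multiplication above only mirrors the SGE setup described in this comment.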

If you get a cluster JSON working with LSF, please feel free to share it. Others will probably find it useful.

Let me know if this helps.

zihhuafang commented 4 years ago

Hi, I finally got a cluster JSON working. I attached it as a text file because I cannot upload a JSON file here.

My command to run smrtsv is

${SMRTSV_DIR}/smrtsv --wait-time 60 --tempdir ${TMP} --cluster-config ${SMRTSV_DIR}/cluster.json --cluster-params "bsub -J {{cluster.jobname}} -n {{cluster.cpu}} -W {{cluster.rt}} -o {log} -R \"rusage[mem={{cluster.mem}}]\"" run --batches 20 --runjobs "25,20,200,10" --threads 10 ${REF_FA} ${FOFN_FILE}
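
A short Python sketch of why the doubled braces matter. The traceback above shows SMRT-SV calling cluster_params.format(**{'log': log}), so any single-brace {cluster.*} field is looked up in that first pass and raises KeyError: 'cluster'. Doubling the braces escapes them, so they come out of the first pass as single-brace wildcards. The second pass below, using SimpleNamespace, is only a stand-in for whatever later fills in the cluster values; it is an assumption for illustration and not SMRT-SV or Snakemake code.

```python
from types import SimpleNamespace


def fill_log(cluster_params, log):
    """First pass, as in the traceback: only 'log' is available."""
    return cluster_params.format(**{'log': log})


single = 'bsub -n {cluster.cpu} -o {log} -R "rusage[mem={cluster.mem}]"'
double = 'bsub -n {{cluster.cpu}} -o {log} -R "rusage[mem={{cluster.mem}}]"'

# Single braces: str.format tries to resolve 'cluster' and fails.
try:
    fill_log(single, 'reference/log')
    failed_key = None
except KeyError as err:
    failed_key = err.args[0]
print('single braces raise KeyError for:', failed_key)

# Doubled braces: escaped in the first pass, left as {cluster.*} wildcards.
first_pass = fill_log(double, 'reference/log')
print(first_pass)

# Stand-in for the later substitution of per-rule values (hypothetical values).
cluster = SimpleNamespace(cpu=13, mem=2000)
second_pass = first_pass.format(cluster=cluster)
print(second_pass)
```

Only {log} is left with single braces, because it is the one field the first pass actually supplies.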

However, I still have an issue using DRMAA if I pass --distribute and --DRMAA_LIBRARY_PATH.

cluster_json.txt

EichlerLab / smrtsv2

Example of input for --cluster-params #48