ls233 opened this issue 3 years ago
That link doesn't work. Does your LSF cluster have a wiki page?
Well, what I'm basically asking is which platform I should specify for `caper init [PLATFORM]`.
I don't think my HPC has a wiki, but there is some description here: https://labs.icahn.mssm.edu/minervalab/lsf-queues/
Caper doesn't currently support LSF. If I can get some detailed info about `bsub` and the corresponding job-monitoring command, then I can add it to Caper later.

For now you may need to run Caper with the `local` backend, which means that Caper will not `bsub` tasks; it will run all tasks in the current shell. Log in to a compute node and then run:

```
caper run ATAC_WDL -i INPUT_JSON --singularity --max-concurrent-tasks 2
```

Use `screen` or `nohup` to keep the session alive, or `bsub` the `caper` command line itself with very large resources. If you want to save resources on a compute node, then serialize all tasks by using `--max-concurrent-tasks 1`.
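For the second option, here is a sketch of wrapping the whole Caper run in a single large LSF job. The job name, queue, and resource numbers are placeholders for a generic cluster (not from this thread), and the script only echoes the command instead of submitting it:

```shell
# Build the Caper command, then hand the whole thing to bsub as one big job.
# Flags and values below are illustrative; check your site's bsub defaults.
CAPER_CMD='caper run atac.wdl -i input.json --singularity --max-concurrent-tasks 2'
echo bsub -J caper_atac -q normal -n 8 -W 24:00 -o caper.out -e caper.err "$CAPER_CMD"
```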
Thanks, Jin, for the suggestion.

For practical reasons, deploying a pipeline such as the ENCODE ATAC-seq one without the ability to submit jobs is unfortunately of somewhat limited utility. It is especially relevant nowadays, when datasets may contain hundreds of samples. Whenever you have the resources, I'd be happy to work with you to add LSF support to Caper, if needed.

Could you please advise what would be a good starting point for this?
Best, German
Sorry for the late reply; currently we don't have a plan to add an LSF backend. If you are familiar with Python, then you can start by modifying the PBS backend of Caper:
https://github.com/ENCODE-DCC/caper/blob/master/caper/cromwell_backend.py#L710

You need to modify the bash command lines under the keys `submit`, `kill`, `check-alive` and `job-id-regex`. For example, replace `qsub` with `bsub`.
```python
'submit': dedent(
    """\
    if [ -z \\"$SINGULARITY_BINDPATH\\" ]; then export SINGULARITY_BINDPATH=${singularity_bindpath}; fi; \\
    if [ -z \\"$SINGULARITY_CACHEDIR\\" ]; then export SINGULARITY_CACHEDIR=${singularity_cachedir}; fi;

    echo "${if !defined(singularity) then '/bin/bash ' + script
        else
        'singularity exec --cleanenv ' +
        '--home ' + cwd + ' ' +
        (if defined(gpu) then '--nv ' else '') +
        singularity + ' /bin/bash ' + script}" | \\
    qsub \\
    -N ${job_name} \\
    -o ${out} \\
    -e ${err} \\
    ${true="-lnodes=1:ppn=" false="" defined(cpu)}${cpu}${true=":mem=" false="" defined(memory_mb)}${memory_mb}${true="mb" false="" defined(memory_mb)} \\
    ${'-lwalltime=' + time + ':0:0'} \\
    ${'-lngpus=' + gpu} \\
    ${'-q ' + pbs_queue} \\
    ${pbs_extra_param} \\
    -V
    """
),
'exit-code-timeout-seconds': 180,
'kill': 'qdel ${job_id}',
'check-alive': 'qstat ${job_id}',
'job-id-regex': '(\\d+)',
```
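In miniature, the port amounts to swapping the scheduler commands under those four keys. A toy sketch of the idea (not Caper's actual code; the LSF regex anticipates `bsub`'s `Job <ID> is submitted ...` output):

```python
# Toy sketch of the PBS -> LSF port: the work is mostly swapping scheduler
# commands under Cromwell's config keys.
pbs = {
    'kill': 'qdel ${job_id}',
    'check-alive': 'qstat ${job_id}',
    'job-id-regex': '(\\d+)',
}
lsf = {
    'kill': pbs['kill'].replace('qdel', 'bkill'),
    'check-alive': pbs['check-alive'].replace('qstat', 'bjobs'),
    # bsub echoes "Job <ID> is submitted ...", so the regex can anchor on that.
    'job-id-regex': 'Job <(\\d+)>.*',
}
print(lsf['kill'])  # -> bkill ${job_id}
```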
I'll probably be working on getting this onto LSF soon. In the meantime this might help; it's old, but the basic commands typically don't change that much: https://modelingguru.nasa.gov/docs/DOC-1040

It should look something like this:
```python
class CromwellBackendLSF(CromwellBackendLocal):
    TEMPLATE_BACKEND = {
        'config': {
            'default-runtime-attributes': {'time': 24},
            'script-epilogue': 'sleep 5',
            'runtime-attributes': dedent(
                """\
                String? docker
                String? docker_user
                Int cpu = 1
                Int? gpu
                Int? time
                Int? memory_mb
                String? lsf_queue
                String? lsf_extra_param
                String? singularity
                String? singularity_bindpath
                String? singularity_cachedir
                """
            ),
            'submit': dedent(
                """\
                if [ -z \\"$SINGULARITY_BINDPATH\\" ]; then export SINGULARITY_BINDPATH=${singularity_bindpath}; fi; \\
                if [ -z \\"$SINGULARITY_CACHEDIR\\" ]; then export SINGULARITY_CACHEDIR=${singularity_cachedir}; fi;

                echo "${if !defined(singularity) then '/bin/bash ' + script
                    else
                    'singularity exec --cleanenv ' +
                    '--home ' + cwd + ' ' +
                    singularity + ' /bin/bash ' + script}" | \\
                bsub \\
                -J ${job_name} \\
                -o ${out} \\
                -e ${err} \\
                ${true="-n " false="" defined(cpu)}${cpu} \\
                ${true="-R 'rusage[mem=" false="" defined(memory_mb)}${memory_mb}${true="]'" false="" defined(memory_mb)} \\
                ${'-W ' + time + ':0'} \\
                ${'-q ' + lsf_queue} \\
                ${lsf_extra_param}
                """
            ),
            'exit-code-timeout-seconds': 180,
            'kill': 'bkill ${job_id}',
            'check-alive': 'bjobs ${job_id}',
            'job-id-regex': '(\\d+)',
        }
    }

    def __init__(
        self,
        local_out_dir,
        max_concurrent_tasks=CromwellBackendBase.DEFAULT_CONCURRENT_JOB_LIMIT,
        soft_glob_output=False,
        local_hash_strat=CromwellBackendLocal.DEFAULT_LOCAL_HASH_STRAT,
        lsf_queue=None,
        lsf_extra_param=None,
    ):
        super().__init__(
            local_out_dir=local_out_dir,
            backend_name=BACKEND_LSF,
            max_concurrent_tasks=max_concurrent_tasks,
            soft_glob_output=soft_glob_output,
            local_hash_strat=local_hash_strat,
        )
        self.merge_backend(CromwellBackendLSF.TEMPLATE_BACKEND)
        self.backend_config.pop('submit-docker')
        if lsf_queue:
            self.default_runtime_attributes['lsf_queue'] = lsf_queue
        if lsf_extra_param:
            self.default_runtime_attributes['lsf_extra_param'] = lsf_extra_param
```
Note: I have not tested this. I got rid of the GPU option because GPU use depends on the LSF implementation.

However, @leepc12, how is `job-id-regex` matched for PBS? I'm not completely familiar with how job IDs are grabbed from PBS, so any insight on this would be much appreciated. It shouldn't be too difficult to construct a regex.
Never mind, I went through the Cromwell docs and found this:

```
LSF {
  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
  config {
    submit = "bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} /usr/bin/env bash ${script}"
    kill = "bkill ${job_id}"
    check-alive = "bjobs ${job_id}"
    job-id-regex = "Job <(\\d+)>.*"
  }
}
```

So everything is fairly similar.
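To illustrate how that `job-id-regex` works: Cromwell applies it to the submit command's stdout to capture the scheduler's job ID. The stdout line below is an example of `bsub`'s usual acknowledgement message (the ID and queue name are made up):

```python
import re

# Cromwell applies 'job-id-regex' to the submit command's stdout to capture
# the scheduler's job ID; bsub prints a line like the example below.
bsub_stdout = "Job <123456> is submitted to default queue <normal>."
match = re.search(r"Job <(\d+)>.*", bsub_stdout)
print(match.group(1))  # -> 123456
```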
I am modifying the PBS backend at https://github.com/ENCODE-DCC/caper/blob/master/caper/cromwell_backend.py, but when I run Caper I still get an error saying I am using qsub. Any help would be appreciated.
For future reference, this is the LSF backend file that I ran and that worked. Your cluster may not have a -G or -q flag, so adjust as needed. I also had to set up specific paths (the PATH="" and LSF_DOCKER_VOLUMES="") for my LSF call; if your compute cluster or system is different, you'll want to take those out too. When you run Caper, just add `--backend-file name_of_backendfile.conf`. Thanks to @leepc12 for helping me set this up.
```
backend {
  providers {
    pbs {
      config {
        submit = """if [ -z \"$SINGULARITY_BINDPATH\" ]; then export SINGULARITY_BINDPATH=${singularity_bindpath}; fi; \
        if [ -z \"$SINGULARITY_CACHEDIR\" ]; then export SINGULARITY_CACHEDIR=${singularity_cachedir}; fi;

        echo "${if !defined(singularity) then '/bin/bash ' + script
        else
        'singularity exec --cleanenv ' +
        '--home ' + cwd + ' ' +
        (if defined(gpu) then '--nv ' else '') +
        singularity + ' /bin/bash ' + script}" | \
        PATH="/opt/juicer/CPU/common:/opt/hic-pipeline/hic_pipeline:$PATH" LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active" \
        bsub \
        -J ${job_name} \
        -o ${out} \
        -e ${err} \
        ${true="-n " false="" defined(cpu)}${cpu} \
        ${true="-M" false="" defined(memory_mb)}${memory_mb}${true="MB" false="" defined(memory_mb)} \
        ${'-W' + time + ':0:0'} \
        ${'-q ' + pbs_queue} \
        -G compute-group \
        ${pbs_extra_param} \
        """
        kill = "bkill ${job_id}"
        check-alive = "bjobs ${job_id}"
        job-id-regex = "(\\d+)"
      }
    }
  }
}
```
Hi everyone. I'm charged with standing up the ENCODE ATAC-seq pipeline to work in our environment, which is LSF, and I'm willing to take the baton across the finish line with a GitHub pull request to see LSF supported out of the box for all users of Caper.
@HenryCWong, you have done most of the legwork already. If I can test your changes locally, and everything works for the two of us, at our two different sites, is there a way I can walk you through how to do a PR on GitHub, or... are you clear on how to do that? Do you have the time?
It would be a shame for you not to get credit, if it gets merged into the codebase.
@ernstki: Please let me make a dev PR for you, and you can pull it (you may need to `git pull` the test branch and add the git directory to `PYTHONPATH` so that the pip-installed one is ignored) and test it on your clusters.

All I need is a working custom backend file (`--backend-file`) that works on most LSF clusters. Then you will be able to use `caper init lsf` and just define the required parameters in the conf file (`~/.caper/default.conf`).

If that works for you two, @ernstki and @HenryCWong, then I can merge it to master.
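If that lands, setup on an LSF cluster would presumably reduce to something like the following. The conf keys below are assumptions modeled on Caper's existing per-scheduler backends (e.g. `pbs-queue`), not confirmed in this thread:

```
$ caper init lsf
$ cat ~/.caper/default.conf
backend=lsf
lsf-queue=normal
```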
Hi, sorry for the late response, y'all. So, do you still want me to take the PR since @leepc12 is making a dev PR?

I can get you the custom LSF backend file tomorrow. The one above should work, but I also haven't been in here in two months, so I'll double-check things.
@HenryCWong, if you're willing to just… I'm willing to cherry-pick that commit from your fork and do any remaining work to get it into a state that meets @leepc12's requirements. This way you'll get credit for the work you've done in the Git commit history for Caper, and you will be Internet Famous. ;)

If that kind of fame has no great appeal for you, I can just copy-paste what you have above instead, credit you in the relevant commit message, and you can forget about the forking and all that.
I think we can discuss whether @leepc12 wants to put custom backends in a `contrib` subdirectory, and other details like that, in the PR.
Thanks for the info. I forked the repo and made a commit here: https://github.com/HenryCWong/caper
I opened my password manager to log in to specifically thank you for doing this.
(I am another bioinformatician at Mount Sinai, on the same computing environment, that needed this fix)
It seems IBM has been customizing specific LSF features for customers, so if it doesn't work for you and you need to run Caper with a custom backend, I can try to help out.
Thank you so much!
It looks like #148 (release 2.0.0) implements LSF support, so thanks, @leepc12! I'm not sure whether that's based on what @HenryCWong shared here, but it looks like this issue could be closed if v2.0.0 meets @ls233's requirements.

The project I needed this for is nowhere near the stage where it's ready to submit jobs to a cluster anyway, so I wouldn't have been able to work on this for several weeks at least.
Hi Jin,

I'm looking for the right value of the platform parameter to specify when initializing Caper on my HPC (Mount Sinai). My HPC uses the LSF system. I'm referring to section 2.3 of this manual: https://github.com/MoTrPAC/motrpac-atac-seq-pipeline

Thanks,
-- German Nudelman, Ph.D., Sr. Bioinformatics Developer/Analyst, Icahn School of Medicine at Mount Sinai