NBChub / bgcflow

Snakemake workflow for the analysis of biosynthetic gene clusters across large collections of genomes (pangenomes)
https://github.com/NBChub/bgcflow/wiki
MIT License

Running BGCFlow in DTU LSF HPC #361

Open matinnuhamunada opened 1 week ago

matinnuhamunada commented 1 week ago

DTU students and staff who want to run BGCFlow on the LSF HPC facility can follow these steps:

# log in to the login node
ssh userid@login1.gbar.dtu.dk
# once you have been assigned a scratch dir, create a symlink to it in your home dir
SCRATCH_DIR="/work3/<user_id>/" # change the user id accordingly

# create the symlink (named "drive") to the scratch dir
ln -s "$SCRATCH_DIR" drive
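
One thing to watch out for: if `drive` already exists as a link to a directory, running `ln -s` a second time does not fail with GNU `ln` but creates another link inside the scratch directory. A minimal sketch of a guarded version is below; the function name `link_scratch` is made up for illustration and is not part of BGCFlow or the HPC setup.

```shell
# Hypothetical helper: link <scratch dir> to <link path> only when it is safe.
# Usage: link_scratch <scratch dir> <link path>
link_scratch() {
    scratch_dir="$1"
    link_path="$2"

    # Refuse to link to a scratch dir that has not been assigned yet.
    if [ ! -d "$scratch_dir" ]; then
        echo "scratch dir not found: $scratch_dir" >&2
        return 1
    fi

    # Leave an existing file, directory or (possibly dangling) link untouched.
    if [ -e "$link_path" ] || [ -L "$link_path" ]; then
        echo "already exists, leaving untouched: $link_path" >&2
        return 0
    fi

    ln -s "$scratch_dir" "$link_path"
}
```

It would be called as `link_scratch "$SCRATCH_DIR" "$HOME/drive"` after setting `SCRATCH_DIR` as shown above.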


- Install the LSF plugin

linuxsh # go to one of the worker nodes

conda run -n bgcflow mamba install bioconda::snakemake-executor-plugin-lsf -y


- Execute the workflow

Create a profile config file:
```yaml
jobs: 4
executor: lsf
default-resources:
    mem_mb: 200
#set-threads:
#    myrule: 5
set-resources:
    prokka:
        mem: 8000MB
    antismash:
        mem: 8000MB
    checkm:
        mem: 16000MB
    automlst_wrapper:
        mem: 8000MB
    bigscape:
        mem: 24000MB
    arts:
        mem: 8000MB
```

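Since `--profile` takes a directory rather than a file, the config above has to be saved as `config.yaml` inside a directory of its own. A sketch of doing that from the shell follows; the location `profiles/lsf` (relative to the bgcflow checkout) and the `PROFILE_DIR` variable are only assumed examples, any directory should work.

```shell
# Assumed location, relative to the current directory; override by setting
# PROFILE_DIR before running this.
PROFILE_DIR="${PROFILE_DIR:-profiles/lsf}"
mkdir -p "$PROFILE_DIR"

# Snakemake looks for a file named config.yaml inside the --profile directory.
# The quoted 'EOF' keeps the shell from expanding anything in the YAML.
cat > "$PROFILE_DIR/config.yaml" <<'EOF'
jobs: 4
executor: lsf
default-resources:
    mem_mb: 200
set-resources:
    prokka:
        mem: 8000MB
    antismash:
        mem: 8000MB
    checkm:
        mem: 16000MB
    automlst_wrapper:
        mem: 8000MB
    bigscape:
        mem: 24000MB
    arts:
        mem: 8000MB
EOF

echo "profile written to $PROFILE_DIR/config.yaml"
```

The value to pass to `--profile` is then the directory (`$PROFILE_DIR`), not the path to `config.yaml` itself.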
Then run the workflow:

# IMPORTANT: run this from the worker node, using linuxsh
cd ~/drive/bgcflow/
conda run -n bgcflow snakemake --executor lsf --use-conda --profile <path to a directory containing the profile config.yaml> --default-resources lsf_project=<project name> lsf_queue=<hpc>

To find the right queue, use: bqueues -u <user id>
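
Because the command has three placeholders, it can help to print the filled-in version and check it before submitting anything. The sketch below does only that; `bgcflow_lsf_cmd` is a hypothetical helper name, and the project and queue values are whatever your account has been given.

```shell
# Hypothetical helper: prints the snakemake command instead of running it,
# so the arguments can be checked first.
# Usage: bgcflow_lsf_cmd <profile dir> <project name> <queue>
bgcflow_lsf_cmd() {
    profile_dir="$1"
    project="$2"
    queue="$3"

    # --profile expects a directory that contains config.yaml.
    if [ ! -f "$profile_dir/config.yaml" ]; then
        echo "no config.yaml found in: $profile_dir" >&2
        return 1
    fi

    printf 'conda run -n bgcflow snakemake --executor lsf --use-conda --profile %s --default-resources lsf_project=%s lsf_queue=%s\n' \
        "$profile_dir" "$project" "$queue"
}
```

Appending `-n` to the printed command asks Snakemake for a dry run, which lists the jobs that would be submitted without sending anything to LSF.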


matinnuhamunada commented 1 week ago

The worker node does not allow downloading files over FTP. To fix this, edit the main Snakefile so that the rule ncbi_genome_download runs locally on the parent node:

report: "report/workflow.rst"

localrules: all, ncbi_genome_download
...
matinnuhamunada commented 1 week ago

Seems like some jobs fail because they exceed the memory usage limit. This needs to be fixed in the LSF profile.
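
One possible direction, not tested on the DTU cluster: let Snakemake resubmit failed jobs and request more memory on each attempt. The sketch below writes such a profile variant. It assumes that the installed Snakemake accepts a `retries` key in the profile and Python expressions using `attempt` in `set-resources`; the directory name `profiles/lsf-retry` and the `RETRY_PROFILE_DIR` variable are made up. The rule names and base values are the ones from the profile above.

```shell
# Sketch only: a profile variant in which memory grows with each retry.
# ASSUMPTIONS (not confirmed in this thread): `retries` is a valid profile key
# and `attempt` can be used in set-resources expressions.
RETRY_PROFILE_DIR="${RETRY_PROFILE_DIR:-profiles/lsf-retry}"
mkdir -p "$RETRY_PROFILE_DIR"

cat > "$RETRY_PROFILE_DIR/config.yaml" <<'EOF'
jobs: 4
executor: lsf
retries: 2    # resubmit a failed job up to two more times
default-resources:
    mem_mb: 200
set-resources:
    prokka:
        mem_mb: 8000 * attempt
    antismash:
        mem_mb: 8000 * attempt
    checkm:
        mem_mb: 16000 * attempt
    automlst_wrapper:
        mem_mb: 8000 * attempt
    bigscape:
        mem_mb: 24000 * attempt
    arts:
        mem_mb: 8000 * attempt
EOF

echo "retry profile written to $RETRY_PROFILE_DIR/config.yaml"
```

With `attempt` starting at 1, the first submission requests the same amount as before and each retry adds the base value again. Whether the queue actually grants the larger requests depends on its limits.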