
Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation, with interactive reports and a Shiny app.

Documentation: https://bacannot.readthedocs.io/en/latest/
License: GNU General Public License v3.0

nf-bacannot

This README describes how to launch the Bacannot pipeline on the MAF AWS Infrastructure.

For information about the original pipeline and the tools it uses, please refer to the Bacannot README file.

Table of contents

Quick Test
Usage
Helper Scripts
Exploring the results

Quick Test

aws batch submit-job \
    --job-name nf-bacannot-mrsa \
    --job-queue priority-maf-pipelines \
    --job-definition nextflow-production \
    --container-overrides command=FischbachLab/nf-bacannot,\
"-profile","maf",\
"--input","s3://genomics-workflow-core/Results/Bacannot/MRSA/20221102/MRSA.yaml",\
"--output","s3://genomics-workflow-core/Results/Bacannot/00_TEST/MRSA/20230407"

Usage

aws batch submit-job \
    --job-name nf-bacannot-hCom2 \
    --job-queue priority-maf-pipelines \
    --job-definition nextflow-production \
    --container-overrides command=FischbachLab/nf-bacannot,\
"-profile","maf",\
"--input","s3://genomics-workflow-core/Results/Bacannot/hCom2/20221102/inputs/hCom2.yaml"
"--output","s3://genomics-workflow-core/Results/Bacannot/hCom2/20221102"

Helper Scripts

renameFastaHeaders.py

python renameFastaHeaders.py <ORIGINAL_FASTA_FILE> <RENAMED_FASTA_FILE>
Example
python renameFastaHeaders.py fasta_folder/genome.fasta renamed_fasta_folder/genome.fasta
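
The exact renaming scheme lives in the script itself. Purely as a sketch of the general idea (not the script's actual logic), a header-renaming pass can be as simple as numbering each record with the genome name as a prefix:

# Hypothetical illustration only; renameFastaHeaders.py defines the real scheme.
awk -v name=genome '/^>/ {print ">" name "_" (++i); next} {print}' fasta_folder/genome.fasta > renamed_fasta_folder/genome.fasta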

createSubmissionYaml.py

Creates a submission YAML file for the Bacannot pipeline.

Install dependencies:

conda create -n bacannot python=3.11
conda activate bacannot
pip install -U ruamel.yaml cloudpathlib[s3]

Run the script:

python createSubmissionYaml.py \
    -g <Local or S3 path to genome(s) directory> \
    -project <Name of the project that this data belongs to> \
    -prefix <Subset of the data in this project, or a date in YYYYMMDD format> \
    -s <Output YAML file name> \
    --extension fna (Optional: use a different extension for the fasta files; default is fasta) \
    --copy-genomes (Optional: copy the input genomes to the output directory; default is False) \
    --use-bakta (Optional: use Bakta instead of the standard Prokka; most people SHOULD NOT use this flag; default is False)
Example
python createSubmissionYaml.py \
    -g s3://genomics-workflow-core/Results/BinQC/MITI-MCB/20230324/fasta/ \
    -project MITI-MCB \
    -prefix 20230411 \
    -s test.yaml
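
For reference, Bacannot consumes a samplesheet YAML of roughly this shape (field names follow the upstream Bacannot samplesheet format; the exact fields written by createSubmissionYaml.py may differ, and the paths below are placeholders):

samplesheet:
  - id: genome_1
    assembly: s3://<bucket>/<prefix>/genome_1.fasta
  - id: genome_2
    assembly: s3://<bucket>/<prefix>/genome_2.fasta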

aggregateGFFs.py

Copies GFF files from each sample folder to the aggregate folder.

Example
python aggregateGFFs.py \
  -p s3://genomics-workflow-core/Results/Bacannot/MITI-MCB/20230515 \
  -s s3://genomics-workflow-core/Results/Bacannot/MITI-MCB/20230515/inputs/DELETE_ME.yaml 
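
Conceptually, the script does something like the loop below for every sample listed in the YAML passed via -s (the paths are placeholders; the real script derives sample names and locations itself):

# Hypothetical sketch only, not the script's actual logic.
for sample in sample_1 sample_2; do
  aws s3 cp "s3://<results-prefix>/${sample}/<path-to>/${sample}.gff" "s3://<results-prefix>/aggregate/"
done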

Exploring the results

This pipeline generates A LOT of data per genome. Each genome's outputs follow the directory structure described in the Bacannot documentation (linked above). The easiest way to explore this data interactively is with Docker.

Make sure you have Docker installed (see https://docs.docker.com/get-docker/).

Once Docker is installed and running, sync the genome directory of interest using the aws s3 sync command. The following commands walk through the process using the annotation outputs of the Slackia-exigua-ATCC-700122-MAF-2 genome, available on S3 at s3://genomics-workflow-core/Results/Bacannot/00_TEST/20221031/.

Download results

aws s3 sync s3://genomics-workflow-core/Results/Bacannot/00_TEST/20221031/Slackia-exigua-ATCC-700122-MAF-2/ Slackia-exigua-ATCC-700122-MAF-2

This command will download all the data into a local folder called Slackia-exigua-ATCC-700122-MAF-2.

Launch Interactive Data Browser

cd Slackia-exigua-ATCC-700122-MAF-2
docker run -v $(pwd):/work -d --rm --platform linux/amd64 -p 3838:3838 -p 4567:4567 --name ServerBacannot fmalmeida/bacannot:server
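
A quick note on the flags: -v $(pwd):/work mounts the downloaded results into the container, -d runs it in the background, --rm removes the container once it stops, --platform linux/amd64 pins the image architecture (useful on Apple Silicon), and the two -p flags publish the server's ports (3838 and 4567) to your machine.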

If this is your first time running this viewer, Docker may download a lot of data. This is normal and can take some time depending on your internet speed. Once complete, you're ready to interact with your data. Simply open your favorite web browser and go to http://localhost:3838/. Note the use of http and not https; some browsers may automatically make this change. If the page does not load, copy and paste the URL into your browser rather than clicking the link.

If you're using an EC2 instance, log in to your AWS account, open the EC2 console, identify your instance, and note its public IP address. Make sure the instance's security group allows inbound traffic on port 3838, or the page will not be reachable. Then open your favorite web browser and go to http://Public.IP.Address:3838/. As above, note the use of http and not https; if the page does not load, copy and paste the URL into your browser rather than clicking the link.

Et voila! You can now explore your data!

Shutdown the Data Browser

All great things must come to an end. Use the following command to stop and remove the Docker container, which will in turn shut down the data explorer webpage.

docker rm -f ServerBacannot