Krabbenhoft Lab genome annotation pipeline using BRAKER and GeMoMa

[!NOTE] WORK IN PROGRESS. This pipeline was built specifically for the Krabbenhoft Lab's servers and the University at Buffalo HPC cluster. We are in the process of revising this pipeline to work on any Linux system. We plan to distribute this pipeline with a Docker image containing all dependencies in the future. Please stay tuned for updates.

Authors: Dan MacGuigan*, Nate Backenstose, Christopher Osborne

*dmacguig@buffalo.edu

Annotation pipeline flowchart

[Figure: annotation pipeline flowchart]

Dependencies

RepeatModeler
RepeatMasker
NCBI BLAST 2.4.0 (for compatibility with ProtExcluder)
ProtExcluder
perl
bbmap
bedtools
samtools
HISAT2
NCBI Datasets command line tools
BRAKER3
AGAT
GeMoMa
GenomeTools
EVidenceModeler
R
- ggplot2 package
- cowplot package
EggNOG-mapper + dependencies

Usage

First, clone this repository.

git clone https://github.com/KrabbenhoftLab/genome_annotation_pipeline.git

Next, rename the cloned repository from genome_annotation_pipeline to something informative. For example:

mv genome_annotation_pipeline MY_SPECIES_genome_annotation

This renamed directory is the ANNOTATION_DIR in your config file and will contain all of your data and results.

To see help options, run ./genome-annotation -h.

Before running the pipeline, be sure to set all of the variables in the config.txt file.

When starting a new genome annotation, your directory structure should look like this:

ANNOTATION_DIR_BOTTLEROCKET/CLUSTER
- GENOME_DIR
  - GENOME_FILE
- RNA_DIR
  - RNA-seq FASTQ files listed in RNA_FILES
- scripts directory (from this repository)
- genome-annotation executable (from this repository)
- config.txt (from this repository)
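
Before starting, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the pipeline: check_layout is a hypothetical helper, and it assumes that GENOME_DIR and RNA_DIR are given relative to the annotation directory. Adjust the paths if your config.txt uses absolute paths.

```shell
# Hypothetical helper (not shipped with the pipeline): report which expected
# items are missing from an annotation directory.
# Assumption: genome_dir and rna_dir are relative to annotation_dir.
check_layout() {
    annotation_dir=$1   # e.g. MY_SPECIES_genome_annotation
    genome_dir=$2       # value of GENOME_DIR in config.txt
    genome_file=$3      # value of GENOME_FILE in config.txt
    rna_dir=$4          # value of RNA_DIR in config.txt
    missing=0

    for path in \
        "$annotation_dir/$genome_dir/$genome_file" \
        "$annotation_dir/$rna_dir" \
        "$annotation_dir/scripts" \
        "$annotation_dir/genome-annotation" \
        "$annotation_dir/config.txt"
    do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done

    # Exit status 0 means everything was found, 1 means something is missing.
    return "$missing"
}
```

For example, check_layout MY_SPECIES_genome_annotation genome my_genome.fasta rna_seq prints one line per missing item, where the last three arguments are placeholder values standing in for your own config settings.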

To run a single step of the pipeline, pass the step number with -s and the config file with -c. For example, ./genome-annotation -s 1 -c config.txt runs step 1. Run the steps in order, except for steps 5 and 6, which can run simultaneously.
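
If you would rather not launch each step by hand, a small wrapper can run several steps in order. This is only a sketch under two assumptions that the pipeline documentation does not confirm: that each ./genome-annotation call blocks until its step has finished, and that it exits with a non-zero status on failure. If a step submits jobs to a cluster scheduler and returns immediately, this approach does not apply. run_steps is a hypothetical name.

```shell
# Hypothetical wrapper: run the given pipeline steps in order from inside the
# annotation directory, stopping at the first step that fails.
# Assumption: genome-annotation blocks until done and exits non-zero on failure.
run_steps() {
    for step in "$@"; do
        echo "Running step $step"
        ./genome-annotation -s "$step" -c config.txt || {
            echo "Step $step failed; stopping." >&2
            return 1
        }
    done
}

# Example usage (commented out so that sourcing this file runs nothing):
# run_steps 1 2 3 4
# ./genome-annotation -s 5 -c config.txt &   # steps 5 and 6 may run together
# ./genome-annotation -s 6 -c config.txt &
# wait                                       # block until both have finished
```

Note that a bare wait does not report whether the background steps succeeded, so check the output of steps 5 and 6 before continuing.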

Want to rerun part (or all) of the pipeline with different data or settings? Simply copy the ANNOTATION_DIR, rename it, delete old results, and edit the config.txt file (making sure to update the ANNOTATION_DIR variables). Then rerun the pipeline within the new directory. This is the best way to avoid accidentally overwriting your previous annotation files.
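
The copy step can be sketched as a small shell function. clone_annotation_dir is a hypothetical helper, not part of the pipeline. It only makes the copy and refuses to overwrite an existing directory. Deleting old results and editing config.txt are left to you, because which output files to remove depends on the steps you plan to rerun.

```shell
# Hypothetical helper: copy an existing annotation directory to a new name
# without touching the original.
clone_annotation_dir() {
    src=$1    # existing annotation directory
    dest=$2   # new directory for the rerun

    if [ -e "$dest" ]; then
        echo "Refusing to overwrite existing path: $dest" >&2
        return 1
    fi

    cp -r "$src" "$dest" || return 1
    echo "Copied $src to $dest."
    echo "Next: delete old results in $dest and update the ANNOTATION_DIR variables in $dest/config.txt."
}
```

For example, clone_annotation_dir MY_SPECIES_genome_annotation MY_SPECIES_genome_annotation_rerun creates the copy, where the second directory name is a placeholder of your choosing.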