Streit-lab/enhancer_annotation_and_motif_analysis is a bioinformatic analysis pipeline for identifying enhancers associated with genes of interest and screening them for motif binding sites.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a portable, reproducible manner.
Unzip the genome (--fasta) and GTF/GFF (--gtf or --gff) files if they are gzipped
Index the genome to retrieve chromosome lengths
Filter genes of interest (--gene_ids) from the GTF/GFF and filter gene biotype entries
Extend peaks (--peaks_bed) by a given length (--extend_peaks), if requested
Assign TSSs to peaks using one of two methods:
a) if --ctcf is provided, assign the TSSs that fall between the CTCF sites flanking the peak of interest
b) otherwise, assign the TSSs that fall within --enhancer_window bases (default 50 kb) of the peak of interest
Retrieve filtered peak fasta sequences
Calculate background base frequencies for motif screening
Identify motif binding sites in peaks (FIMO)
Annotate peak-motif file with nearby genes
Install Nextflow (>=22.10.3)
Install either Docker or Singularity (you can follow this tutorial).
Download the pipeline:
nextflow pull Streit-lab/enhancer_annotation_and_motif_analysis
Test the pipeline on a minimal dataset with a single command:
nextflow run Streit-lab/enhancer_annotation_and_motif_analysis \
-r main \
-profile test,docker \
--outdir output
Start running your own analysis!
nextflow run Streit-lab/enhancer_annotation_and_motif_analysis \
-r main \
--fasta <FASTA_PATH_OR_URL> \
--gtf <GTF_PATH_OR_URL> \
--peaks_bed <PEAK_BED_FILE> \
-profile <docker/singularity/conda>
- The pipeline comes with config profiles called docker, singularity and conda, which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
- If you are using singularity, please use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
- If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.
--fasta
Required. Path or URL to the genome FASTA file; may be gzipped.
--gtf or --gff
Required. Path or URL to the GTF or GFF annotation file; may be gzipped.
--peaks_bed
Required. Path to the peak file in BED format. The first four columns must be: chrom, start, end, peak ID. Example file.
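You can check the BED requirements above before launching a run. The following minimal Python sketch is not part of the pipeline. The function name `parse_peaks` is hypothetical. It assumes tab-separated input that may contain `#`, `track` or `browser` header lines.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end, peak_id); BED coordinates are 0-based, half-open.
Peak = Tuple[str, int, int, str]


def parse_peaks(lines: Iterable[str]) -> List[Peak]:
    """Hypothetical helper: read and validate the first four BED columns."""
    peaks: List[Peak] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        # Skip blank lines and header/comment lines.
        if not fields[0] or fields[0].startswith(("#", "track", "browser")):
            continue
        if len(fields) < 4:
            raise ValueError(f"line {lineno}: expected at least 4 columns, got {len(fields)}")
        chrom, start, end, peak_id = fields[0], int(fields[1]), int(fields[2]), fields[3]
        if start < 0 or end <= start:
            raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
        # Extra columns (score, strand, ...) are ignored.
        peaks.append((chrom, start, end, peak_id))
    return peaks
```

Usage (the file name is a placeholder): `with open("peaks.bed") as handle: peaks = parse_peaks(handle)`.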
--gene_ids
Optional. File listing the gene IDs (as they appear in the GTF or GFF) to screen for enhancers and motifs, one gene ID per line. Example file. If --gene_ids is not specified, all gene IDs are extracted from the GTF or GFF. Default = null.
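The sketch below shows one way to restrict the annotation to --gene_ids and locate each gene's TSS. It assumes GTF input with `gene` feature rows and attributes written as `gene_id "..."`. The function names are hypothetical, and this is not the pipeline's own code.

```python
import re
from typing import Dict, Iterable, Optional, Set, Tuple

# gene_id -> (chrom, 0-based TSS position, strand)
TSS = Dict[str, Tuple[str, int, str]]


def read_gene_ids(lines: Iterable[str]) -> Set[str]:
    """One gene ID per line; blank lines are ignored."""
    return {line.strip() for line in lines if line.strip()}


def gene_tss(gtf_lines: Iterable[str], gene_ids: Optional[Set[str]] = None,
             id_key: str = "gene_id") -> TSS:
    """Return the TSS of every 'gene' record, optionally restricted to gene_ids."""
    pattern = re.compile(rf'(?:^|;)\s*{re.escape(id_key)} "([^"]+)"')
    tss: TSS = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = pattern.search(fields[8])
        if match is None:
            continue
        gene_id = match.group(1)
        # Mirrors the documented default: no gene list means keep every gene.
        if gene_ids is not None and gene_id not in gene_ids:
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        # GTF is 1-based and inclusive; the TSS is the gene's 5' end on its strand.
        tss[gene_id] = (chrom, (start if strand == "+" else end) - 1, strand)
    return tss
```

Minus-strand genes take their TSS from the GTF end coordinate. This is why the strand column matters even though the filter itself only looks at the gene ID.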
--extend_peaks
Optional. Number of bases by which to extend each peak, both upstream and downstream. Default = 0.
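The pipeline indexes the genome to retrieve chromosome lengths. A natural use of those lengths is to keep extended peaks within chromosome bounds. This minimal sketch assumes that clipping behaviour and assumes the lengths come from a FASTA index (`.fai`, e.g. as written by `samtools faidx`). Both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[str, int, int, str]  # (chrom, start, end, peak_id), 0-based half-open


def read_chrom_sizes(fai_lines: Iterable[str]) -> Dict[str, int]:
    """Chromosome lengths from a FASTA index: column 1 = name, column 2 = length."""
    sizes: Dict[str, int] = {}
    for line in fai_lines:
        fields = line.split("\t")
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def extend_peaks(peaks: Iterable[Peak], extend: int, chrom_sizes: Dict[str, int]) -> List[Peak]:
    """Widen each interval by `extend` bases on both sides, clipped to the chromosome."""
    return [
        (chrom, max(0, start - extend), min(chrom_sizes[chrom], end + extend), peak_id)
        for chrom, start, end, peak_id in peaks
    ]
```

With the default `--extend_peaks 0` the intervals pass through unchanged.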
--enhancer_window
Optional. Distance (in bases) from each TSS in the GTF or GFF within which enhancers are screened. Default = 50000.
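Window-based assignment (used when --ctcf is not given) can be pictured as follows. This is a sketch under the assumption that distance is measured from the peak edges rather than the peak centre; the pipeline may define it differently. `assign_by_window` is a hypothetical name, and the brute-force loop is for clarity only. An interval tree or sorted lookup would scale better.

```python
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[str, int, int, str]
TSS = Dict[str, Tuple[str, int, str]]  # gene_id -> (chrom, 0-based TSS, strand)


def assign_by_window(peaks: Iterable[Peak], tss: TSS, window: int = 50000) -> Dict[str, List[str]]:
    """Map each peak ID to the genes whose TSS lies within `window` bases of the peak."""
    assigned: Dict[str, List[str]] = {}
    for chrom, start, end, peak_id in peaks:
        # For a half-open peak [start, end), "within window" means
        # start - window <= pos <= (end - 1) + window.
        assigned[peak_id] = sorted(
            gene_id
            for gene_id, (g_chrom, pos, _strand) in tss.items()
            if g_chrom == chrom and start - window <= pos < end + window
        )
    return assigned
```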
--ctcf
Optional. BED file containing coordinates of CTCF peaks used to annotate enhancers to genes. If this argument is specified, the pipeline annotates enhancers using CTCF windows rather than --enhancer_window. Default = null.
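A CTCF window can be read as the region between the nearest CTCF sites to the left and right of a peak. The sketch below makes several assumptions:

- CTCF sites that overlap the peak are ignored.
- A missing site on one side falls back to the chromosome end.
- TSSs inside the window are assigned to the peak.

The pipeline's exact rules may differ. Function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[str, int, int, str]
Site = Tuple[str, int, int]  # (chrom, start, end) of a CTCF peak
TSS = Dict[str, Tuple[str, int, str]]  # gene_id -> (chrom, 0-based TSS, strand)


def ctcf_domains(peaks: Iterable[Peak], ctcf_sites: Iterable[Site],
                 chrom_sizes: Dict[str, int]) -> Dict[str, Tuple[str, int, int]]:
    """Bound each peak by the nearest non-overlapping CTCF site on either side."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in ctcf_sites:
        by_chrom.setdefault(chrom, []).append((start, end))
    domains = {}
    for chrom, start, end, peak_id in peaks:
        left, right = 0, chrom_sizes[chrom]
        for site_start, site_end in by_chrom.get(chrom, []):
            if site_end <= start:      # site lies at lower coordinates than the peak
                left = max(left, site_end)
            elif site_start >= end:    # site lies at higher coordinates than the peak
                right = min(right, site_start)
        domains[peak_id] = (chrom, left, right)
    return domains


def assign_by_ctcf(peaks: Iterable[Peak], ctcf_sites: Iterable[Site],
                   tss: TSS, chrom_sizes: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign every TSS inside a peak's CTCF-bounded window to that peak."""
    domains = ctcf_domains(peaks, ctcf_sites, chrom_sizes)
    return {
        peak_id: sorted(g for g, (g_chrom, pos, _s) in tss.items()
                        if g_chrom == chrom and left <= pos < right)
        for peak_id, (chrom, left, right) in domains.items()
    }
```

Unlike the fixed --enhancer_window, the size of each window here depends on local CTCF density.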
--motif_matrix
Optional. By default the pipeline screens against all motifs in the JASPAR core vertebrate non-redundant database (--motif_matrix jaspar_core_vert_nonredundant_motifs). The redundant database can be selected with --motif_matrix jaspar_core_vert_redundant_motifs. Alternatively, a path to a matrix file in MEME format can be provided. Example file.
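To supply your own motif, you need a file in MEME format. The sketch below renders one position count matrix (rows = positions, columns = A, C, G, T) as a minimal MEME-format motif. It assumes a uniform background and a single motif per file. `counts_to_meme` and its arguments are hypothetical, and the sketch does not establish which MEME header fields the pipeline requires.

```python
from typing import List, Sequence


def counts_to_meme(motif_id: str, alt_name: str, counts: Sequence[Sequence[float]]) -> str:
    """Convert a count matrix to a minimal MEME motif file (probabilities per row)."""
    if not counts or any(len(row) != 4 for row in counts):
        raise ValueError("expected one row of 4 counts (A, C, G, T) per position")
    lines: List[str] = [
        "MEME version 4", "",
        "ALPHABET= ACGT", "",
        "strands: + -", "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25", "",  # uniform background (assumption)
        f"MOTIF {motif_id} {alt_name}",
        f"letter-probability matrix: alength= 4 w= {len(counts)} nsites= {round(sum(counts[0]))}",
    ]
    for row in counts:
        total = sum(row)
        lines.append(" ".join(f"{c / total:.6f}" for c in row))
    return "\n".join(lines) + "\n"
```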
--markov_background
Optional. Markov background model used to define base frequencies for motif screening. By default this is calculated from the provided --fasta input.
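The simplest Markov background is order 0: plain A/C/G/T frequencies. The sketch below computes those frequencies from FASTA lines, ignoring N and other ambiguity codes. It can optionally pool counts with the reverse complement. Higher-order models (such as those produced by the MEME suite's `fasta-get-markov`) also condition on preceding bases. This sketch does not describe which order or tool the pipeline uses, and `base_frequencies` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable


def base_frequencies(fasta_lines: Iterable[str], both_strands: bool = True) -> Dict[str, float]:
    """Order-0 background model: relative frequencies of A, C, G and T."""
    counts: Counter = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # header line
        counts.update(base for base in line.strip().upper() if base in "ACGT")
    if both_strands:
        # Counting the reverse complement too makes A = T and C = G.
        at = counts["A"] + counts["T"]
        cg = counts["C"] + counts["G"]
        counts = Counter({"A": at, "T": at, "C": cg, "G": cg})
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}
```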
--fimo_pval
Optional. p-value threshold used by FIMO for motif screening. Default = 0.0001.
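To see what the threshold does, the sketch below re-applies a p-value cut-off to FIMO's tab-separated output, for example to tighten it after a run. It assumes FIMO's TSV header uses a `p-value` column and that comment lines start with `#`. It also assumes a strict "less than" comparison. `filter_fimo_hits` is a hypothetical name.

```python
import csv
from typing import Dict, Iterable, List


def filter_fimo_hits(tsv_lines: Iterable[str], pval: float = 1e-4) -> List[Dict[str, str]]:
    """Keep FIMO TSV rows whose p-value is below `pval` (strict comparison assumed)."""
    # Drop blank lines and '#' comment lines before handing rows to the CSV parser.
    rows = (line for line in tsv_lines if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return [row for row in reader if float(row["p-value"]) < pval]
```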
--gene_name_col
Optional. Attribute key in the GTF or GFF that holds gene names. Default = 'gene_name'.
--gene_id_col
Optional. Attribute key in the GTF or GFF that holds gene IDs. Default = 'gene_id'.
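These two options name keys in column 9 of the annotation. GTF writes that column as `key "value";` and GFF3 as `key=value;`, and attribute keys vary between annotation sources. The sketch below reads either style into a dictionary. It is hypothetical, not the pipeline's parser: it does not handle semicolons inside quoted values, and for repeated keys only the last value is kept.

```python
from typing import Dict


def parse_attributes(attr_field: str) -> Dict[str, str]:
    """Parse column 9 of a GTF (key "value";) or GFF3 (key=value;) record."""
    attrs: Dict[str, str] = {}
    for item in attr_field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        if "=" in item and " " not in item.split("=", 1)[0]:
            key, value = item.split("=", 1)       # GFF3 style
        else:
            key, _, value = item.partition(" ")   # GTF style
        attrs[key] = value.strip().strip('"')
    return attrs
```

For example, `parse_attributes(fields[8])["gene_name"]` would return the value that `--gene_name_col gene_name` points to.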
--skip_motif_analysis
Optional. Boolean; if set to true, the pipeline skips motif analysis and only annotates enhancers. Default = false.
--outdir
Optional. Directory to write results to. Default = 'results'.