Matteo-Ciciani / PAMpredict

A Python package to predict CRISPR-Cas PAMs

CC BY-NC-ND 4.0

PAMpredict

A package to predict protospacer adjacent motifs (PAMs) of Cas proteins.

Installation

conda create -n PAMpredict -c conda-forge -c bioconda python=3 argparse biopython pandas numpy scipy logomaker blast pysam mafft muscle=5.1 samtools matplotlib sed
git clone https://github.com/Matteo-Ciciani/PAMpredict
conda activate PAMpredict
cd PAMpredict
PAMpredict/PAMpredict.py -h

Phage Databases

PAMpredict can run on any database of phage genomes in FASTA format. We recommend the Gut Phage Database (GPD) and the Metagenomic Gut Virus catalogue (MGV), which can be downloaded with:

# Gut Phage Database (GPD)
wget http://ftp.ebi.ac.uk/pub/databases/metagenomics/genome_sets/gut_phage_database/GPD_sequences.fa.gz
gunzip GPD_sequences.fa.gz
# Metagenomic Gut Virus catalogue (MGV)
wget https://portal.nersc.gov/MGV/MGV_v1.0_2021_07_08/mgv_contigs.fna

Usage

PAMpredict is designed to search for conserved nucleotides near putative protospacers, using a list of CRISPR spacers as input. The input spacers must be in FASTA format and in the same orientation (i.e., they must come from the same CRISPR array or from arrays in the same orientation). A blastn database must be built for each phage genome database before running the analysis (see Example run).
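To make "conserved nucleotides" concrete, the sketch below scores each position of a set of equal-length flanking sequences by its information content in bits (2 minus the Shannon entropy over A/C/G/T), the quantity drawn as letter height in a standard sequence logo. This is a common way to quantify conservation and is only an assumption here: it is not necessarily the score PAMpredict computes, `position_information` is a hypothetical helper rather than part of the package, and no small-sample correction is applied.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_information(flanks):
    """Per-position information content (bits) of equal-length DNA flanks.

    Hypothetical helper for illustration: 2 bits means a perfectly conserved
    base, 0 bits means A/C/G/T are equally frequent. Characters other than
    A/C/G/T (e.g. N or '-') are ignored. No small-sample correction.
    """
    length = len(flanks[0])
    if any(len(f) != length for f in flanks):
        raise ValueError("all flanking sequences must have the same length")
    info = []
    for i in range(length):
        counts = Counter(f[i].upper() for f in flanks if f[i].upper() in BASES)
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(2.0 - entropy)
    return info


# Toy flanks: the 2nd and 3rd positions are always G, the 1st is uniform.
flanks = ["AGGT", "CGGA", "TGGC", "GGGT"]
print([round(x, 2) for x in position_information(flanks)])  # [0.0, 2.0, 2.0, 0.5]
```

Positions that stay near 2 bits across many putative protospacers are the candidates for a PAM; positions near 0 carry no signal.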

Example run

The example provided shows how to predict the PAM of SpCas9. First, build the blastn database:

# run this in the PAMpredict directory
conda activate PAMpredict
makeblastdb -in Example/Phages/phages.fna -dbtype nucl -parse_seqids

Then run the prediction with:

PAMpredict/PAMpredict.py Example/spacers.fna Example/Phages Example/outdir

The run identifies a PAM downstream of the putative protospacers, as expected for SpCas9 (canonical PAM: NGG).

[Figure: SpCas9 PAM prediction]
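Reading a PAM "downstream" of a protospacer means taking the flanking bases in the spacer's orientation, so a spacer that matches the minus strand of a phage genome needs its flank reverse-complemented. The sketch below illustrates this under stated assumptions: coordinates are 1-based and inclusive as in blastn tabular output (where `sstart > send` marks a minus-strand hit), and `extract_flank` and `revcomp` are hypothetical helpers, not PAMpredict functions. The `position` argument mirrors the DOWNSTREAM/UPSTREAM choice of `--pam_position`.

```python
# Complement table for DNA (N stays N); lowercase is handled too.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_flank(genome, sstart, send, pam_length=10, position="DOWNSTREAM"):
    """Return the pam_length bases flanking a protospacer, in spacer orientation.

    sstart/send are 1-based inclusive subject coordinates as in blastn tabular
    output; sstart > send means the spacer matched the minus strand.
    Returns None if the flank would run past either end of the genome.
    """
    if position not in ("DOWNSTREAM", "UPSTREAM"):
        raise ValueError("position must be DOWNSTREAM or UPSTREAM")
    plus = sstart <= send
    # Protospacer span as a 0-based half-open interval on the plus strand.
    lo, hi = (sstart - 1, send) if plus else (send - 1, sstart)
    # Downstream of a plus-strand hit lies to the right in genome coordinates;
    # for a minus-strand hit the two sides are swapped.
    take_right = (position == "DOWNSTREAM") == plus
    start, end = (hi, hi + pam_length) if take_right else (lo - pam_length, lo)
    if start < 0 or end > len(genome):
        return None
    flank = genome[start:end]
    return flank if plus else revcomp(flank)


# Toy contig: CCC | protospacer AACTGACC | TGG | AAA
genome = "CCCAACTGACCTGGAAA"
print(extract_flank(genome, 4, 11, pam_length=3))                       # TGG
print(extract_flank(genome, 4, 11, pam_length=3, position="UPSTREAM"))  # CCC
# The same site on the reverse-complemented contig is a minus-strand hit.
print(extract_flank(revcomp(genome), 14, 7, pam_length=3))              # TGG
```

Returning None at contig edges, rather than a shorter flank, keeps every collected flank the same length, which a per-position comparison needs.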

Parameter list

-t, --threads: Number of parallel processes [default:1].

--keep_tmp: Keep temporary files.

--log_lvl: Logging level (DEBUG,INFO,WARNING,ERROR) [default:INFO].

--force: Overwrite existing results if present.

-d, --max_diff: Maximum number of differences (gaps + mismatches) allowed between spacers and putative protospacers [default:4].

-p, --pam_position: PAM position with respect to the spacers: DOWNSTREAM (e.g. for Cas9) or UPSTREAM (e.g. for Cas12) [default:DOWNSTREAM].

-f, --format: File format of the PAM plot (png,ps,eps,svg,pdf) [default:png].

-l, --pam_length: Number of PAM positions used to generate predictions and plot [default:10].

--no_plot: Suppress plot generation.
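The -d/--max_diff filter counts gaps plus mismatches between a spacer and a putative protospacer. The sketch below applies such a filter to blastn tabular hits. It assumes blastn was run with `-outfmt "6 qseqid sseqid qlen qstart qend sstart send mismatch gaps"`. Those are standard BLAST+ format specifiers, but the fields PAMpredict actually requests are not documented here. It also counts spacer bases left unaligned at either end as differences, which is an assumption. `Hit`, `parse_hit`, `differences` and `filter_hits` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed column order, matching:
#   blastn ... -outfmt "6 qseqid sseqid qlen qstart qend sstart send mismatch gaps"
FIELDS = ("qseqid", "sseqid", "qlen", "qstart", "qend",
          "sstart", "send", "mismatch", "gaps")


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    qlen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    mismatch: int
    gaps: int


def parse_hit(line):
    """Parse one tab-separated blastn line into a Hit."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for name in FIELDS[2:]:  # every column after the two IDs is an integer
        record[name] = int(record[name])
    return Hit(**record)


def differences(hit):
    """Mismatches + gap positions + spacer bases left unaligned (assumption)."""
    unaligned = (hit.qstart - 1) + (hit.qlen - hit.qend)
    return hit.mismatch + hit.gaps + unaligned


def filter_hits(lines, max_diff=4):
    """Keep hits with at most max_diff differences (cf. -d/--max_diff)."""
    hits = (parse_hit(line) for line in lines if line.strip())
    return [hit for hit in hits if differences(hit) <= max_diff]


lines = [
    "sp1\tphageA\t30\t1\t30\t101\t130\t2\t0",  # 2 mismatches -> 2 differences
    "sp2\tphageB\t30\t1\t28\t501\t473\t1\t1",  # 1 + 1 + 2 unaligned -> 4
    "sp3\tphageC\t30\t3\t30\t10\t38\t2\t1",    # 2 + 1 + 2 unaligned -> 5
]
print([hit.qseqid for hit in filter_hits(lines)])  # ['sp1', 'sp2']
```

Using the `gaps` column (total gap positions) rather than `gapopen` (number of gap openings) matches the README's definition of differences as gaps plus mismatches.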

Output

If run with --keep_tmp, the following is also produced: