A package to predict protospacer adjacent motifs (PAMs) of Cas proteins.
conda create -n PAMpredict -c conda-forge -c bioconda python=3 argparse biopython pandas numpy scipy logomaker blast pysam mafft muscle=5.1 samtools matplotlib sed
git clone https://github.com/Matteo-Ciciani/PAMpredict
conda activate PAMpredict
cd PAMpredict
PAMpredict/PAMpredict.py -h
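PAMpredict relies on external programs installed through conda, so a quick check after activating the environment can catch a failed install early. The sketch below is a hypothetical helper, not part of PAMpredict. The list of executables to check (makeblastdb, blastn, mafft, muscle, samtools) is an assumption based on the packages in the conda create command above.

```bash
# Hypothetical helper (not part of PAMpredict): report which of the given
# executables are not on PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, inside the activated environment, check_tools makeblastdb blastn mafft muscle samtools should print nothing and return 0 if every tool was installed.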
PAMpredict can run on any database of phage genomes in FASTA format. We recommend using GPD and MGV, which can be downloaded with:
# Gut Phage Database (GPD)
wget http://ftp.ebi.ac.uk/pub/databases/metagenomics/genome_sets/gut_phage_database/GPD_sequences.fa.gz
gunzip GPD_sequences.fa.gz
# Metagenomic Gut Virus catalogue (MGV)
wget https://portal.nersc.gov/MGV/MGV_v1.0_2021_07_08/mgv_contigs.fna
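Each downloaded catalogue still needs its own blastn database. The sketch below assumes that the layout of the Example run (a directory holding the FASTA file together with its BLAST index, which is then passed to PAMpredict) also works for GPD and MGV; that is an assumption, not something the tool documents here. prepare_phage_db and the MAKEBLASTDB override are hypothetical names; the makeblastdb flags are the ones used in the Example run.

```bash
# Hypothetical helper (not part of PAMpredict): place a phage FASTA file in its
# own directory and index it with makeblastdb, using the Example-run flags.
# Set MAKEBLASTDB to override the command (e.g. MAKEBLASTDB=echo for a dry run).
prepare_phage_db() {
  local fasta="$1" outdir="$2" abs target
  if [ ! -s "$fasta" ]; then
    echo "error: $fasta is missing or empty" >&2
    return 1
  fi
  mkdir -p "$outdir" || return 1
  # Symlink rather than copy, since the phage catalogues are large.
  abs="$(cd "$(dirname "$fasta")" && pwd)/$(basename "$fasta")"
  target="$outdir/$(basename "$fasta")"
  ln -sf "$abs" "$target" || return 1
  # Same flags as in the Example run.
  "${MAKEBLASTDB:-makeblastdb}" -in "$target" -dbtype nucl -parse_seqids
}
```

Assuming the downloads sit in the current directory:
prepare_phage_db GPD_sequences.fa GPD
prepare_phage_db mgv_contigs.fna MGV
The GPD or MGV directory would then take the place of Example/Phages in the PAMpredict command.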
PAMpredict is designed to search for conserved nucleotides near putative protospacers, using a list of CRISPR spacers as input. The input spacers must be in FASTA format and in the same orientation (i.e. they come from the same CRISPR array or from arrays with the same orientation). A blastn database must be built for each phage genome database before running the analysis (see Example run).
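A malformed spacer file is an easy mistake to make. The check below is a hypothetical helper, not part of PAMpredict, that applies a strict reading of "FASTA format": a header before every sequence, no empty records, and only A/C/G/T (either case). That strictness is an assumption, so spacers containing N or other IUPAC codes would be rejected. It cannot verify the orientation requirement, which depends on how the spacers were extracted.

```bash
# Hypothetical helper (not part of PAMpredict): sanity-check a spacer FASTA.
# Prints the number of records on success; returns 1 on the first problem.
check_spacers() {
  awk '
    /^[[:space:]]*$/ { next }                 # ignore blank lines
    /^>/ {
      if (n > 0 && len == 0) { printf "record %s has no sequence\n", name > "/dev/stderr"; bad = 1; exit }
      n++; name = substr($0, 2); len = 0; next
    }
    {
      if (n == 0) { print "sequence before first header" > "/dev/stderr"; bad = 1; exit }
      if ($0 !~ /^[ACGTacgt]+$/) { printf "record %s: unexpected characters\n", name > "/dev/stderr"; bad = 1; exit }
      len += length($0)                       # sequences may span several lines
    }
    END {
      if (bad) exit 1
      if (n == 0) { print "no records found" > "/dev/stderr"; exit 1 }
      if (len == 0) { printf "record %s has no sequence\n", name > "/dev/stderr"; exit 1 }
      print n
    }
  ' "$1"
}
```

If the file passes, check_spacers Example/spacers.fna should print the number of spacers it contains.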
The example provided shows how to predict the PAM of SpCas9. First, build the blastn database:
# run this in the PAMpredict directory
conda activate PAMpredict
makeblastdb -in Example/Phages/phages.fna -dbtype nucl -parse_seqids
Then run the prediction with:
PAMpredict/PAMpredict.py Example/spacers.fna Example/Phages Example/outdir
The run identifies a PAM downstream of the putative protospacers, consistent with the known position of the SpCas9 PAM (NGG).
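Conceptually, a PAM shows up as positions in the flanking sequences where one nucleotide dominates across many independent protospacers. The sketch below illustrates only that idea with a simple per-position frequency count; it is not PAMpredict's scoring or plotting code, and pam_profile is a hypothetical name. The flanks in the usage line are made-up toy sequences.

```bash
# Illustrative sketch (not PAMpredict's implementation): read equal-length
# flanking sequences from stdin, one per line, and print for each position
# the most frequent nucleotide and its fraction (tab-separated).
pam_profile() {
  awk '
    NF == 0 { next }
    {
      seq = toupper($0)
      if (L == 0) L = length(seq)
      if (length(seq) != L) { print "flanks must have equal length" > "/dev/stderr"; bad = 1; exit }
      n++
      for (i = 1; i <= L; i++) count[i, substr(seq, i, 1)]++
    }
    END {
      if (bad || n == 0) exit 1
      split("A C G T", nt, " ")
      for (i = 1; i <= L; i++) {
        best = "N"; top = 0
        for (j = 1; j <= 4; j++)
          if (count[i, nt[j]] > top) { top = count[i, nt[j]]; best = nt[j] }
        printf "%d\t%s\t%.2f\n", i, best, top / n
      }
    }
  '
}
```

With printf 'AGGTC\nTGGAA\nCGGTT\nGGGCA\n' | pam_profile, positions 2 and 3 come out as G with a fraction of 1.00 (an NGG-like pattern), while the other positions stay mixed.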
Optional arguments:
-t, --threads: Number of parallel processes [default:1].
--keep_tmp: Keep temporary files.
--log_lvl: Logging level (DEBUG,INFO,WARNING,ERROR) [default:INFO].
--force: Overwrite existing results if present.
-d, --max_diff: Maximum number of differences (gaps + mismatches) allowed between spacers and putative protospacers [default:4].
-p, --pam_position: PAM position with respect to the spacers: DOWNSTREAM (default, e.g. for Cas9) or UPSTREAM (e.g. for Cas12).
-f, --format: File format of the PAM plot (png,ps,eps,svg,pdf) [default:png].
-l, --pam_length: Number of PAM positions used to generate predictions and plot [default:10].
--no_plot: Suppress plot generation.
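To make --pam_position and --pam_length concrete, the sketch below extracts the flank of a single protospacer hit, read in the same orientation as the spacer. It is an illustration under stated assumptions, not PAMpredict's code: get_flank and revcomp are hypothetical helpers, hit coordinates are taken as 1-based and inclusive on the forward strand of the genome, and "downstream" means 3' of the protospacer in spacer orientation.

```bash
# Hypothetical helpers illustrating --pam_position and --pam_length;
# this is not PAMpredict's implementation.

# Reverse complement of an A/C/G/T sequence (case preserved).
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca' |
    awk '{ out = ""; for (i = length($0); i >= 1; i--) out = out substr($0, i, 1); print out }'
}

# get_flank SEQ START END STRAND SIDE LEN
#   SEQ         forward-strand genome sequence
#   START, END  1-based inclusive coordinates of the hit on SEQ
#   STRAND      + if the spacer matches SEQ, - if it matches its reverse complement
#   SIDE        DOWNSTREAM or UPSTREAM, relative to the spacer orientation
#   LEN         number of flanking bases to report
get_flank() {
  local seq="$1" start="$2" end="$3" strand="$4" side="$5" len="$6" from flank
  case "$strand:$side" in
    "+:DOWNSTREAM" | "-:UPSTREAM") from=$end ;;                    # bases right of the hit
    "+:UPSTREAM" | "-:DOWNSTREAM") from=$(( start - 1 - len )) ;;  # bases left of the hit
    *) echo "bad strand/side: $strand/$side" >&2; return 1 ;;
  esac
  # from is a 0-based offset into SEQ.
  if [ "$from" -lt 0 ] || [ $(( from + len )) -gt "${#seq}" ]; then
    echo "flank runs past the end of the sequence" >&2
    return 1
  fi
  flank=$(printf '%s\n' "$seq" | awk -v f="$from" -v n="$len" '{ print substr($0, f + 1, n) }')
  # Minus-strand hits are read on the reverse complement, so flip the flank.
  if [ "$strand" = "-" ]; then
    revcomp "$flank"
  else
    printf '%s\n' "$flank"
  fi
}
```

For example, get_flank TTTTACGTACGTAGGCCC 5 12 + DOWNSTREAM 3 prints AGG, and get_flank CCTACGTACGTAAA 4 11 - DOWNSTREAM 3 also prints AGG, because the minus-strand flank lies to the left of the hit and is reverse-complemented.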
If --keep_tmp is used, the following is also produced: