KennthShang / CHERRY_crispr_MAG

Predict phage-bacteria interaction using CRISPRs

CHERRY

CHERRY-crispr MAG version

This program provides an extended version of CHERRY that uses CRISPR information captured from the MAGs you provide for host prediction.

The main local program is available via PhaBOX and WebServer


🚀  Installation

If you have already installed PhaBOX, you can skip this part and use the phabox environment directly.

We suggest installing all the packages with conda (either Miniconda or Anaconda works) using the commands below:

conda create --name cherry_crispr_db python=3.8
conda activate cherry_crispr_db
conda install pandas numpy biopython
conda install blast -c bioconda
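
To confirm the environment is ready before running the program, a small check like the one below may help. It is only a sketch: it assumes the packages listed above are the relevant dependencies, that biopython is imported under the name Bio, and that the blast package puts a blastn executable on your PATH. The function name missing_dependencies is made up for illustration and is not part of CHERRY.

```python
import importlib.util
import shutil

# Assumption: these mirror the conda packages installed above.
# "Bio" is the import name of the biopython package.
REQUIRED_MODULES = ["pandas", "numpy", "Bio"]
# Assumption: the bioconda blast package provides a blastn executable.
REQUIRED_TOOLS = ["blastn"]


def missing_dependencies(modules=REQUIRED_MODULES, tools=REQUIRED_TOOLS):
    """Return the names of modules and command-line tools that cannot be found."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [t for t in tools if shutil.which(t) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it inside the activated conda environment; anything reported as missing can be installed with the conda commands above.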

🚀  Quick Start

Remember to activate your conda environment first.

git clone https://github.com/KennthShang/CHERRY_crispr_MAG.git

# If each MAG is in its own FASTA file and all the MAGs are located in ~/bacteria/
python CHERRY_crispr_MAG/cherry_crispr_mag.py --bfolder ~/bacteria/ --pfile ~/phage.fa --threads 40 --rootpth ~/test_dir --dbdir CHERRY_crispr_MAG/database --ident 90 --coverage 0.9

OR

# If all bacterial sequences are in one FASTA file named ~/bacteria.fa
python CHERRY_crispr_MAG/cherry_crispr_mag.py --bfile ~/bacteria.fa --pfile ~/phage.fa --threads 40 --rootpth ~/test_dir --dbdir CHERRY_crispr_MAG/database --ident 90 --coverage 0.9

⌛️  Usage

  Choose one of the modes below:
  --bfile
                        If your bacterial contigs are in one FASTA file
  --bfolder
                        If your bacterial MAGs are in a folder

  Common options:
  --pfile
                        path to your phage contigs (FASTA file)
  --rootpth 
                        path to the output folder
  --dbdir 
                        path to the CHERRY_crispr_MAG/database
  --threads 
                        Number of threads to run the program (default 8)
  --ident
                        Identity threshold for the alignments (default 90)
  --coverage
                        Coverage threshold for the alignments (default 0.9)

The program returns only the results that meet both the --ident and --coverage thresholds.
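
The combined threshold check can be sketched as follows. This is an illustration, not the program's actual code: it takes --ident to be compared against the BLAST pident field and --coverage against length/slen, as noted under Output format. The column order in COLUMNS, the use of >= rather than >, and the function names passes and filter_hits are all assumptions; the real layout of crispr_align.txt may differ.

```python
import csv
import io

# Hypothetical tab-separated column layout; adjust it to match the real file.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "slen"]


def passes(row, ident=90.0, coverage=0.9):
    """Return True only if a hit meets both the identity and coverage thresholds."""
    pident = float(row["pident"])
    # Coverage: alignment length divided by subject sequence length.
    cov = int(row["length"]) / int(row["slen"])
    return pident >= ident and cov >= coverage


def filter_hits(handle, ident=90.0, coverage=0.9):
    """Read tab-separated hits from a file-like object and keep the passing ones."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    return [row for row in reader if passes(row, ident, coverage)]


if __name__ == "__main__":
    # Made-up alignment rows used only to exercise the filter.
    demo = io.StringIO(
        "query_1\tsubject_A\t100.0\t30\t32\n"  # identity and coverage both pass
        "query_2\tsubject_B\t85.0\t32\t32\n"   # identity below 90
        "query_3\tsubject_C\t95.0\t20\t32\n"   # coverage 20/32 below 0.9
    )
    for hit in filter_hits(demo):
        print(hit["qseqid"], hit["sseqid"])
```

With the default thresholds, only the first made-up row survives, because a hit must satisfy both conditions at once.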

📈  Output format


Input (provided by the user):
    1. Bacterial contigs from your samples (one FASTA file, or a folder of FASTA files)
    2. Phage contigs from your samples (FASTA file)

Output:
    1. CRISPRs.fa: CRISPRs found in the bacteria you provided
    2. crispr_align.txt: BLASTN results between the CRISPRs and the phage contigs
    3. cherry_crispr_pred.csv: CSV file of the predictions
       [In the program, --ident refers to the BLAST pident field and --coverage refers to length/slen, i.e. alignment length divided by subject sequence length]
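
For a quick look at the results, something like the sketch below could be used. It makes no assumption about the columns in cherry_crispr_pred.csv and simply reports whatever header it finds. The helper names count_fasta_records and preview_csv are made up for illustration, and the demo uses stand-in data rather than real CHERRY output.

```python
import csv
import io
from itertools import islice


def count_fasta_records(handle):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in handle if line.startswith(">"))


def preview_csv(handle, n=5):
    """Return the header row and up to n data rows, whatever the columns are."""
    reader = csv.reader(handle)
    header = next(reader, [])
    return header, list(islice(reader, n))


if __name__ == "__main__":
    # Stand-in data. For real results, open the files in your output folder,
    # e.g. open("CRISPRs.fa") and open("cherry_crispr_pred.csv", newline="").
    fasta = io.StringIO(">record_1\nACGT\n>record_2\nGGCC\n")
    print("records:", count_fasta_records(fasta))

    # Placeholder column names, not the real ones.
    table = io.StringIO("column_a,column_b\nx,y\n")
    header, rows = preview_csv(table)
    print("columns:", header)
    print("first rows:", rows)
```

Both helpers take an open file handle, so the same code works on in-memory examples and on files under the folder given to --rootpth.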

📫  Have a question?

We are happy to answer questions on the CHERRY issues page. If you have a private question or would like to collaborate with us, you can reach us directly by email: jiayushang@cuhk.edu.hk

✏️  Citation

If you use this program, please cite the following papers:

🤵  Team

Our groupmates also provide many useful tools for bioinformatics analysis. Please check Yanni's Group for further information. Hope you will like them!