Phylogenetic target prediction for prokaryotic trans-acting small RNAs
CopraRNA is a tool for sRNA target prediction. It computes whole-genome target predictions by combining distinct whole-genome IntaRNA predictions. As input, CopraRNA requires at least 3 homologous sRNA sequences from 3 distinct organisms in FASTA format. Furthermore, each organism's genome has to be part of the NCBI Reference Sequence (RefSeq) database (i.e. its RefSeq ID should have exactly this NZ_* or this NC_XXXXXX format, where * stands for any character and X stands for a digit between 0 and 9). Depending on sequence length (target and sRNA), number of input organisms and genome sizes, CopraRNA can take 24 h or longer to compute. In most cases it is significantly faster. It is suggested to run CopraRNA on a machine with at least 8 GB of memory.
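The input requirements above can be checked before starting a long run. The following is a minimal sketch, not part of CopraRNA: the function name `check_srna_fasta` is hypothetical, and it assumes that each FASTA header consists of the RefSeq ID of the organism and that `NZ_*` means `NZ_` followed by at least one character.

```shell
# Hypothetical pre-flight check, not shipped with CopraRNA.
# Assumption: every FASTA header is a RefSeq ID (NC_XXXXXX or NZ_*).
check_srna_fasta() {
    local fasta="$1"
    local ids n bad
    # Collect distinct header IDs: strip '>' and anything after the first whitespace.
    ids=$(grep '^>' "$fasta" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | sort -u)
    # Count non-empty IDs (grep -c exits non-zero on a count of 0, hence '|| true').
    n=$(printf '%s\n' "$ids" | grep -c . || true)
    if [ "$n" -lt 3 ]; then
        echo "need sequences from at least 3 distinct organisms, found $n" >&2
        return 1
    fi
    # List IDs that match neither NC_ + 6 digits nor NZ_ + at least one character.
    bad=$(printf '%s\n' "$ids" | grep -Ev '^(NC_[0-9]{6}|NZ_.+)$' || true)
    if [ -n "$bad" ]; then
        echo "headers not in NC_XXXXXX or NZ_* format:" >&2
        printf '%s\n' "$bad" >&2
        return 1
    fi
    return 0
}

# Usage: check_srna_fasta sRNAs.fa && echo "input looks fine"
```

The check only looks at the header format; it does not verify that the IDs actually exist in RefSeq or that the sequences are homologous.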
CopraRNA produces a lot of file I/O. It is suggested to run CopraRNA in a dedicated empty directory to avoid unexpected behavior.
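One way to follow this suggestion is to create a fresh directory per run and copy only the input FASTA file into it. This is a sketch under assumptions: the helper name `prepare_run_dir` and the directory name in the usage line are illustrative and not part of CopraRNA.

```shell
# Hypothetical helper, not shipped with CopraRNA.
# Creates a run directory, refuses to reuse a non-empty one,
# and copies the input FASTA file into it.
prepare_run_dir() {
    local fasta="$1" run_dir="$2"
    mkdir -p "$run_dir" || return 1
    # 'ls -A' lists all entries except '.' and '..'; any output means "not empty".
    if [ -n "$(ls -A "$run_dir")" ]; then
        echo "run directory $run_dir is not empty" >&2
        return 1
    fi
    cp "$fasta" "$run_dir/" || return 1
}

# Usage:
#   prepare_run_dir sRNAs.fa coprarna_run && cd coprarna_run
#   CopraRNA2.pl -srnaseq sRNAs.fa
```

Refusing to reuse a non-empty directory keeps files from an earlier run from mixing with the new output.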
For testing or ad hoc use of CopraRNA, you can use its web interface at the
==> Freiburg RNA tools CopraRNA webserver <==
If you use CopraRNA, please cite our articles
To use CopraRNA, you can either install it directly via conda or clone this GitHub repository and install the dependencies individually. It is also possible to run CopraRNA via a provided Docker container.
The following setup was successfully used to build and run CopraRNA via conda:
name: CopraRNA-2.1.3
channels:
- conda-forge
- bioconda
- defaults
- r
- conda
dependencies:
- blast-legacy
- bzip2
- clustalo
- coreutils
- domclust
- embassy-phylip
- emboss
- gawk
- grep
- intarna >2.2
- mafft
- perl <6
- perl-bioperl
- perl-bio-eutilities
- perl-getopt-long
- perl-list-moreutils
- perl-parallel-forkmanager
- phantomjs
- python
- r-base <4
- r-pheatmap
- r-robustrankaggreg
- r-seqinr
- sed
- suds-jurko
The following package versions were tested and functional during development of CopraRNA2.
bzip2 1.0.6 (for the core genome archive) // conda install bzip2
gawk 4.1.3 // conda install gawk
sed 4.2.2.165-6e76-dirty // conda install sed
grep 2.14 // conda install grep
GNU coreutils 8.25 // conda install coreutils
IntaRNA 2.1.0 // conda install intarna
EMBOSS package 6.5.7 - distmat (creates distance matrix from msa) // conda install emboss
embassy-phylip 3.69.650 - fneighbor (creates tree from dist matrix) // conda install embassy-phylip
ncbiblast-2.2.22 // conda install blast-legacy
DomClust 1.2.8a // conda install domclust
MAFFT 7.310 // conda install mafft
clustalo 1.2.3 // conda install clustalo
phantomjs 2.1.1-0 // conda install phantomjs
Perl (5.22.0) Module(s): // conda install perl
R statistics 3.2.2 // conda install r-base==3.2.2
python // conda install python
The easiest way to install CopraRNA locally is via conda using the bioconda channel (Linux only). This way, you will install CopraRNA along with all dependencies. See the bioconda installation instructions for detailed information. We recommend installing into a dedicated environment to avoid conflicts with other installed tools. The following two commands install CopraRNA into the environment coprarnaenv and activate it:
conda create -n coprarnaenv -c bioconda -c conda-forge coprarna
source activate coprarnaenv
CopraRNA, with all its dependencies, is also available as a Docker container. Once you have Docker installed, simply type (adjusting the version tag as needed):
docker run -i -t quay.io/biocontainers/coprarna:2.1.0--0 /bin/bash
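Because the container has its own file system, the input FASTA file has to be made available inside it. The sketch below mounts a host directory into the container using Docker's standard `-v` and `-w` options. The function name `run_coprarna_container`, the mount point `/data` and the `DOCKER` override variable are assumptions made for illustration; they are not defined by CopraRNA or the container image.

```shell
# Hypothetical wrapper, not shipped with CopraRNA.
# Starts the container with a host directory mounted at /data (an assumed
# mount point) and uses it as the working directory.
# Set DOCKER=echo to print the arguments instead of running docker.
run_coprarna_container() {
    local host_dir="$1" tag="${2:-2.1.0--0}"
    "${DOCKER:-docker}" run -i -t \
        -v "$host_dir":/data -w /data \
        "quay.io/biocontainers/coprarna:$tag" /bin/bash
}

# Usage: run_coprarna_container "$PWD/coprarna_run"
```

Files written to `/data` inside the container then appear in the mounted host directory after the container exits.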
git clone https://github.com/PatrickRWright/CopraRNA
If you installed all dependencies you should be able to directly use the source.
Example call:
CopraRNA2.pl -srnaseq sRNAs.fa -ntup 200 -ntdown 100 -region 5utr -enrich 200 -topcount 200 -cores 4
The following options are available:
--help : help
--srnaseq : FASTA file with small RNA sequences (def:input_sRNA.fa)
--region : region to scan in whole genome target prediction (def:5utr)
--ntup : amount of nucleotides upstream of '--region' to parse for targeting (def:200)
--ntdown : amount of nucleotides downstream of '--region' to parse for targeting (def:100)
--cores : amount of cores to use for parallel computation (def:1)
--rcsize : minimum amount (%) of putative target homologs that need to be available for a target cluster to be considered in the CopraRNA1 part (see --cop1) of the prediction (def:0.5)
--winsize : IntaRNA target (--tAccW) window size parameter (def:150)
--maxbpdist : IntaRNA target (--tAccL) maximum base pair distance parameter (def:100)
--cop1 : switch for CopraRNA1 prediction (def:off)
--cons : controls consensus prediction (def:0)
--verbose : switch to print verbose output to terminal during computation (def:off)
--websrv : switch to provide webserver output files (def:off)
--noclean : switch to prevent removal of temporary files (def:off)
--enrich : if entered then DAVID-WS functional enrichment is calculated with given amount of top predictions (def:off)
--nooi : if set then the CopraRNA2 prediction mode is set not to focus on the organism of interest (def:off)
--ooifilt : post processing filter for organism of interest p-value 0=off (def:0)
--root : specifies root function to apply to the weights (def:1)
--topcount : specifies the amount of top predictions to return and use for the extended regions plots (def:200)
In the update_kegg2refseq directory you create a new run directory
mkdir run
and change into this directory
cd run
Here you can execute build_kegg2refseq.pl
../build_kegg2refseq.pl
which will download prokaryotes.txt from NCBI and process it into the files CopraRNA_available_organisms.txt and kegg2refseqnew.csv. These two files must then be copied into coprarna_aux, where they replace their older versions.
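The final copy step can be sketched as follows. The function name `update_aux_files` is hypothetical, and the location of coprarna_aux depends on how CopraRNA was installed, so it is passed in as an argument rather than assumed.

```shell
# Hypothetical helper, not shipped with CopraRNA.
# Copies the two freshly built files from the run directory into coprarna_aux,
# after checking that both exist and are non-empty.
update_aux_files() {
    local run_dir="$1" aux_dir="$2" f
    if [ ! -d "$aux_dir" ]; then
        echo "not a directory: $aux_dir" >&2
        return 1
    fi
    # Check both files first so that a failed build never replaces only one of them.
    for f in CopraRNA_available_organisms.txt kegg2refseqnew.csv; do
        if [ ! -s "$run_dir/$f" ]; then
            echo "missing or empty: $run_dir/$f" >&2
            return 1
        fi
    done
    for f in CopraRNA_available_organisms.txt kegg2refseqnew.csv; do
        cp "$run_dir/$f" "$aux_dir/$f" || return 1
    done
}

# Usage (from inside the run directory; the second path is installation-specific):
#   update_aux_files . /path/to/coprarna_aux
```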