RNAz 2.1

Content

Terms of use
Introduction
Theory
  2.1. The structure conservation index
  2.2. Thermodynamic stability
  2.3. Classification based on both scores
Usage
  3.1. Installation
  3.2. Environment variable
  3.3. Invocation
  3.4. Output
Citing RNAz
Important notes
Contact

Terms of use

Please read the file COPYING for licence terms of RNAz.

Introduction

RNAz detects stable and conserved RNA secondary structures in multiple sequence alignments. It calculates two independent scores: the structure conservation index (SCI) for structural conservation and the z-score for thermodynamic stability. High structural conservation (high SCI) and thermodynamic stability (negative z-scores) are typical features of functional RNAs (e.g. non-coding RNAs or cis-acting regulatory elements). RNAz uses both scores to classify a given alignment as functional RNA or not. The classification is done by a support vector machine, which estimates an RNA-class probability that can be used as a convenient overall score.

For detailed coverage of all aspects of RNAz, we recommend reading the manual/tutorial (manual.pdf).

Theory

2.1. The structure conservation index

RNAz uses programs from the Vienna RNA package to perform minimum free energy (MFE) RNA secondary structure predictions.

First, it calculates the MFE of each single sequence in the alignment using RNAfold and averages these values.

Second, the complete alignment is folded using RNAalifold. RNAalifold implements a consensus folding algorithm which uses essentially the same algorithms and energy parameters as RNAfold. It calculates a consensus MFE which is composed of an energy term averaging the energy contributions of the single sequences and a covariance term rewarding compensatory and consistent mutations.

If the sequences in the alignment can fold into a common structure, the average MFE and the consensus MFE will be of similar magnitude. If there is no common structure, the consensus MFE will be higher (i.e. less stable) than the average MFE of the single sequences.

Based on this intuitive rationale, a structure conservation index is defined:

                   consensus MFE
             SCI=  --------------
                    average MFE

The SCI will be around 0 if RNAalifold does not find a consensus structure, and around 1 if the structure is perfectly conserved. An SCI above 1 indicates a perfectly conserved secondary structure which is additionally supported by compensatory and/or consistent mutations.
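As a rough illustration of this definition, the following Python sketch recomputes the SCI from the energies listed in the sample output of section 3.4. The function name `structure_conservation_index` is made up for this example and is not part of RNAz; in practice the MFE values come from RNAfold and RNAalifold.

```python
def structure_conservation_index(consensus_mfe, single_mfes):
    """Return the consensus MFE divided by the average single-sequence MFE."""
    average_mfe = sum(single_mfes) / len(single_mfes)
    if average_mfe == 0:
        # No single sequence folds into a stable structure; the ratio is undefined.
        raise ValueError("average MFE is zero, SCI is undefined")
    return consensus_mfe / average_mfe


# Energies in kcal/mol, copied from the tRNA sample output in section 3.4.
single_mfes = [-29.20, -29.20, -27.20, -23.20]
consensus_mfe = -26.50

sci = structure_conservation_index(consensus_mfe, single_mfes)
print(f"average MFE = {sum(single_mfes) / len(single_mfes):.2f}")  # -27.20
print(f"SCI = {sci:.2f}")  # 0.97
```

Both printed values agree with the "Mean single sequence MFE" and "Structure conservation index" fields of the sample output.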

2.2. Thermodynamic stability

The significance of a predicted MFE as calculated by RNAfold is difficult to interpret in absolute terms. It depends on the length and the base composition of the sequence (longer sequence => lower MFE, higher GC-content => lower MFE). Typically, the significance of an MFE is estimated by comparing it to the MFEs of many random sequences of the same length and base composition. If mu is the mean and sigma the standard deviation of the MFEs of many random sequences, a convenient normalized measure for the significance of the native sequence with MFE m is the z-score:

                      m - mu
                z = ---------
                      sigma

RNAz can efficiently calculate z-scores without sampling. Negative z-scores indicate that the native sequence is more stable than the random background. z-scores are measured in units of standard deviations. The MFEs of random sequences are roughly normally distributed, so a z-score can be read against a standard normal distribution to get an impression of the associated P value (e.g. z=-2 => P=0.98).
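The z-score formula and the normal approximation can be sketched in a few lines of Python. The numbers for the native MFE, mu and sigma below are invented for illustration, and the helper names are hypothetical. This shows only the definition; RNAz itself obtains z-scores without sampling. P is taken here as the fraction of random sequences expected to be less stable than the native one, which is our reading of the example z=-2 => P=0.98.

```python
import math


def z_score(m, mu, sigma):
    """Normalize the native MFE m against the random background (mu, sigma)."""
    return (m - mu) / sigma


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values in kcal/mol, chosen so that z comes out as -2.
native_mfe = -30.0
random_mean = -20.0
random_sd = 5.0

z = z_score(native_mfe, random_mean, random_sd)
# Fraction of the background expected to have a higher (less stable) MFE.
p = 1.0 - normal_cdf(z)
print(f"z = {z:.2f}, P = {p:.2f}")  # z = -2.00, P = 0.98
```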

2.3. Classification based on both scores

Both scores represent independent diagnostic features of functional RNAs. RNAz combines them into an overall score using a support vector machine regression. An RNA-class probability is calculated from the SCI and z, but also from the number of sequences in the alignment and the mean pairwise identity. It should be noted that the level of the SCI depends on the sequence similarity. For example, 100% sequence conservation results in an SCI of 1 by definition but does not hold any information for our purpose. Thus, the support vector machine was trained to interpret the significance of the SCI depending on the sequence variation.

The confidence level of this RNA-class probability, or "RNAz P-value", varies slightly depending on the properties of the input alignment. In our tests, cutoffs of P=0.5 and P=0.9 had specificities of 96% and 99%, respectively.

Usage

3.1. Installation

See INSTALL for details.

3.3. Invocation

RNAz [options] [filename]

RNAz takes an alignment file in the ClustalW or MAF format. Available command line options:

  -f, --forward           Score forward strand
  -r, --reverse           Score reverse strand
  -b, --both-strands      Score both strands
  -o, --outfile=FILENAME  Output filename
  -p, --cutoff=FLOAT      Probability cutoff
  -d, --dinucleotide      Use dinucleotide shuffled z-scores (default)
  -m, --mononucleotide    Use mononucleotide shuffled z-scores (default=off)
  -l, --locarnate         Use decision model for structural alignments (default=off)
  -n, --no-shuffle        Never fall back to shuffling (default=off)
  -h, --help              Print help screen
  -V, --version           Show version information

You can test RNAz on one of the example alignments (installed by default in /usr/local/share/RNAz/examples):

cd /usr/local/share/RNAz/examples

RNAz tRNA.aln

3.4. Output

Please refer to the following commented sample output:

Header:

############################ RNAz 2.1 ##############################

Sequences: 4 ... Number of sequences in the alignment
Columns: 73 ... Number of columns of the alignment
Reading direction: forward ... Strand considered for calculation ("forward" or "reverse")
Mean pairwise identity: 80.82 ... Mean pairwise sequence identity in %
Shannon entropy: 0.31118 ... Metric for sequence diversity that also takes the number of sequences into account
G+C content: 0.54795 ... GC-content
Mean single sequence MFE: -27.20 ... Average MFE of the single sequences as calculated by RNAfold
Consensus MFE: -26.50 ... Consensus MFE calculated by RNAalifold
Energy contribution: -23.63 ... RNAalifold energy part
Covariance contribution: -2.87 ... RNAalifold covariance part
Combinations/Pair: 1.43 ... Number of different base pair combinations per predicted consensus base pair
Mean z-score: -1.82
Structure conservation index: 0.97
Background model: dinucleotide ... Type of background model (di- or mononucleotide) used to calculate z-score
Decision model: sequence based alignment quality ... See --locarnate option
SVM decision value: 2.15 ... Internal decision value, probably not of much interest
SVM RNA-class probability: 0.984068 ... Probability estimate for the classification, "RNAz P-value"
Prediction: RNA ... "RNA" if P>=0.5 else "OTHER"

######################################################################

sacCer1
GCCUUGUUGGCGCAAUCGGUAGCGCGUAUGACUCUUAAUCAUAAGGUUAGGGGUUCGAGCCCCCUACAGGGCU
(((((((.(((((........))))...((((.((((....))))))))(((((....)))))).))))))). ( -29.20, z-score = -2.35, R)
sacBay
GCCUUGUUGGCGCAAUCGGUAGCGCGUAUGACUCUUAAUCAUAAGGUUAGGGGUUCGAGCCCCCUACAGGGCU
(((((((.(((((........))))...((((.((((....))))))))(((((....)))))).))))))). ( -29.20, z-score = -2.35, R)
sacKlu
GCCUUGUUGGCGCAAUCGGUAGCGCGUAUGACUCUUAAUCAUAAGGCUAGGGGUUCGAGCCCCCUACAGGGCU
(((((((.(((((........)))).(((((.......)))))......(((((....)))))).))))))). ( -27.20, z-score = -1.34, R)
sacCas
GCUUCAGUAGCUCAGUCGGAAGAGCGUCAGUCUCAUAAUCUGAAGGUCGAGAGUUCGAACCUCCCCUGGAGCA
(((((((..((((........)))).((((.........))))((((((......)).))))...))))))). ( -23.20, z-score = -1.22, R)
consensus
GCCUUGUUGGCGCAAUCGGUAGCGCGUAUGACUCUUAAUCAUAAGGUUAGGGGUUCGAGCCCCCUACAGGGCU
(((((((..((((........)))).(((((.......))))).....(((((.......)))))))))))). (-26.50 = -23.63 + -2.87)

Secondary structure predictions:

The next lines summarize the secondary structure predictions in the following format:

sequence name
SEQUENCE
STRUCTURE (MFE, Z-SCORE, R|S)

The structure is given in dot-bracket notation: '.' denotes an unpaired base, while '(' and ')' denote the two partners of a base pair. Energies are given in kcal/mol. "R" means the z-score was calculated by regression. "S" means the z-score was estimated by shuffling.
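For readers who want to post-process these structures, the following Python sketch converts a dot-bracket string into a list of base pairs using a stack. The function `dot_bracket_to_pairs` is a hypothetical helper written for this README, not something RNAz or the Vienna RNA package provides under that name, and the short input string is a toy example rather than RNAz output.

```python
def dot_bracket_to_pairs(structure):
    """Return sorted (i, j) base pairs, using 0-based positions."""
    open_positions = []  # stack of positions of unmatched '('
    pairs = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            # The most recent '(' pairs with this ')'.
            pairs.append((open_positions.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# Toy structure: two nested base pairs followed by one unpaired base.
print(dot_bracket_to_pairs("((..))."))  # [(0, 5), (1, 4)]
```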

The last structure is the RNAalifold consensus structure with the consensus MFE (broken down into its energy and covariance contributions).
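If the header fields need to be read by a script, a simple "key: value" parser is usually enough. The sketch below assumes that each header field sits on its own line without the explanatory "..." comments used in this README; `parse_rnaz_header` is a hypothetical helper, and the embedded text is an excerpt of the sample output rather than the complete header.

```python
def parse_rnaz_header(text):
    """Collect 'key: value' header lines into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the '#####' frame and anything without a colon.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Excerpt of the sample header, without the explanatory comments.
sample = """\
############################ RNAz 2.1 ##############################
Sequences: 4
Columns: 73
Mean single sequence MFE: -27.20
Consensus MFE: -26.50
Structure conservation index: 0.97
SVM RNA-class probability: 0.984068
Prediction: RNA
"""

fields = parse_rnaz_header(sample)
p = float(fields["SVM RNA-class probability"])
# Same rule as stated for the "Prediction" field: "RNA" if P>=0.5 else "OTHER".
label = "RNA" if p >= 0.5 else "OTHER"
print(label, label == fields["Prediction"])  # RNA True
```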

Advanced usage

See the Manual (manual.pdf) for more detailed documentation.

Citing RNAz

If you use RNAz in your work please cite:

Gruber AR, Findeiss S, Washietl S, Hofacker IL, Stadler PF. RNAz 2.0: improved noncoding RNA detection. Pac Symp Biocomput. 15:69-79 (2010)

Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci U S A. 102(7):2454-9 (2005)

Contact

Stefan Washietl wash@tbi.univie.ac.at
