KIT-MBS / coconet

RNA Contact Prediction Using Coevolution and Convolutional Neural Network
MIT License

Introduction

coconet (or CoCoNet) is short for RNA contact prediction using Coevolution and Convolutional Neural Network. It combines state-of-the-art direct coupling analysis (DCA) algorithms with a shallow convolutional neural network to enhance RNA contact prediction from multiple sequence alignments of homologous RNAs. It is implemented in Python and requires Python 3.5 or later.

Dependencies

coconet uses pydca to perform computations on the coevolutionary layer. You need to install the most recent version of pydca (i.e., version 1.23). By default, the command

pip install pydca

installs the required version.

Usage

The package can be manually downloaded or cloned using the command

git clone  https://github.com/KIT-MBS/coconet

Computing weighted scores

Once coconet is downloaded, change to the directory containing the file setup.py and execute on the command line

python -m coconet.main <msa_file> --verbose 

where <msa_file> denotes a FASTA-formatted multiple sequence alignment (MSA) file of an RNA. Note that the first sequence in the MSA file should be the target/reference sequence. The optional argument --verbose prints logging messages to the screen.
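Before running coconet, it can help to confirm that the MSA file is well formed. The following is a minimal sketch, not part of coconet: the helper names read_fasta_msa and check_msa are hypothetical. It reads a FASTA file, checks that all aligned sequences have the same length, and returns the first record, which coconet expects to be the target/reference sequence.

```python
def read_fasta_msa(path):
    """Return a list of (header, sequence) pairs from a FASTA-formatted MSA."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                # Sequences may be wrapped over several lines.
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_msa(records):
    """Check that all sequences share one length; return the first record."""
    if not records:
        raise ValueError("MSA contains no sequences")
    ref_header, ref_seq = records[0]
    for header, seq in records[1:]:
        if len(seq) != len(ref_seq):
            raise ValueError(
                "{}: length {} differs from reference length {}".format(
                    header, len(seq), len(ref_seq)
                )
            )
    return ref_header, ref_seq
```

Calling check_msa(read_fasta_msa("my_rna.fa")) on a hypothetical file my_rna.fa either returns the reference record or raises ValueError with the offending sequence header.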

By default, coconet uses a single 3x3 matrix. However, it is possible to specify the matrix size on the command line using the optional argument --msize, as follows:

python -m coconet.main <msa_file> --msize 5 --verbose 

The allowed values of --msize are 3, 5, and 7.
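To illustrate what the matrix size means, here is a rough sketch of applying a square weight matrix to a map of coevolution scores. It is an illustration only, not coconet's implementation: the function name convolve_scores is hypothetical, and the zero padding at the borders and the weight values are assumptions (in coconet the weights are learned during training).

```python
def convolve_scores(scores, weights):
    """Apply a k x k weight matrix around every position of an n x n score map.

    Assumption: positions outside the map contribute zero (zero padding).
    """
    n = len(scores)
    k = len(weights)
    if k not in (3, 5, 7):
        raise ValueError("matrix size must be 3, 5, or 7")
    half = k // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for a in range(k):
                for b in range(k):
                    ii, jj = i + a - half, j + b - half
                    # Skip neighbours that fall outside the score map.
                    if 0 <= ii < n and 0 <= jj < n:
                        total += weights[a][b] * scores[ii][jj]
            out[i][j] = total
    return out
```

With a larger matrix, each weighted score draws on a wider neighbourhood of site pairs, at the cost of more weights to fit.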

In addition, coconet can use two matrices: one for Watson-Crick nucleotide pairs and the other for non-Watson-Crick ones. This can be achieved using the optional argument --wc_and_nwc. For example:

python -m coconet.main <msa_file>  --msize 7 --wc_and_nwc --verbose

The above command executes coconet using two 7x7 matrices.
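The idea of keeping separate matrices for the two pair classes can be sketched as follows. This is a hypothetical illustration, not coconet's code: the names is_watson_crick and pick_matrix are invented here, and the sketch assumes that only the canonical A-U and G-C pairs count as Watson-Crick. How coconet classifies other pairs, such as the G-U wobble pair, is not stated in this README.

```python
# Assumption: only canonical A-U and G-C pairs are treated as Watson-Crick.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_watson_crick(nt_i, nt_j):
    """Return True if the two nucleotides form a canonical Watson-Crick pair."""
    return (nt_i.upper(), nt_j.upper()) in WATSON_CRICK


def pick_matrix(nt_i, nt_j, wc_matrix, nwc_matrix):
    """Choose the weight matrix that matches the pair class of (nt_i, nt_j)."""
    return wc_matrix if is_watson_crick(nt_i, nt_j) else nwc_matrix
```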

In addition, convolution can be performed on top of plmDCA. To enable this feature, use the --on_plm optional argument. Example:

python -m coconet.main <msa_file>  --on_plm --num_threads 2 --max_iterations 5000 --verbose

The optional arguments --num_threads and --max_iterations control the number of threads used (if OpenMP is supported) and the number of gradient descent iterations, respectively.
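When coconet is run over many alignments from a script, the command line can be assembled programmatically. The helper below is a hypothetical sketch (build_coconet_command is not part of coconet). It uses only the flags described above and rejects matrix sizes other than 3, 5, and 7. Note that this README shows --num_threads and --max_iterations only together with --on_plm; the sketch does not enforce that.

```python
def build_coconet_command(msa_file, msize=None, wc_and_nwc=False, on_plm=False,
                          num_threads=None, max_iterations=None, verbose=False):
    """Build the argument list for 'python -m coconet.main' (hypothetical helper)."""
    if msize is not None and msize not in (3, 5, 7):
        raise ValueError("msize must be 3, 5, or 7")
    cmd = ["python", "-m", "coconet.main", msa_file]
    if msize is not None:
        cmd += ["--msize", str(msize)]
    if wc_and_nwc:
        cmd.append("--wc_and_nwc")
    if on_plm:
        cmd.append("--on_plm")
    if num_threads is not None:
        cmd += ["--num_threads", str(num_threads)]
    if max_iterations is not None:
        cmd += ["--max_iterations", str(max_iterations)]
    if verbose:
        cmd.append("--verbose")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory containing setup.py.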

Finally, help messages are printed on the screen when the command

python -m coconet.main

is executed, i.e., by running the coconet.main module without any additional input from the command line.

Training coconet

The network can also be trained on the dataset using a five-fold cross-validation procedure. For example, the command

python -m coconet.train run  --msize 5 --verbose 

trains the network using a 5x5 weight matrix, with mean-field DCA as the coevolutionary layer. If plmDCA is desired, the --on_plm optional argument can be provided, for instance as

python -m coconet.train run --msize 7 --on_plm --num_threads 4 --verbose
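For readers unfamiliar with five-fold cross-validation, the general procedure can be sketched as below. This is an illustration only: the function five_fold_splits is hypothetical, and the round-robin assignment of items to folds is an assumption, since this README does not say how coconet partitions the dataset.

```python
def five_fold_splits(items, n_folds=5):
    """Yield (train, test) lists; each fold serves as the test set exactly once.

    Assumption: items are assigned to folds round-robin.
    """
    folds = [items[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        # Training data is everything outside the held-out fold.
        train = [x for i, fold in enumerate(folds) if i != k for x in fold]
        yield train, test
```

Each item appears in exactly one test set, so every RNA in the dataset is scored by a model that did not see it during training.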

To see the available arguments to train the network, run the command

python -m coconet.train

Precomputed co-evolutionary data

Precomputed co-evolutionary data for the RNA dataset and test set, generated using CoCoNet and DCA-based algorithms, are available in the directory RAW_COEV_DATA_ALL. The average positive predictive value (PPV) from these data, e.g., for the RNA dataset CoCoNet cross-validation and DCA-based methods, can be computed using

python -m coconet.ppv compute --verbose 

This command computes the average PPV at rank L (the length of the RNA sequence). More information about computing PPV from raw co-evolutionary data can be obtained by running the help command as

python -m coconet.ppv  --help
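As a reference for what PPV at rank L measures, the calculation can be sketched as follows. This is not the coconet.ppv implementation: the function names and the input format (site pairs sorted by decreasing score, plus a collection of true contact pairs) are assumptions made for illustration.

```python
def ppv_at_rank(ranked_pairs, true_contacts, rank):
    """Fraction of the top-ranked predicted pairs that are true contacts.

    ranked_pairs: (i, j) site pairs sorted from highest to lowest score.
    true_contacts: iterable of (i, j) pairs known to be in contact.
    If fewer than 'rank' pairs are available, all of them are used.
    """
    top = ranked_pairs[:rank]
    if not top:
        return 0.0
    # Treat (i, j) and (j, i) as the same pair.
    true_set = {tuple(sorted(pair)) for pair in true_contacts}
    hits = sum(1 for pair in top if tuple(sorted(pair)) in true_set)
    return hits / len(top)


def average_ppv(per_rna_inputs):
    """Average PPV at rank L over (ranked_pairs, true_contacts, L) tuples."""
    values = [ppv_at_rank(r, t, length) for r, t, length in per_rna_inputs]
    return sum(values) / len(values)
```

For an RNA of length L, rank would be set to L, so the top L scoring pairs are compared against the known contacts.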