Maryam-Haghani / NEFFy

NEFF Calculator and MSA File Converter
https://maryam-haghani.github.io/NEFFy/
GNU General Public License v3.0

NEFFy: NEFF Calculator and MSA File Converter

NEFFy is a versatile and efficient tool for bioinformatics research. It calculates NEFF (Normalized Effective Number of Sequences) for Multiple Sequence Alignments (MSAs) of any biological sequences, including protein, RNA, and DNA, across various MSA formats.
Additionally, NEFFy includes built-in support for format conversion, allowing users to seamlessly convert between different MSA formats.

C++ Executable

Installation

To install the NEFFy tool, clone the repository and compile the code using a C++ compiler that supports C++17 or a newer version. You can use the provided Makefile in the repository for this purpose. Navigate to the repository directory and enter the following command in the terminal:

make

If the make command is not available on your operating system, install it before proceeding.

Once the compilation is complete, you can run the program via the command line.
This package is cross-platform and works on Linux, Windows, and macOS without requiring additional compilation.

For more information on installing the executable, please refer to the documentation.

Project Outline

The NEFFy repository is structured as follows:

[Figure: outline of the repository structure]

Usage

1. NEFF Computation

NEFF determines the effective number of homologous sequences within a Multiple Sequence Alignment (MSA). It accounts for sequence similarities and provides a measure of sequence diversity.
To calculate NEFF, run the neff executable with one or more MSA files and the appropriate flags for NEFF computation. If multiple files are provided, NEFFy combines them and computes NEFF for the integrated alignment.
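To make the idea concrete, here is a minimal Python sketch of the commonly used NEFF definition: each sequence is weighted by the inverse of the number of sequences similar to it, and the weights are summed and normalized. It is an illustration only, not NEFFy's implementation. The function names, the use of `>=` for the similarity cutoff, and the treatment of gaps as ordinary characters are assumptions made for this example.

```python
import math


def sequence_identity(a: str, b: str) -> float:
    """Fraction of aligned positions where two equal-length sequences agree.

    Assumption: gap characters are compared like any other character.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def neff_sketch(msa: list[str], threshold: float = 0.8, norm: int = 0) -> float:
    """Simplified NEFF: sum of per-sequence weights, then normalization.

    norm follows the option numbering described in this README:
    0 = divide by sqrt(length), 1 = divide by length, 2 = no normalization.
    """
    if not msa:
        raise ValueError("MSA must contain at least one sequence")
    length = len(msa[0])

    weights = []
    for s in msa:
        # Count sequences at least `threshold` identical to s (s itself included).
        n_similar = sum(1 for t in msa if sequence_identity(s, t) >= threshold)
        weights.append(1.0 / n_similar)

    total = sum(weights)
    if norm == 0:
        return total / math.sqrt(length)
    if norm == 1:
        return total / length
    if norm == 2:
        return total
    raise ValueError("norm must be 0, 1, or 2")


if __name__ == "__main__":
    toy_msa = ["ACDE", "ACDE", "WWWW"]
    print(neff_sketch(toy_msa, threshold=0.8, norm=2))
```

In the toy alignment, the two identical sequences share one unit of weight and the third contributes a full unit, so redundant sequences add little to the total.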

Flags:

The code accepts the following command-line flags:

| Flag | Description | Required | Default Value | Example |
|---|---|---|---|---|
| --file=<list of filenames> | Input files (comma-separated, no spaces) containing multiple sequence alignments | Yes | N/A | --file=example.fasta |
| --alphabet=<value> | Alphabet of MSA. 0: Protein; 1: RNA; 2: DNA | No | 0 | --alphabet=1 |
| --check_validation=[true/false] | Whether to validate the input MSA file against the alphabet | No | false | --check_validation=true |
| --threshold=<value> | Threshold for considering two sequences similar (between 0 and 1) | No | 0.8 | --threshold=0.7 |
| --norm=<value> | Normalization option for NEFF. 0: Normalize by the square root of sequence length; 1: Normalize by the sequence length; 2: No normalization | No | 0 | --norm=2 |
| --omit_query_gaps=[true/false] | Omit gap positions of the query sequence from all sequences for NEFF computation | No | true | --omit_query_gaps=true |
| --is_symmetric=[true/false] | Whether to count gaps in the number of differences when computing the sequence similarity cutoff (asymmetric) or not (symmetric) | No | true | --is_symmetric=false |
| --non_standard_option=<value> | Options for handling non-standard letters of the specified alphabet. 0: Treat them the same as standard letters; 1: Consider them as gaps when computing the similarity cutoff of sequences (only used in the asymmetric version); 2: Consider them as gaps in computing the similarity cutoff and checking the position of match/mismatch | No | 0 | --non_standard_option=1 |
| --depth=<value> | Depth of MSA to be considered in computation (starting from the first sequence). If the given value is greater than the original depth, the original depth is used | No | inf (consider all sequences) | --depth=10 |
| --gap_cutoff=<value> | Threshold for considering a position gappy and removing it (between 0 and 1) | No | 1 (no gappy position) | --gap_cutoff=0.7 |
| --pos_start=<value> | Start position of each sequence to be considered in NEFF (inclusive) | No | 1 (the first position) | --pos_start=10 |
| --pos_end=<value> | Last position of each sequence to be considered in NEFF (inclusive). If the given value is greater than the length of the MSA sequences, the sequence length is used | No | inf (consider all sequence) | --pos_end=50 |
| --only_weights=[true/false] | Return only sequence weights, rather than the final NEFF | No | false | --only_weights=true |
| --multimer_MSA=[true/false] | Compute NEFF for MSA of a multimer | No | false | --multimer_MSA=true |
| --stoichiom=<value> | Stoichiometry of the multimer when multimer_MSA=true | | | --stoichiom=A2B1 |
| --chain_length=<list of values> | Length of the chains in a heteromer when multimer_MSA=true and the multimer is a heteromer | | 0 | --chain_length=17 45 |
| --residue_neff=[true/false] | Compute per-residue (column-wise) NEFF | No | false | --residue_neff=true |
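The depth, position, and gap-cutoff options all trim the alignment before NEFF is computed. The Python sketch below shows one plausible reading of them. It is not NEFFy's code: the function name, the order in which the steps are applied, the set of gap characters, and the choice to drop a column only when its gap fraction is strictly greater than the cutoff are all assumptions made for illustration.

```python
def trim_msa(msa, depth=None, pos_start=1, pos_end=None,
             gap_cutoff=1.0, gap_chars="-."):
    """Hypothetical preprocessing sketch mirroring --depth, --pos_start,
    --pos_end and --gap_cutoff as described in the flag table."""
    # Depth: keep only the first `depth` sequences; slicing already caps
    # the value at the original depth when it is too large.
    rows = list(msa) if depth is None else list(msa[:depth])
    if not rows:
        raise ValueError("MSA must contain at least one sequence")

    # Position window: 1-based and inclusive, capped at the sequence length.
    length = len(rows[0])
    end = length if pos_end is None else min(pos_end, length)
    rows = [row[pos_start - 1:end] for row in rows]

    # Gap cutoff: assumption is that a column is "gappy" when its gap
    # fraction exceeds the cutoff, so the default of 1 removes nothing.
    keep = []
    for col in range(len(rows[0])):
        gap_fraction = sum(row[col] in gap_chars for row in rows) / len(rows)
        if gap_fraction <= gap_cutoff:
            keep.append(col)

    return ["".join(row[c] for c in keep) for row in rows]


if __name__ == "__main__":
    toy_msa = ["ACD-", "A-D-", "AC--"]
    print(trim_msa(toy_msa, gap_cutoff=0.5))
    print(trim_msa(toy_msa, depth=2, pos_start=2, pos_end=10))
```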

For more details about features, please refer to the documentation.

Example:

neff --file=./MSAs/example.a2m --threshold=0.6 --norm=2 --is_symmetric=false --check_validation=true

This prints the final MSA length, depth, and NEFF to the console, based on the given options.
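When the executable is called from a script, the command line can be assembled programmatically. The sketch below is a hypothetical helper, not part of the NEFFy Python library: it assumes the compiled `neff` binary is on the PATH and only uses flags listed in the table above. The names `build_neff_command` and `run_neff` are made up for this example.

```python
import subprocess


def build_neff_command(files, executable="neff", **options):
    """Build an argument list such as ['neff', '--file=a.a2m', '--norm=2'].

    `files` is a list of MSA paths; they are joined with commas and no
    spaces, as the --file flag requires. Keyword options map directly to
    --name=value flags; booleans are written as true/false.
    """
    if not files:
        raise ValueError("at least one MSA file is required")
    cmd = [executable, "--file=" + ",".join(files)]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd.append(f"--{name}={value}")
    return cmd


def run_neff(files, **options):
    """Run the executable and return whatever it prints to the console."""
    result = subprocess.run(
        build_neff_command(files, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Reproduces the example command above without executing it.
    print(" ".join(build_neff_command(
        ["./MSAs/example.a2m"],
        threshold=0.6, norm=2, is_symmetric=False, check_validation=True,
    )))
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` turns a non-zero exit status into a Python exception.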

For more examples on using NEFFy for NEFF calculations with various options and features, please refer to the documentation usage guide.

2. MSA File Conversion

The MSA file conversion allows you to convert MSA files between different supported formats.
Run the converter program with the input and output files; the tool determines the formats and performs the conversion accordingly.

Flags:

The code accepts the following command-line flags:

| Flag | Description | Required | Default Value | Example |
|---|---|---|---|---|
| --in_file=<filename> | Specifies the input MSA file to be converted. Replace <filename> with the path and name of the input file | Yes | N/A | --in_file=input.fasta |
| --out_file=<filename> | Specifies the output file where the converted MSA will be saved. Replace <filename> with the desired path and name of the output file | Yes | N/A | --out_file=output.a2m |
| --alphabet=<value> | Alphabet of MSA. 0: Protein; 1: RNA; 2: DNA | No | 0 | --alphabet=1 |
| --check_validation=[true/false] | Whether to validate the input MSA file against the alphabet | No | true | --check_validation=true |

Please note that the conversion is performed based on the specified input and output file extensions.
For more details about features, please refer to the documentation.

Example:

Suppose you have an MSA file named "example.a2m" in the MSAs directory and you want to convert it to the format indicated by the ".sto" extension, saving it as "example.sto".

converter --in_file=./MSAs/example.a2m --out_file=./MSAs/example.sto
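Because the conversion is driven by file extensions, a script that converts many files mostly needs to derive the output path from the input path. The sketch below shows one way to do that with `pathlib`. It is a hypothetical helper, not part of NEFFy: the function name is invented here, and it assumes the `converter` binary is on the PATH and that the target extension is one NEFFy supports.

```python
from pathlib import Path


def conversion_command(in_file, out_ext, executable="converter"):
    """Return the converter argument list for turning `in_file` into a
    sibling file with extension `out_ext` (for example 'sto' or '.sto')."""
    src = Path(in_file)
    if not src.suffix:
        # The format is inferred from the extension, so it must be present.
        raise ValueError("input file needs an extension")
    if not out_ext.startswith("."):
        out_ext = "." + out_ext
    dst = src.with_suffix(out_ext)
    return [
        executable,
        f"--in_file={src.as_posix()}",
        f"--out_file={dst.as_posix()}",
    ]


if __name__ == "__main__":
    print(" ".join(conversion_command("./MSAs/example.a2m", "sto")))
```

The returned list can be passed to `subprocess.run`. Note that `pathlib` normalizes a leading "./", so the paths in the command are printed without it.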

For more examples on using NEFFy for MSA conversion, please refer to the documentation usage guide.


Python Library

NEFFy also provides a Python library as an interface to the executable files.

Library Installation

From Source

To install the library from the source:

  1. Clone the repository:
    git clone https://github.com/Maryam-Haghani/Neffy.git
  2. Navigate to the project directory:
    cd Neffy
  3. Ensure you have setuptools and wheel installed:
    pip install setuptools wheel
  4. Build the source distribution and wheel:
    python setup.py sdist bdist_wheel
  5. Install the package from the root directory of the project:

    pip install .

    Alternatively, you can install the package directly from the built wheel file (in the dist directory):

    pip install dist/neffy-0.1-py3-none-any.whl

From PyPI (will be distributed at a later date):

To install the package from PyPI:

pip install neffy

Library Usage

An example of NEFF computation:

cd example
python compute_neff.py

You can find more examples of using the Python library's methods for NEFF calculations in the example directory. For method parameters and detailed explanations, please refer to the documentation usage guide.

An example of MSA conversion:

cd example
python convert_msa.py

Additional examples of using NEFFy for MSA conversion can be found in the example directory. For further detailed explanations, please refer to the documentation usage guide.


Supported File Formats

In the documentation, you will find a brief explanation of each format, along with an illustrative alignment example for each one.


Error Handling

If any errors occur during the execution of the MSA Processor, an error message will be displayed, describing the issue encountered.
Please refer to the error message for troubleshooting or make necessary corrections to the input.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.


For further assistance, please see the documentation.