NEFFy: NEFF Calculator and MSA File Converter

NEFFy is a versatile and efficient tool for bioinformatics research. It calculates NEFF (Normalized Effective Number of Sequences) for Multiple Sequence Alignments (MSAs) of any biological sequences, including protein, RNA, and DNA, across various MSA formats.
Additionally, NEFFy includes built-in support for format conversion, allowing users to seamlessly convert between different MSA formats.

C++ Executable
Python Library
- Library Installation
- Library Usage
Supported File Formats
Error Handling
License

C++ Executable

Installation

To install the NEFFy tool, clone the repository and compile the code using a C++ compiler that supports C++17 or a newer version. You can use the provided Makefile in the repository for this purpose. Navigate to the repository directory and enter the following command in the terminal:

make

If the `make` command is not available on your operating system, install it before proceeding.

Once the compilation is complete, you can run the program via the command line.
This package is cross-platform and works on Linux, Windows, and macOS without requiring additional compilation.

For more information on installing the executable, please refer to the documentation.

Project Outline

The NEFFy repository is structured as follows:

[Figure: project outline]

Usage

1. NEFF Computation

NEFF determines the effective number of homologous sequences within a Multiple Sequence Alignment (MSA). It accounts for sequence similarities and provides a measure of sequence diversity.
To calculate NEFF, use the neff script by providing one or more MSA files and specifying the appropriate flags for NEFF computation. If multiple files are provided, NEFFy will combine them and compute NEFF for the integrated version.
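
To make the idea concrete, here is a minimal Python sketch of the commonly used NEFF formulation: each sequence is weighted by the inverse of the number of sequences (itself included) whose identity to it meets the threshold, and NEFF is the sum of those weights, optionally normalized. This is an illustration under stated assumptions, not NEFFy's actual implementation. The function names `pairwise_identity` and `neff` are hypothetical, and the identity measure (fraction of matching columns over the full alignment length) is a simplification that ignores NEFFy's gap handling, symmetric and asymmetric modes, and non-standard letter options.

```python
import math


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of aligned columns where both sequences carry the same character.

    Assumption: a plain column-wise comparison; gaps are treated like any
    other character, which is simpler than what NEFFy offers.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def neff(msa, threshold=0.8, norm=0):
    """Sum of sequence weights, normalized as requested.

    `norm` mirrors the documented --norm values:
    0 = divide by sqrt(length), 1 = divide by length, 2 = no normalization.
    """
    length = len(msa[0])
    weights = []
    for seq in msa:
        # Count sequences (including seq itself) at or above the threshold.
        neighbours = sum(
            1 for other in msa if pairwise_identity(seq, other) >= threshold
        )
        weights.append(1.0 / neighbours)
    total = sum(weights)
    if norm == 0:
        return total / math.sqrt(length)
    if norm == 1:
        return total / length
    return total


# Two identical sequences share their weight (0.5 each); the third is unique.
toy_msa = ["ACDE", "ACDE", "WYWY"]
print(neff(toy_msa, threshold=0.8, norm=2))
```

In the toy alignment the two identical sequences count as one effective sequence, so the unnormalized value is lower than the raw depth of three.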

Flags:

The code accepts the following command-line flags:

| Flag | Description | Required | Default Value | Example |
|---|---|---|---|---|
| `--file=<list of filenames>` | Input files (comma-separated, no spaces) containing multiple sequence alignments | Yes | N/A | `--file=example.fasta` |
| `--alphabet=<value>` | Alphabet of the MSA. 0: Protein; 1: RNA; 2: DNA | No | 0 | `--alphabet=1` |
| `--check_validation=[true/false]` | Whether to validate the input MSA file against the alphabet | No | false | `--check_validation=true` |
| `--threshold=<value>` | Threshold for considering two sequences similar (between 0 and 1) | No | 0.8 | `--threshold=0.7` |
| `--norm=<value>` | Normalization option for NEFF. 0: Normalize by the square root of the sequence length; 1: Normalize by the sequence length; 2: No normalization | No | 0 | `--norm=2` |
| `--omit_query_gaps=[true/false]` | Omit the gap positions of the query sequence from all sequences for NEFF computation | No | true | `--omit_query_gaps=true` |
| `--is_symmetric=[true/false]` | Consider gaps in the number of differences when computing the sequence similarity cutoff (asymmetric) or not (symmetric) | No | true | `--is_symmetric=false` |
| `--non_standard_option=<value>` | Options for handling non-standard letters of the specified alphabet. 0: Treat them the same as standard letters; 1: Consider them as gaps when computing the similarity cutoff of sequences (only used in the asymmetric version); 2: Consider them as gaps both when computing the similarity cutoff and when checking positions for match/mismatch | No | 0 | `--non_standard_option=1` |
| `--depth=<value>` | Depth of the MSA to be considered in the computation (starting from the first sequence) | No | inf (consider all sequences) | `--depth=10` (if the given value is greater than the original depth, the original depth is used) |
| `--gap_cutoff=<value>` | Threshold for considering a position gappy and removing it (between 0 and 1) | No | 1 (no gappy position) | `--gap_cutoff=0.7` |
| `--pos_start=<value>` | Start position of each sequence to be considered in NEFF (inclusive) | No | 1 (the first position) | `--pos_start=10` |
| `--pos_end=<value>` | Last position of each sequence to be considered in NEFF (inclusive) | No | inf (consider the whole sequence) | `--pos_end=50` (if the given value is greater than the length of the MSA sequences, the sequence length is used) |
| `--only_weights=[true/false]` | Return only the sequence weights, rather than the final NEFF | No | false | `--only_weights=true` |
| `--multimer_MSA=[true/false]` | Compute NEFF for the MSA of a multimer | No | false | `--multimer_MSA=true` |
| `--stoichiom=<value>` | Stoichiometry of the multimer | When `--multimer_MSA=true` | | `--stoichiom=A2B1` |
| `--chain_length=<list of values>` | Lengths of the chains in a heteromer | When `--multimer_MSA=true` and the multimer is a heteromer | 0 | `--chain_length=17 45` |
| `--residue_neff=[true/false]` | Compute per-residue (column-wise) NEFF | No | false | `--residue_neff=true` |

For more details about features, please refer to the documentation.
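
The semantics of `--depth`, `--pos_start`, and `--pos_end` (first N sequences, 1-based inclusive positions, out-of-range values falling back to the actual size) can be illustrated with a short Python sketch. The helper name `subset_msa` is hypothetical and the code only mirrors the behavior described in the table; it is not taken from NEFFy's source.

```python
def subset_msa(msa, depth=None, pos_start=1, pos_end=None):
    """Keep the first `depth` sequences and columns pos_start..pos_end.

    Positions are 1-based and inclusive, as in the flag descriptions.
    `None` stands in for the documented "inf" defaults.
    """
    if pos_start < 1:
        raise ValueError("pos_start is 1-based and must be >= 1")
    # Slicing clamps automatically when depth exceeds the number of sequences.
    rows = msa if depth is None else msa[:depth]
    start = pos_start - 1  # convert the 1-based start to a 0-based index
    # An inclusive 1-based end equals an exclusive 0-based slice end;
    # a value past the sequence length is clamped by Python.
    return [seq[start:pos_end] for seq in rows]


msa = ["ABCDEFG", "HIJKLMN", "OPQRSTU"]
print(subset_msa(msa, depth=2, pos_start=2, pos_end=4))
print(subset_msa(msa, depth=10, pos_end=100))
```

The second call shows the fallback case: asking for more sequences or positions than exist simply returns the whole alignment.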

Example:

neff --file=./MSAs/example.a2m --threshold=0.6 --norm=2 --is_symmetric=false --check_validation=true

As output, it prints the final MSA length, depth, and NEFF to the console, based on the given options.
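
When the executable is driven from a script, the flags can be assembled programmatically. The following Python sketch builds the argument list for the example command and shows one way to run it with the standard `subprocess` module. The helper names `build_neff_command` and `run_neff` are hypothetical, and the sketch assumes the compiled `neff` executable is reachable on the `PATH`; it does not use NEFFy's own Python library.

```python
import subprocess


def build_neff_command(files, executable="neff", **options):
    """Return the argument list for a NEFF run.

    `files` is a list of MSA paths; `options` maps documented flag names
    (without the leading dashes) to their values.
    """
    # Input files are passed comma-separated, without spaces.
    args = [executable, "--file=" + ",".join(files)]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # flags use lowercase literals
        args.append(f"--{name}={value}")
    return args


def run_neff(files, **options):
    """Run the executable and return its console output (assumes it is on PATH)."""
    result = subprocess.run(
        build_neff_command(files, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


cmd = build_neff_command(
    ["./MSAs/example.a2m"],
    threshold=0.6, norm=2, is_symmetric=False, check_validation=True,
)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.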

For more examples on using NEFFy for NEFF calculations with various options and features, please refer to the documentation usage guide.

2. MSA File Conversion

The MSA file conversion feature converts MSA files between the supported formats.
Run the converter program with the input and output file names, including their format extensions, and the tool performs the conversion accordingly.

Flags:

The code accepts the following command-line flags:

| Flag | Description | Required | Default Value | Example |
|---|---|---|---|---|
| `--in_file=<filename>` | Specifies the input MSA file to be converted. Replace `<filename>` with the path and name of the input file | Yes | N/A | `--in_file=input.fasta` |
| `--out_file=<filename>` | Specifies the output file where the converted MSA will be saved. Replace `<filename>` with the desired path and name of the output file | Yes | N/A | `--out_file=output.a2m` |
| `--alphabet=<value>` | Alphabet of the MSA. 0: Protein; 1: RNA; 2: DNA | No | 0 | `--alphabet=1` |
| `--check_validation=[true/false]` | Whether to validate the input MSA file against the alphabet | No | true | `--check_validation=true` |

Please note that the conversion is performed based on the specified input and output file extensions.
For more details about features, please refer to the documentation.

Example:

Suppose you have an MSA file named "example.a2m" in the MSAs directory and you want to convert it to the Stockholm format and save it as "example.sto".

converter --in_file=./MSAs/example.a2m --out_file=./MSAs/example.sto

For more examples on using NEFFy for MSA conversion, please refer to the documentation usage guide.

Python Library

NEFFy also provides a Python library that serves as an interface to the executable files.

Library Installation

From Source

To install the library from the source:

Clone the repository:

git clone https://github.com/Maryam-Haghani/Neffy.git

Navigate to the project directory:
```
cd Neffy
```
Ensure you have setuptools and wheel installed:
```
pip install setuptools wheel
```
Build the source distribution and wheel:
```
python setup.py sdist bdist_wheel
```
Install the package from the root directory of the project:
```
pip install .
```
Alternatively, you can install the package directly from the built wheel file (in the dist directory):
```
pip install dist/neffy-0.1-py3-none-any.whl
```

From PyPI (will be distributed at a later date):

To install the package from PyPI:

pip install neffy

Library Usage

An example of NEFF computation:

cd example
python compute_neff.py

You can find more examples of using the Python library's various methods for NEFF calculations in the example directory. For method parameters and detailed explanations, please refer to the documentation usage guide.

An example of MSA conversion:

cd example
python convert_msa.py

Additional examples of using NEFFy for MSA conversion can be found in the example directory. For further detailed explanations, please refer to the documentation usage guide.

Supported File Formats

A2M (aligned FASTA-like format)
A3M (compressed aligned FASTA-like format with lowercase letters for insertions)
FASTA, AFA, FAS, FST, FSA (FASTA format)
STO (Stockholm format)
CLUSTAL (CLUSTAL format)
ALN (ALN format)
PFAM (format mostly used for nucleotides)

In the documentation, you will find a brief explanation of each format, along with an illustrative alignment example for each one.
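
Because conversion is driven by file extensions, it can help to check an extension before calling the converter. The Python sketch below maps extensions to the formats listed above. It assumes that each listed name is also the file extension (for example `.sto` for Stockholm); the names `EXTENSION_TO_FORMAT` and `detect_format` are hypothetical and not part of NEFFy.

```python
from pathlib import Path

# Extension -> format label, based on the list of supported formats above.
# Assumption: each listed name doubles as the file extension.
EXTENSION_TO_FORMAT = {
    "a2m": "A2M",
    "a3m": "A3M",
    "fasta": "FASTA", "afa": "FASTA", "fas": "FASTA",
    "fst": "FASTA", "fsa": "FASTA",
    "sto": "STO",
    "clustal": "CLUSTAL",
    "aln": "ALN",
    "pfam": "PFAM",
}


def detect_format(path):
    """Return the format label for a file name, or raise ValueError."""
    ext = Path(path).suffix.lower().lstrip(".")
    try:
        return EXTENSION_TO_FORMAT[ext]
    except KeyError:
        raise ValueError(f"unsupported MSA extension: {ext!r}") from None


print(detect_format("./MSAs/example.a2m"))
print(detect_format("./MSAs/example.sto"))
```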

Error Handling

If any errors occur during the execution of the MSA Processor, an error message will be displayed, describing the issue encountered.
Please refer to the error message for troubleshooting or make necessary corrections to the input.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

For further assistance, please see the documentation.
