NEFFy is a versatile and efficient tool for bioinformatics research. It calculates NEFF (the Normalized Effective Number of Sequences) for Multiple Sequence Alignments (MSAs) of any biological sequences, including protein, RNA, and DNA, across various MSA formats.
Additionally, NEFFy includes built-in support for format conversion, allowing users to seamlessly convert between different MSA formats.
To install the NEFFy tool, clone the repository and compile the code using a C++ compiler that supports C++17 or a newer version. You can use the provided Makefile in the repository for this purpose. Navigate to the repository directory and enter the following command in the terminal:
make
If the `make` command is not available on your operating system, here is how you can install it.
Once the compilation is complete, you can run the program via the command line.
This package is cross-platform and works on Linux, Windows, and macOS without requiring additional compilation.
For more information on installing the executable, please refer to the documentation.
The NEFFy repository is structured as follows:
NEFF determines the effective number of homologous sequences within a Multiple Sequence Alignment (MSA). It accounts for sequence similarities and provides a measure of sequence diversity.
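As a rough illustration of the idea, a common formulation weights each sequence by the reciprocal of the number of sequences in the MSA that are at least `threshold` identical to it, sums those weights, and then normalizes the sum. The Python sketch below follows that formulation under simplifying assumptions: identity is computed over all columns, gaps are treated as ordinary characters, and the sum is normalized by the square root of the alignment length. The function names are illustrative, and this is not NEFFy's actual implementation.

```python
import math

def sequence_identity(a, b):
    """Fraction of aligned columns at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def sequence_weights(msa, threshold=0.8):
    """Weight each sequence by 1 / (number of sequences similar to it, itself included)."""
    weights = []
    for a in msa:
        similar = sum(1 for b in msa if sequence_identity(a, b) >= threshold)
        weights.append(1.0 / similar)
    return weights

def neff(msa, threshold=0.8, normalize=True):
    """Sum of sequence weights, optionally divided by sqrt(alignment length)."""
    total = sum(sequence_weights(msa, threshold))
    if normalize:
        return total / math.sqrt(len(msa[0]))
    return total

# Three near-identical sequences share one unit of weight; the outlier keeps its own.
msa = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
print(sequence_weights(msa))
print(neff(msa, normalize=False))
```

In this toy alignment the first three sequences are at least 90% identical to one another, so each receives a weight of one third, while the fourth sequence is unlike the rest and keeps a weight of one.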
To calculate NEFF, use the neff script by providing one or more MSA files and specifying the appropriate flags for NEFF computation. If multiple files are provided, NEFFy will combine them and compute NEFF for the integrated version.
The code accepts the following command-line flags:

| Flag | Description | Required | Default Value | Example |
|---|---|---|---|---|
| `--file=<list of filenames>` | Input files (comma-separated, no spaces) containing multiple sequence alignments | Yes | N/A | `--file=example.fasta` |
| `--alphabet=<value>` | Alphabet of MSA<br>0: Protein<br>1: RNA<br>2: DNA | No | 0 | `--alphabet=1` |
| `--check_validation=[true/false]` | Whether to validate the input MSA file against the alphabet | No | false | `--check_validation=true` |
| `--threshold=<value>` | Threshold for considering two sequences similar (between 0 and 1) | No | 0.8 | `--threshold=0.7` |
| `--norm=<value>` | Normalization option for NEFF<br>0: Normalize by the square root of sequence length<br>1: Normalize by the sequence length<br>2: No normalization | No | 0 | `--norm=2` |
| `--omit_query_gaps=[true/false]` | Omit gap positions of the query sequence from all sequences for NEFF computation | No | true | `--omit_query_gaps=true` |
| `--is_symmetric=[true/false]` | Consider gaps in the number of differences when computing the sequence similarity cutoff (asymmetric) or not (symmetric) | No | true | `--is_symmetric=false` |
| `--non_standard_option=<value>` | Options for handling non-standard letters of the specified alphabet<br>0: Treat them the same as standard letters<br>1: Consider them as gaps when computing the similarity cutoff of sequences (only used in the asymmetric version)<br>2: Consider them as gaps in computing the similarity cutoff and checking the position of match/mismatch | No | 0 | `--non_standard_option=1` |
| `--depth=<value>` | Depth of MSA to be considered in computation (starting from the first sequence) | No | inf (consider all sequences) | `--depth=10` (if the given value is greater than the original depth, the original depth is used) |
| `--gap_cutoff=<value>` | Threshold for considering a position gappy and removing it (between 0 and 1) | No | 1 (no gappy position) | `--gap_cutoff=0.7` |
| `--pos_start=<value>` | Start position of each sequence to be considered in NEFF (inclusive) | No | 1 (the first position) | `--pos_start=10` |
| `--pos_end=<value>` | Last position of each sequence to be considered in NEFF (inclusive) | No | inf (consider the whole sequence) | `--pos_end=50` (if the given value is greater than the length of the MSA sequences, the sequence length is used) |
| `--only_weights=[true/false]` | Return only sequence weights, rather than the final NEFF | No | false | `--only_weights=true` |
| `--multimer_MSA=[true/false]` | Compute NEFF for the MSA of a multimer | No | false | `--multimer_MSA=true` |
| `--stoichiom=<value>` | Stoichiometry of the multimer | When `multimer_MSA=true` | | `--stoichiom=A2B1` |
| `--chain_length=<list of values>` | Length of the chains in a heteromer | When `multimer_MSA=true` and the multimer is a heteromer | 0 | `--chain_length=17 45` |
| `--residue_neff=[true/false]` | Compute per-residue (column-wise) NEFF | No | false | `--residue_neff=true` |
For more details about features, please refer to the documentation.
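To make the effect of `--depth`, `--pos_start`, `--pos_end`, and `--gap_cutoff` more concrete, here is a minimal Python sketch of how such trimming could work on an MSA held as a list of strings. It is only an illustration: the function name `trim_msa`, the order of the steps, the set of gap characters, and the choice to drop a column only when its gap fraction is strictly greater than the cutoff are all assumptions, not a description of NEFFy's internals.

```python
def trim_msa(msa, depth=None, pos_start=1, pos_end=None, gap_cutoff=1.0, gap_chars="-."):
    """Restrict an MSA (list of equal-length strings) by depth, position range, and gappiness."""
    if not msa:
        return []
    # depth: keep the first `depth` sequences; a larger value keeps the whole MSA.
    rows = msa if depth is None else msa[:depth]
    length = len(rows[0])
    # pos_start and pos_end are 1-based and inclusive; pos_end is clamped to the MSA length.
    end = length if pos_end is None else min(pos_end, length)
    rows = [seq[pos_start - 1:end] for seq in rows]
    # Assumption: a column is dropped when its gap fraction is strictly above gap_cutoff,
    # so the default of 1.0 removes nothing.
    keep = []
    for col in range(len(rows[0])):
        gaps = sum(1 for seq in rows if seq[col] in gap_chars)
        if gaps / len(rows) <= gap_cutoff:
            keep.append(col)
    return ["".join(seq[c] for c in keep) for seq in rows]

example = ["AC-DE", "A--DE", "AG-DF", "TTTTT"]
print(trim_msa(example, depth=3, pos_start=2, pos_end=10, gap_cutoff=0.5))
```

Here the fourth sequence is excluded by `depth=3`, the first column is excluded by `pos_start=2`, `pos_end=10` is clamped to the alignment length, and the all-gap column is removed by the cutoff.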
neff --file=./MSAs/example.a2m --threshold=0.6 --norm=2 --is_symmetric=false --check_validation=true
The program prints the final MSA length, depth, and NEFF to the console, based on the given options.
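If you prefer to drive the executable from a script, the command shown above can be assembled and run with Python's standard `subprocess` module. The sketch below is a convenience wrapper, not part of NEFFy: the helper names `build_neff_command` and `run_neff` are hypothetical, it assumes the `neff` executable is on your `PATH`, and it assumes the program signals failure through a non-zero exit code.

```python
import subprocess

def build_neff_command(files, executable="neff", **options):
    """Assemble a neff command line from the documented --flag=value options."""
    if not files:
        raise ValueError("at least one MSA file is required")
    # --file takes a comma-separated list with no spaces.
    cmd = [executable, "--file=" + ",".join(files)]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # boolean flags use lowercase true/false
        cmd.append(f"--{name}={value}")
    return cmd

def run_neff(files, executable="neff", **options):
    """Run neff and return its console output (assumes the executable is installed)."""
    cmd = build_neff_command(files, executable, **options)
    # check=True raises CalledProcessError if the program exits with a non-zero status.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

cmd = build_neff_command(
    ["./MSAs/example.a2m"],
    threshold=0.6, norm=2, is_symmetric=False, check_validation=True,
)
print(" ".join(cmd))
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before anything is run.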
For more examples on using NEFFy for NEFF calculations with various options and features, please refer to the documentation usage guide.
The MSA converter lets you convert MSA files between the supported formats.
Run the `converter` program and specify the input and output files with their formats, and the tool will perform the conversion accordingly.
The code accepts the following command-line flags:

| Flag | Description | Required | Default Value | Example |
|---|---|---|---|---|
| `--in_file=<filename>` | Specifies the input MSA file to be converted. Replace `<filename>` with the path and name of the input file | Yes | N/A | `--in_file=input.fasta` |
| `--out_file=<filename>` | Specifies the output file where the converted MSA will be saved. Replace `<filename>` with the desired path and name of the output file | Yes | N/A | `--out_file=output.a2m` |
| `--alphabet=<value>` | Alphabet of MSA<br>0: Protein<br>1: RNA<br>2: DNA | No | 0 | `--alphabet=1` |
| `--check_validation=[true/false]` | Whether to validate the input MSA file against the alphabet | No | true | `--check_validation=true` |
Please note that the conversion is performed based on the specified input and output file extensions.
For more details about features, please refer to the documentation.
Suppose you have an A2M-format MSA file at "./MSAs/example.a2m" and you want to convert it to the Stockholm format and save it as "./MSAs/example.sto".
converter --in_file=./MSAs/example.a2m --out_file=./MSAs/example.sto
For more examples on using NEFFy for MSA conversion, please refer to the documentation usage guide.
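Because the formats are inferred from the file extensions, it can help to check that both paths actually carry an extension before invoking the converter. The Python sketch below builds the converter command with such a check. The helper name `build_converter_command` is hypothetical, the executable is assumed to be on your `PATH`, and the sketch does not verify that a given extension is one the converter supports.

```python
from pathlib import Path

def build_converter_command(in_file, out_file, executable="converter",
                            alphabet=0, check_validation=True):
    """Assemble a converter command line, refusing paths without an extension."""
    in_ext = Path(in_file).suffix.lower()
    out_ext = Path(out_file).suffix.lower()
    if not in_ext or not out_ext:
        raise ValueError("both files need an extension; formats are inferred from it")
    return [
        executable,
        f"--in_file={in_file}",
        f"--out_file={out_file}",
        f"--alphabet={alphabet}",
        "--check_validation=" + ("true" if check_validation else "false"),
    ]

convert_cmd = build_converter_command("./MSAs/example.a2m", "./MSAs/example.sto")
print(" ".join(convert_cmd))
```

The resulting list can be passed directly to `subprocess.run` once the executable is installed.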
NEFFy also provides a Python library that serves as an interface to the executable files.
To install the library from the source:
git clone https://github.com/Maryam-Haghani/Neffy.git
cd Neffy
Build the package, first making sure that `setuptools` and `wheel` are installed:
pip install setuptools wheel
python setup.py sdist bdist_wheel
Install the package from the root directory of the project:
pip install .
Alternatively, you can install the package directly from the built wheel file (in the `dist` directory):
pip install dist/neffy-0.1-py3-none-any.whl
To install the package from PyPI:
pip install neffy
cd example
python compute_neff.py
You can find more examples of using the Python library's methods for NEFF calculations in the example directory. For method parameters and detailed explanations, please refer to the documentation usage guide.
cd example
python convert_msa.py
Additional examples of using NEFFy for MSA conversion can be found in the example directory. For further detailed explanations, please refer to the documentation usage guide.
In the documentation, you will find a brief explanation of each format, along with an illustrative alignment example for each one.
If any errors occur during the execution of the MSA Processor, an error message will be displayed, describing the issue encountered.
Please refer to the error message for troubleshooting or make necessary corrections to the input.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
For further assistance, please see the documentation.