althonos / pymuscle5

Cython bindings and Python interface to MUSCLE v5, a highly efficient and accurate multiple sequence alignment software.
GNU General Public License v3.0

pyMUSCLE5

🗺️ Overview

MUSCLE is widely used software for making multiple alignments of biological sequences. Version 5 of MUSCLE achieves the highest scores on several benchmark tests and scales to thousands of sequences on a commodity desktop computer.

pyMUSCLE5 is a Python module that provides bindings to MUSCLE v5 using Cython. It interacts directly with the MUSCLE internals.

This library is still highly experimental, and consistent results across versions or platforms are not yet guaranteed.

🔧 Installing

At the moment pyMUSCLE5 is not available on PyPI. You can however install it directly from GitHub with:

$ pip install git+https://github.com/althonos/pymuscle5

💡 Example

Let's load some sequences from a FASTA file, use an Aligner to align the proteins together, and print the alignment in two-line FASTA format.

🔬 Biopython

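import os

import Bio.SeqIO
import pymuscle5

# Load every record of the FASTA file into memory
path = os.path.join("pymuscle", "tests", "data", "swissprot-halorhodopsin.faa")
records = list(Bio.SeqIO.parse(path, "fasta"))

# pymuscle5 expects both the name and the sequence as bytes
sequences = [
    pymuscle5.Sequence(record.id.encode(), bytes(record.seq))
    for record in records
]

# Align all the sequences together
aligner = pymuscle5.Aligner()
msa = aligner.align(sequences)

# Print each aligned sequence as a header line followed by one sequence line
for seq in msa.sequences:
    print(f">{seq.name.decode()}")
    print(seq.sequence.decode())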
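
The loop above relies on each aligned record exposing its name and sequence as bytes. As a minimal sketch of the two-line FASTA convention (one header line, then the whole aligned sequence on a single unwrapped line), the helper below formats such records into a string. The Record tuple and the to_two_line_fasta function are hypothetical stand-ins written for illustration, not part of pymuscle5, and the sample rows are made-up placeholders rather than real alignment output.

```python
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for an aligned sequence (not a pymuscle5 class)."""
    name: bytes
    sequence: bytes


def to_two_line_fasta(records: Iterable[Record]) -> str:
    """Format records as two-line FASTA: a header line, then one sequence line."""
    lines = []
    for record in records:
        lines.append(f">{record.name.decode()}")
        lines.append(record.sequence.decode())  # the sequence is never wrapped
    return "\n".join(lines) + "\n" if lines else ""


# Made-up placeholder rows, only to exercise the helper.
rows = [Record(b"seq1", b"MS-A"), Record(b"seq2", b"MSTA")]
fasta_text = to_two_line_fasta(rows)
print(fasta_text, end="")
```

Building the text first and printing it afterwards makes it just as easy to write the alignment to a file as to the terminal.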

🧪 Scikit-bio

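import os

import skbio.io
import pymuscle5

# Load every record of the FASTA file into memory
path = os.path.join("pymuscle", "tests", "data", "swissprot-halorhodopsin.faa")
records = list(skbio.io.read(path, "fasta"))

# Pass the name as bytes and the sequence as a view of unsigned bytes
sequences = [
    pymuscle5.Sequence(record.metadata["id"].encode(), record.values.view('B'))
    for record in records
]

# Align all the sequences together
aligner = pymuscle5.Aligner()
msa = aligner.align(sequences)

# Print each aligned sequence as a header line followed by one sequence line
for seq in msa.sequences:
    print(f">{seq.name.decode()}")
    print(seq.sequence.decode())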

The view('B') call is needed so that Cython can access the sequence as an array of unsigned char.
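
To make the effect of view concrete, here is a small sketch that uses NumPy alone. It assumes that the sequence is stored, as in the scikit-bio example, in a NumPy array of one-byte strings (dtype S1). The sample bytes are arbitrary placeholders.

```python
import numpy as np

# Arbitrary placeholder residues stored as one-byte strings (dtype "S1").
values = np.frombuffer(b"MSA", dtype="S1")

# Reinterpret the same buffer as unsigned 8-bit integers ("B"); no copy is made.
codes = values.view("B")

print(values.dtype, codes.dtype)
print(codes.tolist())
```

Because the view shares memory with the original array, the conversion costs nothing regardless of the sequence length.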

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the GNU General Public License v3.0. The MUSCLE code was written by Robert Edgar and is distributed under the terms of the GPLv3 as well. See vendor/muscle/LICENSE for more information.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original MUSCLE authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.