Open multimeric opened 3 years ago
@multimeric I have also noticed the missing benchmark.py, and our team's current needs support your closing statement. Otherwise we would have to build our own CLI that reuses the utilities you mentioned.
Further to this, it would be nice to have a DNA-only variant of pyMSA's scoring stack, e.g. for sum-of-pairs, where a DNA substitution matrix could be passed as input.
We're currently scoring various MSA approaches to decide on the right one for our pipeline, so these features would close a real gap in the ecosystem.
@multimeric, I now have a local repo that imports read_fasta_file_as_list_of_pairs into the score_alignments.py script, so it accepts someFile.fasta as input rather than a hard-coded Python list. It's still not a full CLI, but let me know if this is something you want.
@multimeric, I have now gotten score_alignments.py to work for purely DNA sequences, so the sum-of-pairs score can be calculated by passing a DNA substitution matrix file as an input argument, e.g. DNA85.txt:
# Match score: 1.766, mismatch score: -2.322 bits
# Expected score: -1.30, entropy: 1.15 bits
       A      T      G      C
A   1.77  -2.32  -2.32  -2.32
T  -2.32   1.77  -2.32  -2.32
G  -2.32  -2.32   1.77  -2.32
C  -2.32  -2.32  -2.32   1.77
These matrices can be created by: https://bioinformaticshome.com/online_software/create_DNA_matrix/createDNAmatrix.html
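For anyone who wants to check the idea outside pyMSA, here is a minimal, self-contained sketch of a sum-of-pairs score driven by a matrix file in the format above. It does not use pyMSA's API. The names parse_matrix and sum_of_pairs, the gap penalty of -8.0, and the choice to score gap-gap pairs as zero are all my own assumptions for illustration, not what score_alignments.py does.

```python
from itertools import combinations

# Contents of DNA85.txt as posted above, inlined so the sketch is self-contained.
DNA85 = """\
# Match score: 1.766, mismatch score: -2.322 bits
# Expected score: -1.30, entropy: 1.15 bits
A T G C
A 1.77 -2.32 -2.32 -2.32
T -2.32 1.77 -2.32 -2.32
G -2.32 -2.32 1.77 -2.32
C -2.32 -2.32 -2.32 1.77
"""


def parse_matrix(text):
    """Parse a whitespace-separated matrix into {(row, col): score}."""
    rows = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")  # skip blanks and comments
    ]
    header = rows[0]  # column symbols, e.g. ['A', 'T', 'G', 'C']
    matrix = {}
    for row in rows[1:]:
        row_symbol, values = row[0], row[1:]
        for col_symbol, value in zip(header, values):
            matrix[(row_symbol, col_symbol)] = float(value)
    return matrix


def sum_of_pairs(sequences, matrix, gap_penalty=-8.0, gap="-"):
    """Sum substitution scores over every pair of rows in every column.

    gap_penalty and the zero score for gap-gap pairs are assumptions.
    """
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    total = 0.0
    for column in zip(*sequences):
        for a, b in combinations(column, 2):
            if a == gap and b == gap:
                continue  # assumed: two gaps contribute nothing
            if a == gap or b == gap:
                total += gap_penalty
            else:
                total += matrix[(a, b)]
    return total


matrix = parse_matrix(DNA85)
alignment = ["ATGC", "ATGC", "A-GT"]
score = sum_of_pairs(alignment, matrix)
print(f"sum-of-pairs: {score:.2f}")
```

Swapping in a different matrix file only changes the text passed to parse_matrix; the scoring loop stays the same.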
Yeah, I think I made a simple script by combining run_all_scores with read_fasta_file_as_list_of_pairs: https://github.com/benhid/pyMSA/blob/570d902bbc214a30f18b93adf735c46836611bf5/examples/runner.py#L5-L42
Hi, thanks for this wonderful library!
I'm wondering whether, since we have utilities like read_fasta_file_as_list_of_pairs and also run_all_scores, which runs a comprehensive evaluation, we could write a CLI that calls these on an input FASTA alignment (initially not supporting other alignment formats, for simplicity), and maybe make the scores configurable via flags. It seems that there was a benchmark.py that did this (it's alluded to in the PDF), but it must have been deleted. This would offer a very useful and easy method of evaluating MSAs, which as far as I can tell is a gap in the ecosystem at the moment.
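To make the proposal concrete, here is a rough sketch of what such a CLI could look like, using only the standard library. Everything in it is hypothetical: read_fasta_pairs is a stand-in I wrote for read_fasta_file_as_list_of_pairs, the two toy scores stand in for whatever run_all_scores evaluates, and the --scores flag is just one possible design. A real version would call into pyMSA instead.

```python
import argparse


def read_fasta_pairs(path):
    """Hypothetical stand-in reader: return [(header, sequence), ...]."""
    pairs, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    pairs.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        pairs.append((header, "".join(chunks)))
    return pairs


def percent_gaps(sequences):
    """Toy score: percentage of alignment cells that are gaps."""
    total = sum(len(seq) for seq in sequences)
    gaps = sum(seq.count("-") for seq in sequences)
    return 100.0 * gaps / total if total else 0.0


def percent_conserved_columns(sequences):
    """Toy score: percentage of columns holding one non-gap symbol."""
    columns = list(zip(*sequences))
    conserved = sum(1 for c in columns if len(set(c)) == 1 and c[0] != "-")
    return 100.0 * conserved / len(columns) if columns else 0.0


# Registry mapping flag values to scoring functions (names are assumptions).
SCORES = {"gaps": percent_gaps, "conserved": percent_conserved_columns}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Score a FASTA alignment.")
    parser.add_argument("fasta", help="path to an aligned FASTA file")
    parser.add_argument(
        "--scores",
        nargs="+",
        choices=sorted(SCORES),
        default=sorted(SCORES),
        help="which scores to compute (default: all)",
    )
    args = parser.parse_args(argv)
    sequences = [seq for _, seq in read_fasta_pairs(args.fasta)]
    results = {name: SCORES[name](sequences) for name in args.scores}
    for name, value in results.items():
        print(f"{name}\t{value:.2f}")
    return results
```

Saved as a script with main() called under the usual __main__ guard, this would be invoked with the FASTA path first and an optional --scores list after it. Keeping the scores in a registry means adding a new one is a single dictionary entry, and the flag's allowed values update automatically.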