bredelings / BAli-Phy

Bayesian co-estimation of phylogenies and multiple alignments via MCMC
http://www.bali-phy.org/
GNU General Public License v2.0
46 stars 17 forks source link

use eigen instead of ublas/tnt/jama #1

Closed argriffing closed 10 years ago

argriffing commented 10 years ago

I considered doing this, so I looked at the file https://github.com/bredelings/BAli-Phy/blob/master/src/math/exponential.C but I got scared away because its scope includes only time-reversible rate (or transition) matrices, and because it has a weird gamma thing that I don't understand and probably don't want to use, and because the testing is weird so I won't know if I've screwed up the project. The test is controlled by an #ifdef?

bredelings commented 10 years ago

On 02/20/2014 01:56 PM, argriffing wrote:

I considered doing this, so I looked at the file https://github.com/bredelings/BAli-Phy/blob/master/src/math/exponential.C but I got scared away because its scope includes only time-reversible rate (or transition) matrices, and because it has a weird gamma thing that I don't understand and probably don't want to use, and because the testing is weird so I won't know if I've screwed up the project. The test is controlled by an #ifdef?

— Reply to this email directly or view it on GitHub https://github.com/bredelings/BAli-Phy/issues/1.

To be almost serious:

(a) No worries about screwing up the project -- that is what git is for! Specifically, first you screw up the project. Second, I refuse to pull (hopefully). Voila! Instant fun! More seriously, you can find 99% of problems with math by simply running 10 iterations on a small file, and checking that the sampled scalar variables (C1.p) don't change as a result of your patch.

(b) I'm shocked that you didn't start with eigenvalue.C! SHOCKED.

Note that only eigenvalue.H and eigenvalue.C actually use TNT or JAMA. Therefore, if you want to eliminate TNT, the boring, responsible, (and correct) thing to do is to reimplement the internals of class EigenValues, without changing its interface. This avoids combining 'orthogonal' problems, which is always a mistake. After TNT/JAMA are removed, one could then engage in all sorts of hackery for how eigenvalues & exponentials are computed. A kind of obvious first step would be to discover if computer O_D_O^t can actually be written as O_D_transpose(O), instead of being hand-unrolled for speed. That would probably require running on an amino-acid or codon data set with a fixed alignment to see if one could detect speed increases.

--- OK, even more seriously ----

  1. Weird gamma thing

Nothing uses it. For your own amusement, consider calculating the expectation

E exp(q*T)

where T is a gamma(a,b) distributed random variable.

This could be removed.

  1. The testing is weird

The testing is non-existant. In the distant past, it allowed one to compile exponential.C as its out executable.

This could be removed.

  1. ... its scope includes time-reversible rate (or transition) matrices ...

Is this scary, or too small to be interesting? I would assume the latter? If so, we could talk about ways of speeding this up.

bredelings commented 10 years ago
  1. ... its scope includes time-reversible rate (or transition) matrices ...

Is this scary, or too small to be interesting? I would assume the latter? If so, we could talk about ways of speeding this up. s/speeding this up/expanding its scope/

argriffing commented 10 years ago

you can find 99% of problems with math by simply running 10 iterations on a small file, and checking that the sampled scalar variables (C1.p) don't change as a result of your patch.

Is there a thing that will do this automatically? For reference, the testing for other projects that I've contributed to are like

networkx:

git clone https://github.com/networkx/networkx.git
cd networkx
nosetests networkx

cvxpy:

git clone https://github.com/cvxgrp/cvxpy.git
cd cvxpy
nosetests cvxpy

numpy:

git clone https://github.com/numpy/numpy.git
cd numpy
python runtests.py

scipy:

git clone https://github.com/scipy/scipy.git
cd scipy
python runtests.py

algopy:

git clone https://github.com/b45ch1/algopy.git
cd algopy
python run_tests.py
bredelings commented 10 years ago

On 02/20/2014 04:16 PM, argriffing wrote:

you can find 99% of problems with math by simply running 10 iterations
on a small file, and checking that the sampled scalar variables (C1.p)
don't change as a result of your patch.

Is there a thing that will do this automatically? For reference, the testing for other projects that I've contributed to are like

Not exactly... there's not a true right answer to "what should the first 10 iterations look like".

However, if you know that your patch should produce no functional change (such as implementing SVD by a different algorithm) then you know that the output (whatever it is) should not change when the patch is applied.

This ends up looking like (1) make install (2) bali-phy 5d.fasta --seed=0 --iter=10 <note that it writes output to (say) 5d-301/ > (3) diff 5d-{300,301}/C1.p

-BenRI
bredelings commented 10 years ago

Eigen is now used instead of TNT/ublas/JAMA. However, I created a simple matrix class to replace ublas. Mostly the code uses this, instead of Eigen because it was the easiest way to remove ublas. Eigen is used, but only for eigensystems. We can incrementally switch to eigen. Perhaps the next place would be in the matrix exponential code.

However, I'm not sure that all 2d arrays should necessarily use Eigen. They aren't all doing matrix math, for example.