CompEvol / beast2

Bayesian Evolutionary Analysis by Sampling Trees
www.beast2.org
GNU Lesser General Public License v2.1
238 stars 83 forks source link

DefaultEigenSystem throws ArrayIndexOutOfBoundsException for complex eigenvalues #338

Closed Anaphory closed 9 years ago

Anaphory commented 9 years ago

For small complex eigenvalues of the Hessenberg matrix, beast.evolution.substitutionmodel.DefaultEigenSystem.hqr2 tries to access index -1 of double[][] h.

When en==1, the loop starting in line 362 is run. In that case, na=0 (365), and for a complex eigenvalue, q>0. This means the if block starting at line 418 is executed, which in line 420 contains a reference to h[na - 1] and h[...][na - 1], but since na - 1 == en - 1 - 1 == -1, that operation fails and throws

java.lang.ArrayIndexOutOfBoundsException: -1
Usage: BeastMCMC [options] <Beast.xml>
where <Beast.xml> the name of a file specifying a Beast run
and the following options are allowed:
-resume : read state that was stored at the end of the last run from file and append log file
-overwrite : overwrite existing log files (if any). By default, existing files will not be overwritten.
-seed [<int>|random] : sets random number seed (default 127), or picks a random seed
-threads <int> : sets number of threads (default 1)
-prefix <name> : use name as prefix for all log files
-beastlib <path> : Colon separated list of directories. All jar files in the path are loaded. (default 'beastlib')
    at beast.evolution.substitutionmodel.DefaultEigenSystem.hqr2(DefaultEigenSystem.java:420)
    at beast.evolution.substitutionmodel.DefaultEigenSystem.decomposeMatrix(DefaultEigenSystem.java:38)
    at beast.evolution.substitutionmodel.GeneralSubstitutionModel.getTransitionProbabilities(GeneralSubstitutionModel.java:116)
    at beast.evolution.likelihood.TreeLikelihood.traverse(TreeLikelihood.java:399)
    at beast.evolution.likelihood.TreeLikelihood.traverse(TreeLikelihood.java:411)
    at beast.evolution.likelihood.TreeLikelihood.calculateLogP(TreeLikelihood.java:340)
    at beast.core.util.CompoundDistribution.calculateLogP(CompoundDistribution.java:107)
    at beast.core.util.CompoundDistribution.calculateLogP(CompoundDistribution.java:107)
    at beast.core.MCMC.doLoop(MCMC.java:440)
    at beast.core.MCMC.run(MCMC.java:366)
    at beast.app.BeastMCMC.run(BeastMCMC.java:305)
    at beast.app.BeastMCMC.main(BeastMCMC.java:519)
tgvaughan commented 9 years ago

Are those methods designed to handle non-symmetric matrices? Substitution models (at least the ones we use) tend to be reversible.

Anaphory commented 9 years ago

GeneralSubstitutionModel uses a not necessarily symmetric transition matrix, with hardly any other condition on it. I am not fluent enough in the method to see whether it should work and just runs into an off-by-one error or similar, or whether something more fundamental is the problem.

alexeid commented 9 years ago

I am pretty sure that the DefaultEigenSystem can only handle symmetric rate matrices with real eigen values. So the implementation of GeneralSubstitutionModel is a little bit deceptive, because you can input a rate matrix that will fail if you use the DefaultEigenSystem. If you give it a non-symmetric set of rates then you will also have to provide and EigenSystem that can handle complex eigenvalues. This is a known hard problem:

Moler, C., & Van Loan, C. (1978). Nineteen dubious ways to compute the exponential of a matrix. SIAM review, 20(4), 801-836.

Moler, C., & Van Loan, C. (2003). Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM review, 45(1), 3-49.

On 15/05/2015, at 1:59 am, Gereon Kaiping notifications@github.com wrote:

GeneralSubstitutionModel uses a not necessarily symmetric transition matrix, with hardly any other condition on it.

— Reply to this email directly or view it on GitHub https://github.com/CompEvol/beast2/issues/338#issuecomment-102045491.

Anaphory commented 9 years ago

Ow. That is quite a setback, given that both the article I want to replicate and the specific application I have in mind are explicitly concerned with non‐time‐reversible models of character evolution.

Also, in that case, there should probably be a TimeReversibleSubstitutionModel class, and a big warning sign that DefaultEigenSystem does not support non-symmetric matrices. Maybe I'll manage to write one tomorrow. (And in the process learn something about why my CorrelatedSubstitutionModel does not work…)

I am not well-versed in the literature on those algorithms. This one has been in the class since the initial commit, how can we find out for what matrices it is supposed to work?

tgvaughan commented 9 years ago

You might want to have a look at the discrete trait substitution model in BEAST_CLASSIC which allows for non-symmetric transition matrices.

alexeid commented 9 years ago

I think you will want to port the RobustEigenDecomposition class from BEASTLabs into the core and refactor it to extend beast.evolution.substitutionmodel.EigenDecomposition:

https://github.com/BEAST2-Dev/BEASTLabs/blob/master/src/beast/math/matrixalgebra/RobustEigenDecomposition.java https://github.com/BEAST2-Dev/BEASTLabs/blob/master/src/beast/math/matrixalgebra/RobustEigenDecomposition.java

On 15/05/2015, at 12:21 pm, Gereon Kaiping notifications@github.com wrote:

Ow. That is quite a setback, given that both the article I want to replicate and the specific application I have in mind are explicitly concerned with non‐time‐reversible models of character evolution.

Also, in that case, there should probably be a TimeReversibleSubstitutionModel class, and a big warning sign that DefaultEigenSystem does not support non-symmetric matrices. Maybe I'll manage to write one tomorrow. (And in the process learn something about why my CorrelatedSubstitutionModel does not work…)

I am not well-versed in the literature on those algorithms. This one has been in the class since the initial commit, how can we find out for what matrices it is supposed to work?

— Reply to this email directly or view it on GitHub https://github.com/CompEvol/beast2/issues/338#issuecomment-102207835.

Anaphory commented 9 years ago

I'll have a look.

Also, the code in DefaultEigenSystem looks very much like it should support complex eigenvalues in some respect – judging from variable naming, not from understanding what it actually does. And the error actually occurs some time during the run, not very early (many rate matrices with lognormally distributed off-diagonal entries should probably have complex eigenvalues), reinforcing my suspicion that this might be a minor error, only occurring when the complex eigenvalue has small absolute value or something similar.

alexeid commented 9 years ago

DefaultEigenSystem has a code lineage that goes back to early versions of PAML implemented by Ziheng Yang. It works for Positive Definite Matrices only. It doesn’t handle complex diagonalizations. I agree that it needs much better documentation :)

On 15/05/2015, at 12:43 pm, Gereon Kaiping notifications@github.com wrote:

I'll have a look.

Also, the code in DefaultEigenSystem looks very much like it should support complex eigenvalues in some respect – judging from variable naming, not from understanding what it actually does. And the error actually occurs some time during the run, not very early (many rate matrices with lognormally distributed off-diagonal entries should probably have complex eigenvalues), reinforcing my suspicion that this might be a minor error, only occurring when the complex eigenvalue has small absolute value or something similar.

— Reply to this email directly or view it on GitHub https://github.com/CompEvol/beast2/issues/338#issuecomment-102210596.

Anaphory commented 9 years ago

Having done some tests on DefaultEigenSystem.hqr2, it now looks to me like the error is the following:

When the method finds a pair of complex eigenvalues, they are written as (353ff.)

                wr[na - 1] = x + p;
                wr[en - 1] = x + p;
                wi[na - 1] = z;
                wi[en - 1] = -z;

with na = en - 1 (153) and en going down in steps of two, starting from en=hgh. For the resubstitution, en starts at n and goes down in steps of 1 (362). The first eigenvalue from a pair it will encounter is therefore the one with negative imaginary part, but the test for imaginary part (418) only runs its code for positive values, and assumes tacitly that the next-smaller one will be the other one of the pair. This does of course fail with ArrayIndexOutOfBoundsException when en-1 already points to the very last eigenvalue when the last pair of eigenvalues is complex.

Anaphory commented 9 years ago

For showing that this indeed works, using unit tests, it would be necessary to make hqr2() at least protected. Do you want tests for hqr2() at this cost? I could write automatic tests for this. For the moment I tested it by temporarily making hqr2() public and outputting the results it gave, comparing them with what my manual calculation did, specifically on the matrix

0 -1
1  0

which gave wr={0, 0}, wi={-1, 1}, and the matrix -1 0\\0 1 for h, as expected.

rbouckaert commented 9 years ago

To use the RobusEigenDecomposition, add the following attribute to your substitution model in the XML: eigensystem=“beast.evolution.substitutionmodel.RobustEigenSystem”. It is slower than the default Eigen system (that is why it is not the default), but also more robust, hence the name, and it handles asymmetric rate matrices.

rbouckaert commented 9 years ago

Tried to find the source of this code, but could not track it in PAML (it seems to be using Method 3 from "19 dubious ways to exponentiate a matrix" in version 4.8). I added a test in beast-classic https://code.google.com/p/beast-classic/source/browse/trunk/src/test/beast/evolution/substitutionmodel/RobustEigenSystemTest.java that succeeds when Gereon's fix is used, but fails otherwise. Also, the condition resembles the one in RobustEigenDecomposition so I think we should go with this fix.

alexeid commented 9 years ago

Should we point this out to Marc and Andrew? BEAST1 uses the same code...

On 18/05/2015, at 5:37 pm, Remco Bouckaert notifications@github.com wrote:

Tried to find the source of this code, but could not track it in PAML (it seems to be using Method 3 from "19 dubious ways to exponentiate a matrix" in version 4.8). I added a test in beast-classic https://code.google.com/p/beast-classic/source/browse/trunk/src/test/beast/evolution/substitutionmodel/RobustEigenSystemTest.java that succeeds when Gereon's fix is used, but fails otherwise. Also, the condition resembles the one in RobustEigenDecomposition so I think we should go with this fix.

— Reply to this email directly or view it on GitHub.