beast-dev / beast-mcmc

Bayesian Evolutionary Analysis Sampling Trees
http://beast.community
GNU Lesser General Public License v2.1
188 stars 71 forks source link

BirthDeathEpidemiologyModel bug #577

Closed msuchard closed 9 years ago

msuchard commented 9 years ago

Originally reported on Google Code with ID 569

need to review the equation for res = ... in line 194 in BirthDeathSerialSamplingModel.
It is getting INF by given the xml

==========================

I specify the epidemiological model in Beauty (for an alignment we used in our R0 
paper), but then the speciation likelihood is always -Inf.

Tanja

          BEAST v1.7.0 Prerelease r4563, 2002-2012
       Bayesian Evolutionary Analysis Sampling Trees
                 Designed and developed by
   Alexei J. Drummond, Andrew Rambaut and Marc A. Suchard

               Department of Computer Science
                   University of Auckland
                  alexei@cs.auckland.ac.nz<mailto:alexei@cs.auckland.ac.nz>

             Institute of Evolutionary Biology
                  University of Edinburgh
                     a.rambaut@ed.ac.uk<mailto:a.rambaut@ed.ac.uk>

              David Geffen School of Medicine
           University of California, Los Angeles
                     msuchard@ucla.edu<mailto:msuchard@ucla.edu>

                Downloads, Help & Resources:
                 http://beast.bio.ed.ac.uk

Source code distributed under the GNU Lesser General Public License:
            http://code.google.com/p/beast-mcmc

                     BEAST developers:
Alex Alekseyenko, Trevor Bedford, Erik Bloomquist, Joseph Heled,
Sebastian Hoehna, Denise Kuehnert, Philippe Lemey, Wai Lok Sibon Li,
Gerton Lunter, Sidney Markowitz, Vladimir Minin, Michael Defoin Platel,
          Oliver Pybus, Chieh-Hsi Wu, Walter Xie

                         Thanks to:
    Roald Forsberg, Beth Shapiro and Korbinian Strimmer

Random number seed: 1324563295127

Loading additional development parsers from development_parsers.properties, which is
additional set of parsers only available for development version ...
Parsing XML file: test.xml
  File encoding: MacRoman
Looking for plugins in /Users/tanja/Beast/beast-mcmc-read-only/plugins
Read alignment: alignment
  Sequences = 34
      Sites = 1551
   Datatype = nucleotide
Site patterns 'patterns' created from positions 1-1551 of alignment 'alignment'
  pattern count = 292
Using epidemiological parameterization of Stadler et al (2011) : Estimating the basic
reproductive number from viral sequence data, Mol.Biol.Evol., doi: 10.1093/molbev/msr217,
2011
Creating the tree model, 'treeModel'
  initial tree topology = ((((((((((((((((((((((((r10683886dt732522,r9669342dt732526),r9669638dt732529),(r10652463dt732624,r12284256dt732636)),(r10153761dt732594,r15280801dt732886)),r5506089dt732358),(((r4228404dt732107,r5073850dt732095),r5258831dt732053),r5062763dt732047)),r3021431dt731719),r1470425dt731446),(r12258435dt731422,r1464889dt731516)),r675412dt731176),r661689dt730677),r494013dt730481),((r14896566dt730573,r662413dt730565),r503711dt730536)),r662477dt730256),r20490234dt730124),r1628227dt730012),r21554982dt729885),r503423dt729884),r21553623dt729719),(r19793779dt729097,r5252289dt729163)),r20490834dt728911),r19677881dt728772),r17616815dt728685),r15912268dt728491)
  tree height = 4577.244801796485
Using discretized relaxed clock model.
  over sampling = 1
  parametric model = logNormalDistributionModel
   rate categories = 1
Creating state frequencies model 'frequencies': Initial frequencies = {0.25, 0.25,
0.25, 0.25}
Creating HKY substitution model. Initial kappa = 2.0
Creating site model.
  4 category discrete gamma with initial shape = 0.5
TreeLikelihood(treeModel) using Java nucleotide likelihood core
  Ignoring ambiguities in tree likelihood.
  With 292 unique site patterns.
Branch rate model used: discretizedBranchRates
Failed to load native NucleotideLikelihoodCore in : /Users/tanja/Beast/beast-mcmc-read-only
Using Java nucleotide likelihood core because java.lang.UnsatisfiedLinkError: no NucleotideLikelihoodCore
in java.library.path
Creating swap operator for parameter branchRates.categories (weight=10.0)
Creating the MCMC chain:
  chainLength=10000000
  autoOptimize=true
  autoOptimize delayed for 100000 steps
Error running file: test.xml
The initial model is invalid because state has a zero probability.

If the log likelihood of the tree is -Inf, his may be because the
initial, random tree is so large that it has an extremely bad
likelihood which is being rounded to zero.

Alternatively, it may be that the product of starting mutation rate
and tree height is extremely small or extremely large.

Finally, it may be that the initial state is incompatible with
one or more 'hard' constraints (on monophyly or bounds on parameter
values. This will result in Priors with zero probability.

The individual components of the posterior are as follows:
The initial posterior is zero:
  DistributionLikelihood=-1.8654,
  DistributionLikelihood=0.0,
  DistributionLikelihood=-0.3069,
  DistributionLikelihood=0.0986,
  DistributionLikelihood=-0.9189,
  DistributionLikelihood=-230.2585,
  DistributionLikelihood=-1.8654,
  DistributionLikelihood=-4.6052,
  DistributionLikelihood=0.0,
  SpeciationLikelihood(speciationLikelihood)=-Inf,
  TreeLikelihood(treeLikelihood)=-45287.8362

Reported by dong.w.xie on 2012-02-01 22:31:36


msuchard commented 9 years ago
In the screenshot t is way too large (48000) so exp(c1 * t) overflows. The issue is
that the initial t needs to be specified so that it is greater than the root time but
only a little bit. Perhaps the initial value needs to be set as rootHeight + delta?

Reported by rambaut on 2012-02-02 07:40:42

msuchard commented 9 years ago
This is very tricky part:

origin (t here) = avgInitialRootHeight * 1.1, which is not big increase during auto-rescaling.

But the problem is that avgInitialRootHeight = options.maximumTipHeight * 10.0; from
BEAUti estimation in "case TIP_CALIBRATED" of ClockModelOptions.calculateInitialRootHeightAndRate(partitions).

We spent a lot of time on this, but never getting any good estimation of init root
height for all cases.

Would you have any simple solution?

Reported by dong.w.xie on 2012-02-03 04:05:33

msuchard commented 9 years ago
The problem is the entire scale of the numbers. If the c1 * t > 700 then it will over
flow. The solution may be to scale t down by using a larger time unit? 

Reported by rambaut on 2012-02-03 07:07:38

msuchard commented 9 years ago
just checked the critical equation:
res = 2.0 * (1.0 - c2 * c2) + Math.exp(-c1 * t) * (1.0 - c2) * (1.0 - c2) + Math.exp(c1
* t) * (1.0 + c2) * (1.0 + c2);

So, it is correct. However we could rewrite it:
res = Math.exp(c1 * t)  * ( Math.exp(-c1 * t) * (1.0 - c2) +  (1.0 + c2) ) * ( Math.exp(-c1
* t) * (1.0 - c2) +  (1.0 + c2) );
meaning if we operate directly in logspace:

res = c1*t+2* Log ( Math.exp(-c1 * t) * (1.0 - c2) + * (1.0 + c2) );

so we avoid any Math.exp(c1*t) !

Reported by tanja819 on 2012-02-03 15:40:35

msuchard commented 9 years ago
sorry last row is a typo, correct is:
res = c1*t+2* Log ( Math.exp(-c1 * t) * (1.0 - c2) + (1.0 + c2) );

Reported by tanja819 on 2012-02-03 15:41:40

msuchard commented 9 years ago

Reported by dong.w.xie on 2012-02-06 22:20:09