PoonLab / Kaphi

Kernel-embedded ABC-SMC for phylodynamic inference
GNU Affero General Public License v3.0
4 stars 2 forks source link

Kernel normalization problem #22

Closed ArtPoon closed 7 years ago

ArtPoon commented 7 years ago

This code produces a ws$dists matrix with negative values:

require(Kaphi)
require(yaml)
    config <- load.config('tests/fixtures/coalescent.yaml')
    config <- set.model(config, const.coalescent)
    nparticle <- config$nparticle
    theta <- c(Ne.tau=100)
    set.seed(100)
    obs.tree <- const.coalescent(theta, nsim=1, n.tips=20)[[1]]
    ws <- init.workspace(obs.tree, config)
ws
config <- ws$config
ws <- initialize.smc(ws)
ws$epsilon
ws$accept
ws$alive
next.epsilon(ws)
epsilon.obj.func(ws, 1)
epsilon.obj.func(ws, 0.1)
ws$dists

Output:

            [,1]         [,2]        [,3]        [,4]        [,5]         [,6]
[1,]  0.09855444 -0.003547483  0.01952083  0.26282476 -0.08976043 -0.358024602
[2,]  0.18498320 -0.019951910 -0.05274277 -0.20163216 -0.03755821 -0.313424967
[3,] -0.08907720 -0.003324091  0.10716642  0.03362616 -0.07638166  0.019548475
[4,] -0.01809160 -0.111153807 -0.12428295 -0.03709038  0.36552203 -0.009851596
[5,]  0.09247920 -0.074805804 -0.04914617 -0.39641339 -0.03768756 -0.066284821
[6,]  0.02441562  0.020498423  0.29638671 -0.04224123 -0.33942571  0.041807836
            [,7]        [,8]        [,9]       [,10]
[1,] -0.11590842 -0.08407187 -0.05446778  0.03628705
[2,] -0.01963804 -0.03468434 -1.32059338  0.11950563
[3,] -0.03013048 -0.26036020 -0.02060306 -0.06903098
[4,] -0.02251815  0.01507980 -0.25298418 -0.35906854
[5,] -0.46878691 -0.00202518 -0.07799568 -0.02806073
[6,] -0.72675567 -0.06024982 -0.05166268 -1.10345042

Something is very wrong here!

ArtPoon commented 7 years ago

Huh. It's actually working fine, but the normalization breaks when the decay factor is so high that a tree's self kernel score is below 1.

ArtPoon commented 7 years ago

There's another problem where the normalization denominator is smaller than the numerator. This is causing the unit test test.perturb.particles at initialize.smc() at commit 629d566fa7746be19086cd578edba0ecbd15c8b1

> ws <- initialize.smc(ws)
ERROR: distance() value outside range [0,1].
 k:  17.82165 
 t1$kernel:  19.83905 
 t2$kernel:  15.79234 
> sqrt(19.83905*15.79234)
[1] 17.70042

This happens at the distance calculations (ws$dists).

ArtPoon commented 7 years ago

It looks like normalization doesn't work when tree branch lengths are not normalized.

> config$norm.mode <- 'mean'
> utk(ws$obs.tree, sim.tree, config)
[1] 9.437352
> utk(sim.tree, sim.tree, config)
[1] 14.13488
> utk(ws$obs.tree, ws$obs.tree, config)
[1] 7.371364
> sqrt(14.13488*7.371364)
[1] 10.20751

Note utk is a wrapper function to the unlabelled tree kernel where we can pass config.

ArtPoon commented 7 years ago

Yeah, setting norm.mode to MEAN in coalescent.yaml clears the error. I suppose this particular normalization problem (when not rescaling branch lengths) because tree 1 can never exactly match tree 2, even if the topology is exactly the same and branch length distribution is congruent...

ArtPoon commented 7 years ago

Guh, that was bollocks. Still getting errors with norm.mode set to MEAN. Something else is going on.

ArtPoon commented 7 years ago

Need to write more unit tests to suss this out :-/

ArtPoon commented 7 years ago

Try to find a minimal case that reproduces this normalization error by simulating coalescent trees with three tips

ArtPoon commented 7 years ago

Okay, found a minimum case and determined that the problem for this particular issue is that utk is returning a different result than tree.kernel, even though one is just a wrapper for the other. The problem is the config$sst.control value, which is displayed as 1 but doesn't work the same:

> tree.kernel(tr1, tr2, lambda=config$decay.factor, sigma=config$rbf.variance,
+         rescale.mode=config$norm.mode, rho=1)
[1] 1.073184
> utk(tr1, tr2, config)
[1] 0.1192426
> tree.kernel(tr1, tr2, lambda=config$decay.factor, sigma=config$rbf.variance,
+         rescale.mode=config$norm.mode, rho=config$sst.control)
[1] 0.1192426
> config$sst.control
[1] 1
ArtPoon commented 7 years ago

Very strange! :

>  tree.kernel(tr1, tr2, lambda=config$decay.factor, sigma=config$rbf.variance, rescale.mode=config$norm.mode, rho=as.double(config$sst.control))
[1] 1.073184

Probably something to do with how the SEXP object is being handled in kernel.c.

ArtPoon commented 7 years ago

Working through the unit test for this issue, I discovered that how the kernel is being compute in treestats.c is not the same as phyloK2.py, even though the latter matches my manual calculation.