ArtPoon closed this issue 7 years ago.
Huh. It's actually working fine, but the normalization breaks when the decay factor is so high that a tree's self kernel score is below 1.
There's another problem where the normalization denominator is smaller than the numerator. This is causing the unit test test.perturb.particles to fail at initialize.smc(), at commit 629d566fa7746be19086cd578edba0ecbd15c8b1:
> ws <- initialize.smc(ws)
ERROR: distance() value outside range [0,1].
k: 17.82165
t1$kernel: 19.83905
t2$kernel: 15.79234
> sqrt(19.83905*15.79234)
[1] 17.70042
This happens at the distance calculations (ws$dists).
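To make the failure concrete: the error fires because the cross-kernel exceeds the geometric mean of the self-kernels, so the normalized score comes out above 1. A quick check re-plugging the numbers printed above (a Python sketch for illustration, not Kaphi code):

```python
import math

# Values printed by the failing run above.
k12 = 17.82165   # k: cross-kernel of t1 vs t2
k11 = 19.83905   # t1$kernel: self score of t1
k22 = 15.79234   # t2$kernel: self score of t2

# Normalized kernel: k(t1,t2) / sqrt(k(t1,t1) * k(t2,t2)).
# By Cauchy-Schwarz this lies in [0,1] -- but only if the cross term and
# both self terms are computed with identical kernel settings.
denom = math.sqrt(k11 * k22)
normed = k12 / denom

print(denom)    # ~17.70042, matching the console output above
print(normed)   # slightly above 1, which triggers the range error
```

The fact that a mathematically guaranteed bound is violated is the strongest hint that the three kernel evaluations are not being run with the same effective parameters.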
It looks like normalization doesn't work when tree branch lengths are not normalized.
> config$norm.mode <- 'mean'
> utk(ws$obs.tree, sim.tree, config)
[1] 9.437352
> utk(sim.tree, sim.tree, config)
[1] 14.13488
> utk(ws$obs.tree, ws$obs.tree, config)
[1] 7.371364
> sqrt(14.13488*7.371364)
[1] 10.20751
Note that utk is a wrapper function for the unlabelled tree kernel where we can pass config.
Yeah, setting norm.mode to MEAN in coalescent.yaml clears the error. I suppose this particular normalization problem arises (when branch lengths are not rescaled) because tree 1 can never exactly match tree 2, even if the topology is exactly the same and the branch length distributions are congruent...
Guh, that was bollocks. Still getting errors with norm.mode set to MEAN. Something else is going on.
Need to write more unit tests to suss this out :-/
Try to find a minimal case that reproduces this normalization error by simulating coalescent trees with three tips.
Okay, found a minimal case and determined that the problem for this particular issue is that utk is returning a different result than tree.kernel, even though one is just a wrapper for the other. The problem is the config$sst.control value, which is displayed as 1 but doesn't behave the same:
> tree.kernel(tr1, tr2, lambda=config$decay.factor, sigma=config$rbf.variance,
+ rescale.mode=config$norm.mode, rho=1)
[1] 1.073184
> utk(tr1, tr2, config)
[1] 0.1192426
> tree.kernel(tr1, tr2, lambda=config$decay.factor, sigma=config$rbf.variance,
+ rescale.mode=config$norm.mode, rho=config$sst.control)
[1] 0.1192426
> config$sst.control
[1] 1
Very strange!:
> tree.kernel(tr1, tr2, lambda=config$decay.factor, sigma=config$rbf.variance, rescale.mode=config$norm.mode, rho=as.double(config$sst.control))
[1] 1.073184
Probably something to do with how the SEXP object is being handled in kernel.c.
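One plausible failure mode: R prints config$sst.control as 1 whether it is stored as an integer or as a double, but the two have completely different bit patterns, and C code that reads integer storage as if it held doubles (e.g. calling REAL() on an INTSXP without coercing first) would see garbage instead of 1.0 — which is consistent with as.double() fixing the call. A minimal sketch of that reinterpretation (pure Python, purely illustrative; this is not the actual kernel.c logic):

```python
import struct

# Reinterpret the bytes of a 32-bit little-endian int 1 as the low bytes
# of a 64-bit double -- roughly what uncoerced access to integer storage
# through a double pointer would produce.
int_bytes = struct.pack("<i", 1) + b"\x00" * 4
misread = struct.unpack("<d", int_bytes)[0]

print(misread)     # a subnormal value vanishingly close to 0, not 1.0
print(float(1))    # proper coercion (cf. as.double in R) gives 1.0
```

A rho near zero instead of 1 would change the kernel value exactly the way the console output above shows, so the fix is to coerce (or validate) the scalar's storage type before it reaches the C routine.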
Working through the unit test for this issue, I discovered that how the kernel is being computed in treestats.c is not the same as in phyloK2.py, even though the latter matches my manual calculation.
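For checking a manual calculation on three-tip trees, here is a small pure-Python sketch of a subset-tree kernel in the general form these implementations use (lambda decay, Gaussian RBF on branch-length differences, rho as the SST control). The node layout and all names are invented for illustration; this is not the treestats.c or phyloK2.py code:

```python
import math

def leaf():
    return {"children": None, "blens": None}

def node(left, right, bl_left, bl_right):
    # internal node storing its two children and their branch lengths
    return {"children": (left, right), "blens": (bl_left, bl_right)}

def delta(n1, n2, lam, sigma, rho):
    # node-pair score: zero unless both nodes are internal
    if n1["children"] is None or n2["children"] is None:
        return 0.0
    # Gaussian RBF penalty on the difference in child branch lengths
    d2 = sum((a - b) ** 2 for a, b in zip(n1["blens"], n2["blens"]))
    score = lam * math.exp(-d2 / sigma)
    # rho = 1 gives the subset-tree kernel, rho = 0 the subtree kernel;
    # children are compared in stored order here for simplicity
    for c1, c2 in zip(n1["children"], n2["children"]):
        score *= rho + delta(c1, c2, lam, sigma, rho)
    return score

def internal_nodes(n):
    if n["children"] is None:
        return []
    out = [n]
    for c in n["children"]:
        out.extend(internal_nodes(c))
    return out

def tree_kernel(t1, t2, lam=0.2, sigma=2.0, rho=1.0):
    # unnormalized kernel: sum delta over all internal-node pairs
    return sum(delta(a, b, lam, sigma, rho)
               for a in internal_nodes(t1) for b in internal_nodes(t2))

def normalized_kernel(t1, t2, **kw):
    return tree_kernel(t1, t2, **kw) / math.sqrt(
        tree_kernel(t1, t1, **kw) * tree_kernel(t2, t2, **kw))
```

Because children are matched in stored order, trees must be put into a canonical (e.g. ladderized) ordering first; with three tips that is easy to arrange by hand, which is what makes them a good minimal case.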
This code produces a ws$dists matrix with negative values:

Output:
Something is very wrong here!