In #26 I wrongly said that the algorithm is just too surprised by `successes=1` out of `total=10` long before the halflife. That's not true: the algorithm fails there because of catastrophic cancellation, or some similar numerical issue caused by the limited precision of floats.
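For intuition, here's a generic toy demo of float cancellation (nothing to do with ebisu's actual internals): both expressions below are exactly 1.0 in real arithmetic, but float64 only carries about 16 significant digits, so the first one cancels away all the information.

```python
# Toy illustration of catastrophic cancellation, not ebisu's code.
x = 1e16
cancelled = (x + 1.0) - x  # exact answer is 1.0, but the +1 is lost to rounding
print(cancelled)           # 0.0

safe = 1.0 + (x - x)       # algebraically identical, but no cancellation
print(safe)                # 1.0
```

Alternating-sign sums like the one in the update (note the `(-1)**i` signs below) are exactly the setting where this kind of digit loss blows up.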
Here's one alternative way to update, using the raw distribution:
```python
from scipy.special import betaln, logsumexp
from scipy.integrate import trapz
from scipy.stats import beta as betarv
import numpy as np


def binomln(n, k):
  "Log of scipy.special.binom calculated entirely in the log domain"
  return -betaln(1 + n - k, 1 + k) - np.log(n + 1)


def mkPosterior(prior, successes, total, tnow, tback):
  (alpha, beta, t) = prior
  dt = tnow / t
  et = tback / tnow
  binomlns = [binomln(total - successes, i) for i in range(total - successes + 1)]
  signs = [(-1)**i for i in range(total - successes + 1)]
  logDenominator = logsumexp(
      [binomlns[i] + betaln(beta, alpha + dt * (successes + i))
       for i in range(total - successes + 1)],
      b=signs) + np.log(dt * et)
  logPdf = lambda logp: logsumexp(
      [binomlns[i] + logp * ((alpha + dt * (successes + i)) / (dt * et) - 1) +
       (beta - 1) * np.log(1 - np.exp(logp / (et * dt)))
       for i in range(total - successes + 1)],
      b=signs) - logDenominator
  return logPdf


def _meanVarToBeta(mean, var):
  """Fit a Beta distribution to a mean and variance."""
  # [betaFit] https://en.wikipedia.org/w/index.php?title=Beta_distribution&oldid=774237683#Two_unknown_parameters
  tmp = mean * (1 - mean) / var - 1
  alpha = mean * tmp
  beta = (1 - mean) * tmp
  return alpha, beta


pre3 = (3.3, 4.4, 1.)
tback3 = 1.
tnow = 1 / 50.

f = np.vectorize(mkPosterior(pre3, 1, 10, tnow, tback3))
ps = np.logspace(-5, 0, 5000)
pdf = np.exp(f(np.log(ps)))

# mom1 = trapz((ps * pdf)[np.isfinite(pdf)], ps[np.isfinite(pdf)])
# mom2 = trapz((ps**2 * pdf)[np.isfinite(pdf)], ps[np.isfinite(pdf)])
# var = mom2 - mom1**2
# model3 = list(_meanVarToBeta(mom1, var)) + [tback3]
# [4.988734353788027, 9.6709418534125, 1.0]

from scipy.optimize import minimize
res3 = minimize(
    lambda x: np.sum(
        np.abs(betarv.pdf(ps[np.isfinite(pdf)], x[0], x[1]) - pdf[np.isfinite(pdf)])**2),
    [1.5, 20])
# res3['x']: [ 2.01179687, 55.74700207]

import matplotlib.pylab as plt
plt.ion()
plt.figure()
plt.semilogx(ps, pdf, ps, betarv.pdf(ps, res3['x'][0], res3['x'][1]))
```
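As a quick sanity check (my own toy example, not from the library): `_meanVarToBeta` recovers the parameters of a known Beta distribution exactly when handed its analytic mean and variance, which is more evidence that the helper itself is fine and the failures come from the garbage moments being fed into it.

```python
def _meanVarToBeta(mean, var):
  """Fit a Beta distribution to a mean and variance (same helper as above)."""
  tmp = mean * (1 - mean) / var - 1
  return mean * tmp, (1 - mean) * tmp

# Analytic mean and variance of Beta(3, 7):
a, b = 3.0, 7.0
mean = a / (a + b)
var = a * b / ((a + b)**2 * (a + b + 1))

fitA, fitB = _meanVarToBeta(mean, var)
print(fitA, fitB)  # recovers (3.0, 7.0) up to float roundoff
```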
This whole approach is obviously very stupid, but the idea is simple: since we know the true posterior, we can fit a Beta distribution to its shape, as long as we evaluate the PDF at a `tback` where the bulk of the density's mass sits at tiny probabilities, where the numerical issues don't bite. This is the result of plotting the distribution and the Beta fit:

This yielded the model `[2.012, 55.747, 1.0]`, which is reasonable for this extreme case.

I'm not yet sure how to use this robustly. But I'm happy to see that it's just a numerical issue causing `updateRecall` to find invalid means/variances, and that there are other ways to compute a Beta fit that deserve investigation.
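One direction that might deserve a look (my own sketch, not something already in the codebase): do the least-squares fit against the *log*-PDF rather than the raw PDF, since the log-density is numerically much tamer when the density spans many orders of magnitude. Here's the shape of that idea, using a known `Beta(2, 55)` log-PDF as a stand-in for the true log-posterior:

```python
import numpy as np
from scipy.stats import beta as betarv
from scipy.optimize import minimize

# Hypothetical variant: least squares on the log-PDF instead of the PDF.
ps = np.logspace(-5, -0.01, 2000)
target = betarv.logpdf(ps, 2.0, 55.0)  # stand-in for the true log-posterior

res = minimize(
    lambda x: np.sum((betarv.logpdf(ps, x[0], x[1]) - target)**2),
    [1.5, 20.0])
# res.x should land near (2.0, 55.0)
```

This weights the tails and the bulk more evenly than fitting the raw PDF, where a handful of points near the mode dominate the objective.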