fasiha / ebisu

Public-domain Python library for flashcard quiz scheduling using Bayesian statistics. (JavaScript, Java, Dart, and other ports available!)
https://fasiha.github.io/ebisu
The Unlicense

Approximate closed-form rebalancing for faster updates and more meaningful half-lives #41

Open matomatical opened 3 years ago

matomatical commented 3 years ago

@fasiha, I dare say, I think you're going to like this!

Overview

I've been working on a pure-Python implementation of the ebisu algorithm, and I was not satisfied with the current approach to determining a new half-life during updates (namely: keep the old half-life, unless the Beta model becomes too unbalanced, in which case 'rebalance' it with a coarse grid search over half-lives).

I came up with a different approach which, in my initial experiments, turns out to be simpler, to run faster, and to produce more accurate/meaningful half-life updates than the current rebalancing strategy (see also #31).

Key idea

The key idea is as follows: under the exponential-forgetting model, the posterior's mean recall probability decays approximately exponentially with elapsed time, E[P_recall@t] ≈ 2 ^ {-t / λ}. Evaluating this at a known elapsed time and solving for λ therefore gives a closed-form 'effective half-life' for the posterior, with no grid search needed.

Algorithm

The resulting update algorithm works as follows. If my understanding of the ebisu algorithm is correct, the only change from the current algorithm is in step 4.

  1. We have the prior over recall probability at time λ_old, distributed according to a Beta distribution with parameters α_old, β_old. I'll denote this prior P_recall@λ_old ~ Beta(α_old, β_old).
  2. Because the quiz takes place at time t, we shift this prior through time before the update, to prior P_recall@t ~ GB1(...).
  3. We perform the update, computing posterior P_recall@t ~ (that complex distribution with analytical moments).
  4. We want to find a suitable half-life for this posterior:
    • Move the posterior back to time λ_old, giving posterior P_recall@λ_old ~ (another complex distribution with analytical moments).
    • Compute the first moment of this posterior at λ_old, giving us E[postr P_recall@λ_old].
    • The equation mentioned above approximately governs the approximately exponential decay of this posterior's mean. In particular, E[P_recall@λ_old] =approx. 2 ^ {-λ_old / λ_new} where λ_new can be interpreted as the 'effective half-life' of the posterior and will be our choice for the new λ parameter.
    • Plug the posterior's first moment and λ_old into this equation to calculate λ_new =def. - λ_old / log_2 E[postr P_recall@λ_old].
  5. With our suitable new half-life chosen, move the posterior to λ_new, giving posterior P_recall@λ_new ~ (another complex distribution with analytical moments), which should be approximately central.
  6. Moment match a Beta distribution to give us posterior P_recall@λ_new ~approx. Beta(α_new, β_new), which we take as our new memory model.
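The inversion in step 4 is cheap; as a sketch (the function name `effective_halflife` is mine, not part of any API):

```python
from math import log2

def effective_halflife(lam_old, posterior_mean):
    # Invert posterior_mean ~= 2**(-lam_old/lam_new) for lam_new.
    return -lam_old / log2(posterior_mean)

# A posterior mean of 0.5 at the old half-life means the half-life is
# unchanged; a mean of 0.4 means memory decayed faster than modeled:
effective_halflife(1.0, 0.5)  # → 1.0
effective_halflife(1.0, 0.4)  # ≈ 0.756
```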

Note: some of those steps are conceptual rather than computational; pseudocode illustrating the computational steps is below (all in log-space, of course):

def updateRecall(result r, time elapsed t, prior model parameters θ):
    # Steps 1--2 are captured by assumption within the parameters
    _, _, λ_old = θ

    # Steps 3 and 4:
    # calculate the posterior after update at time t, shifted back to time λ_old
    postr_λ_old = analytic_posterior(r, t, λ_old, θ)
    # use this posterior mean and the exponential decay approximation as
    # a heuristic to determine a new half-life
    ln_μ_λ_old = postr_λ_old.ln_moment(1) # ln here denotes natural logarithm
    λ_new = - λ_old * ln(2) / ln_μ_λ_old 

    # Step 5. Compute the moments of the posterior at time λ_new
    postr_λ_new = analytic_posterior(r, t, λ_new, θ)
    ln_m1 = postr_λ_new.ln_moment(1)
    ln_m2 = postr_λ_new.ln_moment(2)
    mean  = exp(ln_m1)
    var   = exp(ln_m2) - exp(2*ln_m1)
    # Step 6. match with a beta distribution to fit our new model
    α_new, β_new = beta_match_moments(mean, var)
    return (α_new, β_new, λ_new)
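To make the pseudocode concrete, here is a self-contained Python sketch of the Bernoulli case. The moment formulas are my reading of the ebisu derivation (moments of p^ε under the updated, time-shifted Beta prior, written with log-Beta functions), and `ln_beta`, `posterior_ln_moment`, and `update_recall` are names I've made up for illustration, not the library's API:

```python
from math import exp, lgamma, log

LN2 = log(2.0)

def ln_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def posterior_ln_moment(m, result, alpha, beta, delta, eps):
    """ln of the m-th moment of p**eps under the posterior, where p has
    prior Beta(alpha, beta) at the old half-life, delta = t/lam_old is the
    quiz's elapsed-time ratio, and eps = t_back/lam_old is the ratio of the
    horizon the posterior is shifted to."""
    if result:  # pass: likelihood is p**delta
        return ln_beta(alpha + delta + eps * m, beta) - ln_beta(alpha + delta, beta)
    # fail: likelihood is 1 - p**delta
    num = (exp(ln_beta(alpha + eps * m, beta) - ln_beta(alpha, beta))
           - exp(ln_beta(alpha + eps * m + delta, beta) - ln_beta(alpha, beta)))
    den = 1.0 - exp(ln_beta(alpha + delta, beta) - ln_beta(alpha, beta))
    return log(num) - log(den)

def update_recall(result, t, alpha, beta, lam_old):
    delta = t / lam_old
    # Step 4: the posterior mean at horizon lam_old (eps = 1) gives the
    # closed-form effective half-life.
    ln_mu = posterior_ln_moment(1, result, alpha, beta, delta, 1.0)
    lam_new = -lam_old * LN2 / ln_mu
    # Step 5: first two moments of the posterior at horizon lam_new.
    eps = lam_new / lam_old
    ln_m1 = posterior_ln_moment(1, result, alpha, beta, delta, eps)
    ln_m2 = posterior_ln_moment(2, result, alpha, beta, delta, eps)
    mean = exp(ln_m1)
    var = exp(ln_m2) - mean ** 2
    # Step 6: moment-match a Beta(alpha_new, beta_new) at the new half-life.
    k = mean * (1.0 - mean) / var - 1.0
    return mean * k, (1.0 - mean) * k, lam_new
```

With the Beta(2, 2), λ = 1 prior from the experiments below, a pass at t = 1 gives a new half-life of about 1.357 and a fail about 0.756, matching the 'always APPROX. rebalance' table.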

Results

I have experimented by performing these Bernoulli updates at 0.0001, 0.01, 1, 100, and 10000 half-lives, for both failed (successes=0) and passed (successes=1) trials. The initial half-life was exactly 1 (so t = δ).

I tried the following four update schemes for comparison:

  1. λ_new = λ_old (no rebalance): Do not attempt to rebalance, just force fit a Beta distribution to the posterior, no matter how unequal alpha and beta are.

    λ_new = λ_old (no rebalance)
    result  delta    new alpha       new beta      new half-life
    fail    1e-04        1.665871    3.093668      1
    fail    1e-02        1.670583    3.093355      1
    fail    1e+00        2.000000    3.000000      1
    fail    1e+02        2.004093    2.006296      1
    fail    1e+04        2.000000    2.000001      1
    pass    1e-04        2.000100    2.000000      1
    pass    1e-02        2.010000    2.000000      1
    pass    1e+00        3.000000    2.000000      1
    pass    1e+02      102.000000    2.000000      1
    pass    1e+04    10012.410073    2.002081      1

    OK, no question, this strategy is Really Bad, obviously. The half-life parameter loses all interpretability and, as I think you mentioned elsewhere, moment-matching would probably be pretty inaccurate for such extremely skewed posteriors.

  2. ebisu (AUTO rebalance): The current implementation: do (1), unless alpha and beta differ by a factor of two or more, in which case use a coarse grid search to find a new half-life.

    ebisu (AUTO rebalance)
    result  delta    new alpha       new beta      new half-life
    fail    1e-04        1.665871    3.093668      1
    fail    1e-02        1.670583    3.093355      1
    fail    1e+00        2.000000    3.000000      1
    fail    1e+02        2.004093    2.006296      1
    fail    1e+04        2.000000    2.000001      1
    pass    1e-04        2.000100    2.000000      1
    pass    1e-02        2.010000    2.000000      1
    pass    1e+00        3.000000    2.000000      1
    pass    1e+02        1.328794    2.075719     61.566292
    pass    1e+04        2.589229    2.032667   3361.405627

    Behaving as expected. Interestingly, there is never a need to rebalance for fail trials even for extremely quick fails.

  3. ALWAYS ebisu-rebalance: Always use coarse grid-search to find a new half-life.

    ALWAYS ebisu-rebalance
    result  delta    new alpha       new beta      new half-life
    fail    1e-04        1.430187    3.132588      1.127626
    fail    1e-02        1.434299    3.132170      1.127626
    fail    1e+00        1.725811    3.034261      1.127626
    fail    1e+02        1.749764    2.017428      1.127626
    fail    1e+04        1.745926    2.010629      1.127626
    pass    1e-04        1.746014    2.010628      1.127626
    pass    1e-02        1.754725    2.010569      1.127626
    pass    1e+00        2.627134    2.006456      1.127626
    pass    1e+02        1.328794    2.075719     61.566292
    pass    1e+04        2.589229    2.032667   3361.405627

    Also behaving as expected. Interestingly, the grid search finds an increased half-life parameter as the best fit even for failed trials. Of course, the effective 'half-life' determining scheduling probably still goes down, and this is probably captured through appropriate changes in alpha and beta.

  4. always APPROX. rebalance: Always use the approximation discussed above to find a new half-life.

    always APPROX. rebalance
    result  delta    new alpha       new beta      new half-life
    fail    1e-04        2.763985    2.994630      0.660264
    fail    1e-02        2.765495    2.994940      0.661462
    fail    1e+00        2.790686    2.936193      0.756471
    fail    1e+02        2.005878    2.006227      0.999208
    fail    1e+04        2.000001    2.000001      1.000000
    pass    1e-04        2.000019    2.000003      1.000036
    pass    1e-02        2.001878    2.000299      1.003606
    pass    1e+00        2.135892    2.018352      1.356915
    pass    1e+02        2.487732    2.034483     35.695958
    pass    1e+04        2.501217    2.034258   3466.775981

    This behaviour is, I think, really neat! In the cases where the existing ebisu algorithm (above) decided to rescale the half-life, this approximate method found a similar value (e.g. 3361 v. 3466, 61 v. 35, about 1 v. about 1). In a few notable cases, I think it did a 'better' job, especially in that the half-life actually goes down for failed trials. Perhaps most importantly, the changes to the alpha and beta parameters at the new half-life are well contained, i.e. the Beta distribution stays very well balanced. More on this in the following plot.

Finally, here's a plot of some of the posterior Beta distributions at their new half-lives after these updates, for the various algorithms. Of course, the no-rebalancing scheme is useless, and the ebisu rebalancing scheme works pretty well. But here you can see visually how the new approximate balancing scheme looks even a little more centered at its half-life than the current ebisu rebalancing scheme.

[figure: posterior Beta distributions at their new half-lives under each rebalancing scheme]

Remaining issues

fasiha commented 3 years ago

First, thank you so much for spending so much time and attention on Ebisu, on coming up with this approach, and for writing it up in such detail! I am very grateful!!

Second—yes! I like this!!! I am confident you have found a much better alternative to the rebalancing that we currently do—we can replace the very ad hoc check for rebalance (0.5 < α / β < 2) and the coarse grid search with a very reasonable and self-explanatory alternative.

And as you note, it will give us a cheap and good approximation to start the fine search for the true halflife (that gives us posterior mean of 0.5) to properly do #31.
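That fine search could use the closed-form value as its starting bracket. A hedged bisection sketch, where `posterior_mean_at(lam)` is a hypothetical stand-in for whatever computes the exact posterior mean recall at horizon lam:

```python
def true_halflife(posterior_mean_at, lam_approx, rtol=1e-9):
    """Refine an approximate half-life to the lam where the posterior mean
    recall is exactly 0.5, assuming the mean decreases monotonically in lam."""
    lo, hi = lam_approx / 2.0, lam_approx * 2.0
    while posterior_mean_at(lo) < 0.5:  # widen bracket downward if needed
        lo /= 2.0
    while posterior_mean_at(hi) > 0.5:  # widen bracket upward if needed
        hi *= 2.0
    while hi - lo > rtol * lam_approx:
        mid = 0.5 * (lo + hi)
        if posterior_mean_at(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, with an exactly exponential mean 2**(-lam/3), it recovers a half-life of 3.0 from a rough initial guess of 2.5.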

I'm also confident it will be straightforward to incorporate this approach with the binomial quiz case.
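For the binomial case, the same moments seem to go through by expanding the (1 - p**delta)**(n - k) likelihood factor binomially into a signed sum of Beta-function ratios. A sketch under my reading of the math (function names are mine):

```python
from math import comb, exp, lgamma

def ln_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def binomial_posterior_moment(m, k, n, alpha, beta, delta, eps):
    """m-th moment of p**eps under the posterior after k successes out of n
    binomial trials, with prior Beta(alpha, beta), delta = t/lam_old, and
    eps = t_back/lam_old. Likelihood is proportional to
    p**(delta*k) * (1 - p**delta)**(n - k)."""
    def raw(mm):  # unnormalized moment, relative to B(alpha, beta)
        return sum((-1) ** j * comb(n - k, j)
                   * exp(ln_beta(alpha + eps * mm + delta * (k + j), beta)
                         - ln_beta(alpha, beta))
                   for j in range(n - k + 1))
    return raw(m) / raw(0)
```

With n = 1 this reduces to the Bernoulli case: for the Beta(2, 2) prior with delta = eps = 1 it gives a posterior mean of 0.6 on a pass and 0.4 on a fail. Note the alternating sum is fine for small n but would need log-sum-exp care for larger n.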

Are you interested in a version of Ebisu that does #31 (always rebalances), using this approach? If so, I can try to focus on that, plus some other outstanding issues that are mostly done, and release a new minor version. Alas, due to life commitments, I'm not sure when I'll be able to give this the time it needs. But it sounds like you have a version that is working for you. And while, yes, it may be possible that we need a fallback in case this algebraic approach to approximating the half-life fails, I think barring pathological cases it should work just fine.

Again, many, many thanks for your hard work!!!

matomatical commented 3 years ago

Right, I have my own implementation, which I will continue to use for my memory exercises. So there's no pressure from me to get this into a new minor version. Please consider this issue a 'feature suggestion' rather than an 'issue' or urgent feature request.

In the meantime, I'll continue to use (my implementation of) the ebisu algorithm for my memory exercises, and I can update you once I have more data on how this heuristic updating scheme performs, or if it destroys my memory models, or if I come across any other suggestions.