fasiha / ebisu

Public-domain Python library for flashcard quiz scheduling using Bayesian statistics. (JavaScript, Java, Dart, and other ports available!)
https://fasiha.github.io/ebisu
The Unlicense

AssertionError #26

Closed tezer closed 4 years ago

tezer commented 4 years ago

Hi, I get an error message when I use a certain combination of parameters:

2020-04-17 22:46:48.907 | DEBUG    | __main__:update_model:49 - prior = (3.0, 3.0, 4.0), successes = 1, total = 10, tnow = 0.1
.../venv/lib/python3.7/site-packages/scipy/special/_logsumexp.py:120: RuntimeWarning: invalid value encountered in log
  out = np.log(s)
Traceback (most recent call last):
...
  File "../spaced_repetition/processor.py", line 50, in update_model
    new_model = ebisu.updateRecall(prior=model, successes=result, total=10, tnow=time_passed)
  File ".../venv/lib/python3.7/site-packages/ebisu/ebisu.py", line 119, in updateRecall
    assert m2 > 0, message
AssertionError: {'prior': (3.0, 3.0, 4.0), 'successes': 1, 'total': 10, 'tnow': 0.1, 'rebalance': True, 'tback': 4.0}

It looks like it fails when the difference between successes and total is > 7: it crashed with successes=0 and total=8, and with successes=1 and total=10. This all happens at tnow=0.1.

If I increase tnow, the limit goes up: at tnow=0.2 it crashes with successes=0 and total=11; at tnow=10.2 it crashes with successes=0 and total=64, but runs OK with successes=2 and total=64.

fasiha commented 4 years ago

Yes, this is too much of a surprise for the algorithm to handle 😢. I noted this in the docstring for updateRecall:

  N.B. This function is tested for numerical stability for small `total < 5`. It
  may be unstable for much larger `total`.

  N.B.2. This function may throw an assertion error upon numerical instability.
  This can happen if the algorithm is *extremely* surprised by a result; for
  example, if `successes=0` and `total=5` (complete failure) when `tnow` is very
  small compared to the halflife encoded in `prior`. Calling functions are asked
  to call this inside a try-except block and to handle any possible
  `AssertionError`s in a manner consistent with user expectations, for example,
  by faking a more reasonable `tnow`. Please open an issue if you encounter such
  exceptions for cases that you think are reasonable.
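For concreteness, here's a minimal sketch of that try/except pattern, using the numbers from your traceback; how you handle the exception is up to your app:

import ebisu

model = (3.0, 3.0, 4.0)  # (alpha, beta, t): the prior from the traceback above
successes, total, tnow = 1, 10, 0.1

try:
    model = ebisu.updateRecall(prior=model, successes=successes, total=total, tnow=tnow)
except AssertionError:
    # Numerical instability: handle it however makes sense for your app, e.g.,
    # keep the old model, or retry with a faked, less-surprising tnow as the
    # docstring suggests. Here we just leave `model` unchanged.
    pass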

Are you using such large totals to try and approximate a fractional recall strength? Unfortunately that's probably not going to be possible due to these numerical issues. I recommend using total<=3, even if your quiz really does have more trials than this.

This is unfortunately the best I could do with 64-bit floating point math (though likely there are improvements I don't know about). While I could use arbitrary-precision numbers via the very nice mpmath library, I hesitate to do this because most other languages do not support arbitrary-precision arithmetic or special functions.

fasiha commented 4 years ago

I'll close this; feel free to reopen if you have further questions or if I'm not being clear!

fasiha commented 4 years ago

I noticed in your stacktrace that tnow=0.1 and tback=4.0 (the last element of the model), and thought to experiment with explicitly setting tback=tnow: this argument tells the update function what time horizon you want the updated model to be for.

So:

In [13]: ebisu.updateRecall((3.0, 3.0, 4.0), 1, 10, .1, rebalance=False, tback=.1)
Out[13]: (1.9194897423560657, 0.43238986987746275, 0.1)

In [14]: ebisu.updateRecall((3.0, 3.0, 4.0), 1, 10, .1, rebalance=True, tback=.1)
Out[14]: (0.8319919171517688, 0.6407240193877811, 0.833209151552077)

This seems to work because setting tback=tnow lets the algorithm work with smaller arguments to the betaln function, preventing overflow. In the first example above, I set rebalance=False, which asks the updater to return the updated model at the new time tback, just to confirm that it wouldn't throw an exception. The second example, with rebalance=True, allows it to move the updated model's time to something closer to the model halflife.
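If you wanted to automate this workaround, a caller could try the default update first and fall back to tback=tnow only when it fails. This is just a sketch, and the caveats below about the resulting model still apply:

import ebisu

model = (3.0, 3.0, 4.0)
successes, total, tnow = 1, 10, 0.1

try:
    newModel = ebisu.updateRecall(model, successes, total, tnow)
except AssertionError:
    # Retry with tback pinned to the quiz time: smaller arguments to betaln,
    # so less risk of overflow (see the caveats about the result below).
    newModel = ebisu.updateRecall(model, successes, total, tnow, rebalance=True, tback=tnow)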

But note that in both updated models, the first one or two parameters (alpha and beta) have gone < 1, which leads to Beta distributions that are U-shaped (bimodal) instead of upside-down-U-shaped (unimodal). Trying to use these models will probably result in incorrect behavior later 😔. For example, the two models disagree on the half-life:

In [15]: ebisu.modelToPercentileDecay(ebisu.updateRecall((3.0, 3.0, 4.0), 1, 10, .1, rebalance=False, tback=.1))
Out[15]: 0.6584816172123262

In [16]: ebisu.modelToPercentileDecay(ebisu.updateRecall((3.0, 3.0, 4.0), 1, 10, .1, rebalance=True, tback=.1))
Out[16]: 1.1267087826740922

In contrast, using a more reasonable total=3 gives models that are truly equivalent after rebalancing: I see the same halflives here:

In [19]: ebisu.modelToPercentileDecay(ebisu.updateRecall((3.0, 3.0, 4.0), 1, 3, .1, rebalance=False, tback=.1))
Out[19]: 2.2700374362921734

In [20]: ebisu.modelToPercentileDecay(ebisu.updateRecall((3.0, 3.0, 4.0), 1, 3, .1, rebalance=True, tback=.1))
Out[20]: 2.269294323549595

So in this situation, even arbitrary-precision arithmetic won't help: the algorithm is so surprised by the student failing nine times out of ten, only 0.1 time units after the last review, when its model expected a half-life of 4 time units, that it goes into an incorrect part of the parameter space 😤. There's unfortunately no fix for this; at best I may be able to detect when this happens and throw a ValueError (instead of an AssertionError on numerical overflow/underflow), but you'd have to handle that.
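In the meantime, if you want to guard against this yourself, one hypothetical check (not part of Ebisu's API) is to reject updates whose alpha or beta drop below 1:

import ebisu

def updateLooksSane(model):
    # Hypothetical helper: alpha < 1 or beta < 1 means the posterior has gone
    # U-shaped (bimodal), i.e., the update was too surprising to trust.
    alpha, beta, _t = model
    return alpha >= 1 and beta >= 1

newModel = ebisu.updateRecall((3.0, 3.0, 4.0), 1, 10, 0.1, rebalance=True, tback=0.1)
if not updateLooksSane(newModel):
    newModel = (3.0, 3.0, 4.0)  # e.g., discard the update and keep the prior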

Before Ebisu 2.0, total was always 1, i.e., Ebisu only dealt with binary quizzes ("Bernoulli experiments" is the statistical term). It was relatively straightforward to extend it to total>1 ("binomial experiments"), so I added that. The quiz model for binomial quizzes is that you quizzed the user total times in a single quiz session, without giving them feedback, so their performance on each trial is statistically independent of the others. Is this your use case? Or are you trying to use successes and total to approximate a 'quiz strength'?

I ask because we don't have a lot of practical experience with this Ebisu 2.0 binomial quiz style (total>1) and your feedback would be most helpful.
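To make the binomial quiz style concrete, here's the kind of single-session update it's meant for (a sketch; the prior and times are made up):

import ebisu

model = (3.0, 3.0, 4.0)  # (alpha, beta, t) prior, same form as above

# One quiz session, 10 time units after the last review, in which the same fact
# was drilled 3 times without feedback and the student got 2 of the 3 right:
newModel = ebisu.updateRecall(model, successes=2, total=3, tnow=10.0)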

Again, please feel free to follow up with questions. I'm not sure how familiar you are with Ebisu or the underlying statistics, so the above explanation might have been too opaque; I'm happy to elaborate.

tezer commented 4 years ago

Thank you for the detailed answer! I really appreciate your work and attitude. I am just exploring Ebisu and am planning to use it in one of my projects, which personalizes learning.