fasiha / ebisu

Public-domain Python library for flashcard quiz scheduling using Bayesian statistics. (JavaScript, Java, Dart, and other ports available!)
https://fasiha.github.io/ebisu
The Unlicense

Ebisu inertia #59

Closed · Kaelorn closed this 2 years ago

Kaelorn commented 2 years ago

This issue is meant to discuss the "inertia" Ebisu can have from one recall prediction to the next.

By "inertia" I mean the ability of the model to keep increasing the recall duration even when the success is below 0.5 (I'm using soft binary quiz feature), or the ability of the model to keep decreasing the recall duration even if the success is over 0.5

One example of what I call "inertia": let's say there's a flashcard, and a user sees it and remembers it 5 times in a row (success=1), each time at the exact moment the scheduler predicted. The parameters of the model are (3., 3., 24.) and the percentile is set to 0.5. We can see the evolution of the recall duration:

  0: 24
  1: 30.434447773736824
  2: 38.49837036471472
  3: 48.57990134320228
  4: 61.153703703272676

Here everything seems OK: each time the user remembered the flashcard, so I expect the model to postpone the next review session. However, if the user fails to remember at iteration 5 (success=0), the predicted recall duration is 71.12178019991727 hours, which is surprising since it is longer than the previous one even though the user failed to remember the flashcard. If the user fails one more time, the new computed recall duration is 78.50939997707633 hours (even higher). It takes around 5 failed iterations before the model begins to decrease the duration.

This problem exists both ways (when the user fails and then succeeds, or succeeds and then fails), and it is even worse when I try the Monte Carlo version.

I tried different workarounds to break the inertia. One solution I found convincing was to adapt the percentile whenever inertia is detected. I consider there is inertia when success > 0.5 and new_duration < previous_duration, or when success < 0.5 and new_duration > previous_duration. When there is inertia, I update the percentile with this formula:

p' = sigmoid(logit(p) + k * ln(d'/d))

where p' is the new percentile, p the previous percentile, k a constant, d' the new predicted duration, and d the previous predicted duration. I did not derive this formula rigorously; I was just looking for something that seemed to work for my use cases. With k=2 the results are not bad...

For example, at iteration 5 of the example above, instead of a 71-hour duration I get 55.421436638909405 hours, which is below the 61 hours of iteration 4 (which is, I suppose, what you expect when you fail the quiz).
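In code, the adjustment amounts to something like this (a minimal sketch of the formula above; the function and variable names are just illustrative):

import math

def adjust_percentile(p, new_duration, prev_duration, k=2.0):
    # p' = sigmoid(logit(p) + k * ln(d'/d)): nudge the percentile whenever
    # the predicted duration moved in the "wrong" direction
    logit_p = math.log(p / (1 - p))
    shifted = logit_p + k * math.log(new_duration / prev_duration)
    return 1 / (1 + math.exp(-shifted))

The adjusted percentile can then be used in place of the fixed 0.5 when computing the next predicted duration.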

But maybe I am using Ebisu in a crude way and there is a built-in feature to avoid this inertia...

So I have a set of questions:

fasiha commented 2 years ago

Thanks for the note! Hmm, my code below agrees with your numbers for the first five successes, but then on the sixth review (which is a failure), I see the half-life decrease, and it continues to decrease after another failure:

import ebisu

model = (3., 3., 24.)

for i in range(6):
    # time until predicted recall probability drops to 50%, i.e. the half-life
    t = ebisu.modelToPercentileDecay(model)
    result = 1 if i < 4 else 0  # success first few times, then failure
    model = ebisu.updateRecall(model, result, 1, t)
    halflife = ebisu.modelToPercentileDecay(model)
    print(f'{i + 1}. {result=}, half-life: {halflife:.2f} hours')

prints out the following:

  1. result=1, half-life: 30.43 hours
  2. result=1, half-life: 38.50 hours
  3. result=1, half-life: 48.58 hours
  4. result=1, half-life: 61.15 hours
  5. result=0, half-life: 49.41 hours
  6. result=0, half-life: 41.80 hours

Can you share your code that showed you 71.12178019991727 and 78.50939997707633? This should not happen: there are unit tests in Ebisu that check a wide variety of models to confirm that successes (given a binary quiz) only raise the half-life and failures only lower it. (One of the things I'm struggling with in Ebisu v3 (described in #58) is whether I want to keep this guarantee 😅.)
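For reference, a check along those lines looks roughly like this (a sketch, not the actual test suite; the models are arbitrary examples):

import ebisu

# a few arbitrary (alpha, beta, half-life in hours) models
for model in [(3., 3., 24.), (2., 2., 10.), (4., 6., 48.)]:
    t = ebisu.modelToPercentileDecay(model)  # current half-life
    after_success = ebisu.modelToPercentileDecay(ebisu.updateRecall(model, 1, 1, t))
    after_failure = ebisu.modelToPercentileDecay(ebisu.updateRecall(model, 0, 1, t))
    assert after_success > t, 'a binary success should raise the half-life'
    assert after_failure < t, 'a binary failure should lower the half-life'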

Kaelorn commented 2 years ago

Ok, that explains a lot. The misunderstanding seems to come from the definitions of total and result.

I thought total was the "number of times it showed this flashcard" (according to the notebook), so I assumed I had to keep track of the global number of times the flashcard was shown, and pass the sum of all previous results for that flashcard... But it seems total and result only take the most recent quiz into account. If I understand correctly, total is the maximum number of points that could be won in the last review and success is the number of points actually won in that review. That explains the inertia, since with cumulative counts it takes a few failed or successful iterations to shift the global average.

fasiha commented 2 years ago

total seems to be the maximum number of points that could possibly be won in the last revision and success the number of points won in the last revision

Ah, this might be a good way to see it. Initially, Ebisu only supported binary quizzes, so updateRecall just took a single boolean quiz result. Then Duolingo published a very interesting dataset where, in a single quiz session, a fact could be tested multiple times. Statistically, this perfectly matched a generalization of the binary quiz: the binomial quiz (k successes out of n trials, very much like your points analogy). I'll take a look at the docs and see where I can make this clearer, thanks for bringing it to my attention!
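Concretely, each call to updateRecall describes only the most recent review, not the card's lifetime totals, e.g. (a sketch with illustrative numbers):

import ebisu

model = (3., 3., 24.)
elapsed_hours = 30.0  # hypothetical time since the last review

# binary quiz: 1 success out of 1 trial in this review
model = ebisu.updateRecall(model, 1, 1, elapsed_hours)

# binomial quiz: the fact was tested 3 times in one session, 2 successes
model = ebisu.updateRecall(model, 2, 3, elapsed_hours)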