Open fasiha opened 5 months ago
@fasiha I have developed a free online version of Flashcards, available at https://itoytoy.com/anki. I plan to use ebisu 3.0 and will regularly sync the review data from users' cards with you to help optimize ebisu further.
I have previously used ebisu 2.1 in my product, but feel that its potential has not been fully utilized in practical applications. After integrating ebisu 3.0, should any issues arise, I will consult with you for guidance.
Thank you.
This dev diary is the third open proposal for Ebisu v3:
Both the above techniques share two nice desiderata:
But there's another desideratum:
Unfortunately, both the ensemble and the Beta-power-law approaches mentioned above fail miserably on this third requirement.
Code to generate the table below
Starting with https://github.com/fasiha/ebisu/tree/v3-release-candidate, run this in the top-level directory:

```py
import ebisu

m1 = ebisu.initModel(100)
# Twenty successful quizzes, each 100 hours after the previous one
for i in range(20):
    print(i, ',', ebisu.modelToPercentileDecay(m1 := ebisu.updateRecall(m1, 1, 1, 100)))
```

and this in the `scripts/` directory to access the `betapowerlaw.py` script:

```py
import betapowerlaw as bp

m2 = [1.25, 1.25, 100]
# Same quiz schedule as above, applied to the Beta-power-law model
for i in range(20):
    print(bp.modelToPercentileDecay(m2 := bp.updateRecall(m2, 1, 1, 100)))
```

I can explain why both models have this flaw:
These two failure modes are independent and made me think about ways to circumvent both while keeping the other two desiderata listed above.
Here's where I ended up.
Consider a simple 3-atom ensemble with fixed weights (i.e., the weights don't change, so it's quite a stretch to call it an "ensemble"):
- a `betapowerlaw` model proposed in the previous dev diary; this model is also never updated

Here's the idea: the primary atom is just an Ebisu v2 atom, so it's conservative: it evolves slowly and therefore is less vulnerable to the halflife growing dramatically after repeated quizzes on the same time interval. The second atom allows this model to circumvent the conservativeness of Ebisu v2: it explicitly posits that memory can strengthen organically and its halflife is pegged to twice (or N×) the first atom's halflife: this meets our second desideratum of realistic halflife growth after quizzes, and that's why it never needs updating. Finally, the third atom (the power law) makes explicit the chance that exponential decay is just wrong for this memory and captures the odds that without study the student will remember this fact for a year. This achieves the first desideratum of respectable predicted recall probabilities, and similarly doesn't need updating: it just exists to prop up the recall probability at long intervals.
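As a rough illustration of how the three atoms combine, here is a minimal sketch of the fixed-weight mixture. It makes several simplifying assumptions that are not taken from `scripts/split3.py`: the first atom is reduced to a single point-estimate halflife rather than a full Ebisu v2 Beta model, the power law is a generic `1 / (1 + t / h)` stand-in rather than the actual `betapowerlaw` code, and the one-year time scale (`24 * 365` hours) for the third atom is a guess based on the "remember this fact for a year" remark. All function and parameter names are hypothetical.

```python
def exp_recall(t, halflife):
    """Exponential forgetting: recall probability halves every `halflife` hours."""
    return 2.0 ** (-t / halflife)


def power_law_recall(t, halflife):
    """Generic power-law decay (an assumed stand-in, NOT the betapowerlaw code).

    It also reaches 0.5 at t == halflife, but its tail decays far more slowly.
    """
    return 1.0 / (1.0 + t / halflife)


def split3_recall(t, halflife, scale=2.0, long_halflife=24 * 365,
                  weights=(1 / 3, 1 / 3, 1 / 3)):
    """Predicted recall at time `t` (hours) for a fixed-weight 3-atom mixture."""
    w1, w2, w3 = weights
    return (
        w1 * exp_recall(t, halflife)                # atom 1: the only atom that gets updated
        + w2 * exp_recall(t, scale * halflife)      # atom 2: pegged to N x atom 1's halflife
        + w3 * power_law_recall(t, long_halflife)   # atom 3: props up long-interval recall
    )


def time_to_recall(target, halflife, hi=1e7, **kwargs):
    """Bisect for the time at which the mixture's recall decays to `target`.

    Relies on the mixture being monotonically decreasing in time.
    """
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if split3_recall(mid, halflife, **kwargs) > target:
            lo = mid  # recall still above target: the crossing is later
        else:
            hi = mid  # recall at or below target: the crossing is earlier
    return (lo + hi) / 2
```

Calling `time_to_recall(0.5, 100)` gives the halflife of the whole mixture for a first atom with a 100-hour halflife. Because the second and third atoms decay more slowly than the first, that number is larger than 100 even though only the first atom is ever updated.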
Here are the halflives for the three proposals after twenty successful quizzes, each 100 hours apart, as well as how much bigger each halflife is than the starting halflife. The last column, the split approach, still shows unbounded growth of the halflife, but much slower growth: after twenty iterations, it's only 7× the starting halflife, versus 17× (ensemble) and 600× (Beta power-law):
After some tweaking of the parameters of this model, we find that it's very competitive with the ensemble and the Beta-power-law approaches:
*Dev instructions to generate this plot*
To obtain this plot,

1. create a venv or Conda env,
2. install dependencies: `python -m pip install numpy scipy pandas matplotlib tqdm ipython "git+https://github.com/fasiha/ebisu@v3-release-candidate"`,
3. then clone this repo and check out the release candidate branch: `git clone https://github.com/fasiha/ebisu.git && cd ebisu && git fetch -a && git checkout v3-release-candidate`,
4. download my Anki reviews database: [collection-no-fields.anki2.zip](https://github.com/fasiha/ebisu/files/13405477/collection-no-fields.anki2.zip), unzip it, and place `collection-no-fields.anki2` in the `scripts` folder so the script can find it,
5. start ipython: `ipython`,
6. run the script: `%run scripts/split3.py`. This will produce some text/figures.

Compare to the ensemble approach:
and the Beta-power-law results:
Indeed, for the first half of the graphs above (the flashcards for which I had a lot of failed quizzes), this "split-3-atom" model outperforms the two alternatives.
When I initially sketched this split-3-atom model, I thought the first atom would have a lot of weight, like 80%, and the next two atoms would have 10% each. Turns out that an equal split works the best, one-third weight for each. There also appears to be some advantage to scaling the second atom to 5x the first atom's halflife instead of 2x in terms of focal loss performance, but we'll have to see if that's "real" or just the loss function being weird.
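For readers unfamiliar with the metric, here is a minimal sketch of focal loss applied to binary quiz outcomes. It uses the standard definition from the object-detection literature, `-(1 - p_true)**gamma * log(p_true)`; the value `gamma=2.0` and the function names are assumptions, and the scoring code in `scripts/split3.py` may differ.

```python
import math


def focal_loss(p_recall, success, gamma=2.0, eps=1e-12):
    """Focal loss for one quiz, given the predicted recall probability.

    With gamma == 0 this reduces to ordinary log loss. Larger gamma shrinks
    the penalty on predictions that were already confident and correct.
    """
    # Probability the model assigned to the outcome that actually happened
    p_true = p_recall if success else 1.0 - p_recall
    p_true = min(max(p_true, eps), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def mean_focal_loss(predictions, outcomes, gamma=2.0):
    """Average focal loss over paired predictions and pass/fail outcomes."""
    losses = [focal_loss(p, y, gamma) for p, y in zip(predictions, outcomes)]
    return sum(losses) / len(losses)
```

Comparing `mean_focal_loss` across parameter settings (for example, a 2x versus 5x halflife scalar for the second atom) is one way to rank them, with the caveat already noted above that a lower loss may partly reflect quirks of the loss function itself.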
As usual, I'm going to stew over this and poke around the text file generated by the script above that delves into the predictions made for each model for individual quizzes per flashcard. But I'm tentatively excited about this model. It lacks the mathematical elegance of the Beta power-law model and needs more parameters (specifically, the weights and the halflife-scalar for the second atom), but so far I like its behavior a lot.