fasiha / ebisu

Public-domain Python library for flashcard quiz scheduling using Bayesian statistics. (JavaScript, Java, Dart, and other ports available!)
https://fasiha.github.io/ebisu
The Unlicense
314 stars · 32 forks

New cards #11

Closed mustafa0x closed 5 years ago

mustafa0x commented 5 years ago

What's a good way to deal with new cards (i.e. that haven't yet been reviewed)? How to schedule them when sorting the deck?

fasiha commented 5 years ago

You have no choice but to omit cards lacking models from the sorted search, right? So I schedule those in the order that they’re listed in my database (flat file, time stamp added, etc.)—Ebisu doesn’t get involved with that aspect.

mustafa0x commented 5 years ago

Right.

So if the user reviews four cards, and gets half of them right, should all four be moved to the "bottom" of the stack, below the cards that have no model (so we can't predict their recall)? How about if the user leaves the app, and comes back in two weeks?

Or am I overthinking this?

fasiha commented 5 years ago

Maybe :) I don't keep a stack—each time the user asks for a quiz, I call predictRecall on all the cards with models and find the card with lowest recall probability. This becomes important as you get a mix of old and recently-learned cards—the newly-learned cards' recall probabilities decay much faster than those of old ones, so cards pass each other on that notional stack.
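That scan-everything selection can be sketched like this. This is a toy illustration, not ebisu's real API: `predict_recall` is a simple exponential-decay stand-in for `ebisu.predictRecall` (which actually integrates over the Beta prior), and the card layout is hypothetical.

```python
def predict_recall(model, elapsed_hours):
    """Stand-in for ebisu.predictRecall: treat the model's t as a
    half-life and decay recall exponentially. (The real function
    integrates over the Beta prior; this is only for illustration.)"""
    alpha, beta, half_life = model
    return 2.0 ** (-elapsed_hours / half_life)

def next_card(cards, now_hours):
    """Among cards that have a model, pick the one with the lowest
    predicted recall probability right now."""
    learned = [c for c in cards if c["model"] is not None]
    return min(learned,
               key=lambda c: predict_recall(c["model"], now_hours - c["last_review"]))

cards = [
    {"name": "A", "model": (3.0, 3.0, 24.0), "last_review": 0.0},   # old, slow decay
    {"name": "B", "model": (3.0, 3.0, 0.25), "last_review": 47.0},  # just learned, fast decay
    {"name": "C", "model": None, "last_review": None},              # never studied: skipped
]
print(next_card(cards, now_hours=48.0)["name"])  # prints "B"
```

Note how B, learned recently with a short half-life, falls below A even though A was reviewed much longer ago: this is the "cards pass each other on that notional stack" effect.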

I have various ideas how to ameliorate the computational waste of scanning over all learned cards each time you want to quiz—the most straightforward one being, when you update a card's model, you also calculate and store the timestamp at which the recall probability will be 50%, 5%, 1%, 0.1%, 0.01%, and do some SQL or binary search magic to find the set of cards with approximately the lowest recall probabilities given the actual timestamp, and pick one of those to quiz.

(Ensuring you quiz on the card with the absolute lowest recall probability is not that critical. You want to avoid patterns in quizzes—like, if you learn A, then B, then C, and each time you're quizzed on A you know you'll be asked B and C next, that's bad. So bucketing quizzes into approximately the same recall probability is nice from a user perspective and a computational one.)
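The precomputed-timestamp idea might look like the following sketch. It treats the model's t as a half-life so the decay times have a closed form; a real app would have ebisu compute these crossing times exactly.

```python
import math

PERCENTILES = [0.5, 0.05, 0.01, 0.001, 0.0001]

def decay_timestamps(model, reviewed_at_hours):
    """For each target recall probability, the approximate time at which
    this card's recall drops to it. Treating the model's t as a
    half-life (recall = 2**(-elapsed / t)) gives elapsed = t * log2(1/p).
    A real app would compute these exactly from the Ebisu model."""
    _alpha, _beta, half_life = model
    return {p: reviewed_at_hours + half_life * math.log2(1.0 / p)
            for p in PERCENTILES}

def bucket(card_timestamps, now_hours):
    """The lowest recall bucket this card has already fallen into, or
    None if its recall is still above 50%. Stored per card, these
    timestamps can be indexed or binary-searched, so finding the weakest
    cards needs no full scan."""
    passed = [p for p in PERCENTILES if card_timestamps[p] <= now_hours]
    return min(passed) if passed else None
```

You would then quiz a random card from the lowest non-empty bucket, which also gives you the pattern-avoiding behavior described above.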

Does this all make sense? This is how Curtiz works.

fasiha commented 5 years ago

In your example: you learn/review four cards, potentially failing some reviews (you never 'fail' an initial learn, right, you just set a newly learned card's model to defaultModel), and you have other cards you haven't learned yet. When you come back later, you run predictRecall on all four learned cards to find one with low (maybe lowest) recall and quiz that, then rinse and repeat until you get tired of quizzing the same four cards and learn more cards.

mustafa0x commented 5 years ago

Thank you so much for the explanation. My app's scheduling workflow, however, still has glaring holes.

Things are clear cut when all cards have been reviewed at least once — simply sort the deck of cards with predictRecall. Similarly, when all cards are new, just present them in whatever order the app has them stored in.

The intermediary states however are a bit more confusing to me.

Thank you, and I apologize for the barrage of questions! 🙃

fasiha commented 5 years ago

No worries!

Perhaps something I've not explicitly noted is the choice you have in mixing quizzing vs learning. In Curtiz, those are two separate modes of usage: when quizzing, you only look at the flashcards you've already learned and quiz on the lowest recall probability ones, over and over; and when learning, Curtiz introduces flashcards that you haven't seen. In this way, Curtiz actually partitions the set of cards into learned vs unlearned, leaving you with exactly the two situations you described.

But you can do something else too—you could set a threshold for memory recall, say 50%, and quiz until all learned cards have predicted memory recall >50%, and then switch to learning new cards.
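That threshold rule is easy to sketch. As before this is illustrative, not ebisu's API: the recall estimate is an exponential-decay stand-in for `ebisu.predictRecall`, and the threshold value and card layout are made up.

```python
RECALL_THRESHOLD = 0.5  # quiz while any learned card is predicted below this

def choose_mode(cards, now_hours):
    """Decide whether to quiz the weakest learned card or introduce a
    new one. The recall estimate treats the model's t as a half-life;
    a real app would call ebisu.predictRecall instead."""
    def recall(card):
        _alpha, _beta, half_life = card["model"]
        return 2.0 ** (-(now_hours - card["last_review"]) / half_life)

    weak = [c for c in cards if c["model"] is not None and recall(c) < RECALL_THRESHOLD]
    if weak:
        return "quiz", min(weak, key=recall)
    unlearned = [c for c in cards if c["model"] is None]
    if unlearned:
        return "learn", unlearned[0]
    return "done", None
```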

Does that make sense? I'll answer your specific questions now—

Why can't an initial study be marked as fail? Look at it this way: for any flashcard, whether newly learned or old, your quiz app stores (1) an Ebisu model and (2) the timestamp that model applies to; if a, b, t = model, you're saying that recall at time timestamp + t is Beta(a, b)-distributed. The model has to come from defaultModel or from updateRecall, the only two ways the Ebisu API produces a model, but updateRecall takes an existing model as an argument, which, for a new card, you lack. So when your user indicates they've committed a card to memory, that's when you store the timestamp and a defaultModel, to encode your belief in their ability to recall.
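In code, the learn step might look like this sketch. `default_model` mimics the shape of `ebisu.defaultModel` (a Beta(alpha, alpha) prior on recall at the given half-life), but the names, defaults, and card layout are illustrative, not ebisu's exact signature.

```python
import time

def default_model(half_life_hours, alpha=3.0):
    """Mimics the shape of ebisu.defaultModel: a Beta(alpha, alpha)
    prior on recall at half_life_hours after review. Names and defaults
    here are illustrative."""
    return (alpha, alpha, half_life_hours)

def learn_card(card, now=None):
    """The first 'learn' is not a gradeable quiz: there is no prior
    model for updateRecall to consume, so we just record a default model
    plus the current timestamp, from which its memory decay starts."""
    card["model"] = default_model(half_life_hours=0.25)  # e.g. 15 minutes
    card["last_review"] = time.time() if now is None else now
    return card
```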

Your app could certainly adopt the Anki mechanism you describe, but in terms of the API Ebisu provides, the first time your user indicates they've learned a flashcard, that's not a quiz that can be graded (since you don't have an existing Ebisu model).

"It is possible to simply select a default half-life that is sensible." Absolutely. In Curtiz we used to do exactly this: all newly learned cards got a default half-life of 15 minutes. But then we allowed users to scale that manually if they want to, because it's frequently the case that you're much more familiar with some cards than others. You still want the SRS to remind you of something you know well, just not with the same frequency as something you've never seen before.
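A user-scalable default might look like this minimal sketch; the parameter names and the base half-life are illustrative, not ebisu's API.

```python
def initial_model(base_half_life_hours=0.25, familiarity_scale=1.0, alpha=3.0):
    """A default model whose half-life the user may scale: a card they
    already half-know might get familiarity_scale=4.0, i.e. an hour
    instead of 15 minutes. Parameter names are illustrative."""
    return (alpha, alpha, base_half_life_hours * familiarity_scale)
```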

Should a defaultModel be assigned to a card before or after the first time it's studied? Definitely when the user indicates they've committed it to memory—so at the first time it's studied, and certainly not before. Recall that your app needs to store a model and timestamp for each learned card, because the memory decay encoded by the model starts at the timestamp. After you quiz, call updateRecall, and update your model with its return value and the quiz's timestamp. The new memory decay begins then. Before a card has been studied, it's nothing to Ebisu.
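The quiz step, grade then update then reset the clock, can be sketched as follows. `update_recall` is passed in rather than calling `ebisu.updateRecall` directly, since its exact signature differs between ebisu versions; the card layout is hypothetical.

```python
def update_after_quiz(card, success, quiz_time, update_recall):
    """After grading a quiz, store updateRecall's return value as the
    new model and reset the timestamp to the quiz time: the new memory
    decay begins then."""
    elapsed = quiz_time - card["last_review"]
    card["model"] = update_recall(card["model"], success, elapsed)
    card["last_review"] = quiz_time
    return card
```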

You said "rinse and repeat until you get tired of quizzing the same four cards and learn more cards": how could this be where the app determines whether to present a recently studied card or an entirely new one? I tried to answer this above: if your app's design interleaves quizzing and learning, then you'd set a minimum threshold on recall probability, and if no flashcards are below it, you ask the user to learn new things. But this is totally your app's decision and not something Ebisu cares about. Ebisu tries (I hope!) to be unopinionated about everything about your app, which is why it exports just three functions: predictRecall, updateRecall, and defaultModel.

Aside: you can also update cards passively, when you don't have a "quiz" as such. A simple example is a cloze-deleted card: (Wellington) is the capital of (New Zealand) has two Ebisu models, one for each cloze. When you first learn the flashcard, both models get set via defaultModel. You can only pick one cloze to quiz, so you quiz "___ is the capital of New Zealand". Now you update Wellington's Ebisu model with updateRecall, which is totally normal. But what about New Zealand's model? By doing this quiz you've also practiced "New Zealand", and it would be wrong to pretend its memory is decaying just as before. So this is a passive review for New Zealand: you update New Zealand's timestamp to be the same timestamp as the quiz, thereby resetting its decay. (You don't modify its model, since you have no evidence of whether the user would have remembered or forgotten it if it had been an active review and they'd had to summon "New Zealand" from memory.)

What this means is that, assuming the user answered "Wellington" correctly, Wellington's memory is strengthened and decays more slowly than before, from the timestamp of the quiz. But so too does New Zealand's memory: it's refreshed and decaying anew, with its original model's rate of decay.
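A passive review is then just a timestamp reset, sketched here over a hypothetical card layout:

```python
def passive_review(card, quiz_time):
    """Passive review: the card was exercised indirectly (e.g. the
    un-quizzed half of a cloze). Reset its decay clock without touching
    the model, since there is no pass/fail evidence to update the Beta
    prior with."""
    card["last_review"] = quiz_time
    return card
```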

This is really nice because it can greatly reduce the burden of review. I'm working on an app that builds a dense dependency graph between Ebisu models: "when I review this sentence, I'm also indirectly reviewing all the vocabulary in the sentence; the overall sentence's Ebisu model gets updated via updateRecall but all the individual vocab cards' models' timestamps also get reset."

This only makes sense if you see the Ebisu model as modeling memory decay starting from a timestamp. I don't think there's an equivalent to this notion in the Anki methodology so I'm happy to explain this (or any other aspect) further, don't hesitate to ask.

fasiha commented 5 years ago

(Note, I edited the previous comment on GitHub for clarity and apologize that the original in the email may be a bit incomprehensible in some places.)

mustafa0x commented 5 years ago

Thank you for the thorough explanation; the threshold idea seems to be working decently. 🎉 I'll keep you updated on any new developments.