fasiha / ebisu

Public-domain Python library for flashcard quiz scheduling using Bayesian statistics. (JavaScript, Java, Dart, and other ports available!)
https://fasiha.github.io/ebisu
The Unlicense

Support for partial reconsolidation #51

Open jasonsparc opened 3 years ago

jasonsparc commented 3 years ago

Most SRS implementations assume that a full reconsolidation happens after review, regardless of whether the review passed or failed. That is, the probability of recall/success is always restored back to 100% even after a failed attempt.

However, it's possible that only a partial reconsolidation happens whenever the user fails, and that a full reconsolidation is perhaps only possible after a "successful" attempt via a refresher (e.g., via an Anki-style relearning step) or a follow-up review in the future.

What I mean by "partial reconsolidation" is that the probability of recall/success isn't restored to 100%, but rather somewhere below. In the event of zero reconsolidation (or complete lack of reconsolidation), the probability of recall/success is the same as before, as if the review didn't happen at all. And if so, we can assume that the date of 100% recall (or 100% probability of success) is still in the past (i.e., it's still the same as before, prior to today's review, as if today's review didn't happen). If a partial reconsolidation happens instead, the date of 100% recall is somewhere in the past as well but could be much later than before: much later than the prior review's date but earlier than today's review.

So when partial or zero reconsolidation is in effect and we use the algorithm to update the model, it is as if there's some kind of inherent interval in addition to the current interval since the last review. That is, to simulate a probability of recall/success below 100% even after today's review, we act as if 100% recall happened in the past, one inherent interval ago.

So in essence, suppose X days have passed since the user's failed review; then there will be Y days of inherent interval to account for the partial or zero reconsolidation that might have taken place. When updating the model with the X interval, what the algorithm actually sees is that X+Y days have passed (instead of just X days), and that is why I call Y an inherent interval. The inherent interval is only cleared (i.e., becomes zero) once a full reconsolidation happens. If instead a zero reconsolidation happened in the last review, the inherent interval will be exactly the same as the interval inputted in that last review.
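A minimal sketch of the bookkeeping just described, restricted to the zero-or-full case. The names `CardState`, `interval_seen`, and `after_review` are hypothetical and not part of Ebisu; the value returned by `interval_seen` is what would be handed to the recall update (e.g., Ebisu's `updateRecall`) as the elapsed time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardState:
    # Y: days carried over from reviews that did not fully reconsolidate
    inherent_interval: float = 0.0


def interval_seen(state: CardState, days_since_review: float) -> float:
    """Elapsed time the update actually sees: X + Y."""
    return days_since_review + state.inherent_interval


def after_review(state: CardState, days_since_review: float,
                 full_reconsolidation: bool) -> CardState:
    """Update the inherent interval after a review (zero or full only)."""
    if full_reconsolidation:
        # Full reconsolidation clears the inherent interval.
        return CardState(inherent_interval=0.0)
    # Zero reconsolidation: the inherent interval becomes the whole
    # interval that was passed to this update (X + Y).
    return CardState(inherent_interval=interval_seen(state, days_since_review))
```

For example, a failure 3 days after a fresh card leaves Y = 3, so a review 2 days later is updated as if 5 days had passed.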

Partial reconsolidation can therefore be simulated by introducing some kind of inherent intervals and combining that with Ebisu's soft-binary fuzzy quiz feature. Also, we can view the act of "displaying the correct answer after a failed attempt" as some kind of passive review, with a weak influence on the probability of successful active recall: hence, passively seeing the correct answer contributes only a partial reconsolidation. A zero reconsolidation is even possible if the user simply glanced at the correct answer after a failed review, without internalizing the correct answer, or without a follow-up refresher.

Partial reconsolidation is also possible if an item partially influences the recall of another item, either by strengthening or by interference, i.e., a model correlation. For interference, a successful recall attempt on one item could count as a partial failed recall attempt on another item. I think in most cases only the fuzzy quiz feature needs to be involved here, but if the first item reminded the user of the other item's answer after a failed recall of the latter, then inherent intervals might need to be involved as well.

Now, I lack the mathematical background to judge the feasibility of this feature, and I also don't know whether it is a good feature at all. Nonetheless, I believe it would be handy for some very niche applications, especially for implementing model correlations (which is what got me thinking about all of this in the first place).

jasonsparc commented 3 years ago

An alternative to full-blown support for "partial reconsolidation" is to just assume that a zero reconsolidation happens whenever a review fails. Afterwards, if the next review is successful, we can instead pretend that the last review was a full reconsolidation rather than a zero reconsolidation. That way, we don't accidentally boost the half-life too much (because the current probability of recall/success may by now be too low). Also, consecutive failed reviews in close succession then don't lower the half-life too much.

Now, what if two or more consecutive failed reviews are followed by a successful review? We simply move the date of 100% recall forward to the most recent failed review, pretend it was a full reconsolidation, and then perform the update for the successful review.
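The ad hoc rule above only changes which timestamp the elapsed time is measured from. A minimal sketch, assuming times are in days; `ReviewTimes`, `elapsed_for_review`, and `record_review` are hypothetical names, and the model update itself (e.g., via Ebisu) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewTimes:
    anchor: float                        # last full reconsolidation (assumed 100% recall), in days
    last_failed: Optional[float] = None  # most recent failed review since the anchor, if any


def elapsed_for_review(card: ReviewTimes, now: float, passed: bool) -> float:
    """Elapsed time to hand to the recall update for a review at `now`."""
    if passed and card.last_failed is not None:
        # Success after one or more failures: pretend the most recent
        # failure was a full reconsolidation and measure from it.
        return now - card.last_failed
    # Failures (zero reconsolidation) and plain successes measure from the anchor.
    return now - card.anchor


def record_review(card: ReviewTimes, now: float, passed: bool) -> ReviewTimes:
    """Bookkeeping after the model itself has been updated."""
    if passed:
        # Full reconsolidation: new anchor, pending failures cleared.
        return ReviewTimes(anchor=now)
    # Zero reconsolidation: keep the anchor, remember this failure.
    return ReviewTimes(anchor=card.anchor, last_failed=now)
```

Because each failure is measured from the original anchor, a second failure shortly after the first sees a long elapsed time and is penalized less, which matches the motivation given above.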

And actually, that is my plan for my own quiz app if I use Ebisu as-is for now.

(Though, I hope everything I have written so far is comprehensible enough.)

fasiha commented 3 years ago

Hello hello! Good to hear from you again! Thanks for writing such a thoughtful and interesting issue!

This is especially serendipitous because I have been talking to someone about a semi-related issue: on a video/text-for-language-learning website, suppose a user encounters a flashcard word in a video/article on the app. They have the option to click on it to review it, which would count as a failed review. But they might not click on it, in which case, what do we do?

  1. I think one could argue that that counts as a successful review, and the risk of overinflating the memory model is worth not annoying the user with a flashcard review soon after they encountered it in native media.
  2. But it also seems reasonable to model this as a "partial" or "passive" review (maybe the content was so engrossing the user didn't want to break their flow, etc.), or because you really don't want to risk overstrengthening the model.

So we're talking about using noisy quizzes and q0 to model this, but with an extra twist. Before describing the twist, note that any app using Ebisu likely has a row in a database for each flashcard with a column for model and for lastQuizTime.

The twist is to update the model column but not lastQuizTime. That is, after a "passive successful" review, you have a new Ebisu model (slightly strengthened from the original), but it applies to the time horizon starting at the last normal flashcard quiz, not at the passive quiz you just had.
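A minimal sketch of that twist as a database-row update, assuming a row is a dict with `model` and `lastQuizTime` keys. The function `store_review` and its `reconsolidate` flag are hypothetical; the new model is treated as an opaque value (in practice it would come from Ebisu's `updateRecall`).

```python
from typing import Any, Dict


def store_review(row: Dict[str, Any], new_model: Any, quiz_time: float,
                 reconsolidate: bool) -> Dict[str, Any]:
    """Return an updated copy of one flashcard's database row."""
    updated = dict(row)
    updated["model"] = new_model  # the model column is always replaced
    if reconsolidate:
        # Normal quiz: the new model's time horizon starts now.
        updated["lastQuizTime"] = quiz_time
    # Passive review: lastQuizTime is left alone, so the new model still
    # applies to the horizon starting at the last normal quiz.
    return updated
```

With `reconsolidate=False` this is the passive case; with `True` it is an ordinary quiz.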

With that fresh on my mind, I think it makes a ton of sense to also think about partial reconsolidation. I'm actually in the middle of a major rewrite of Ebisu (long story, see the pinned issue) so this is the perfect time to think about how to explicitly model passive reviews and partial reconsolidation, and what a good API would be for these.

Soooooo.

What do you think about this: suppose you have a flashcard you reviewed at datetime T1. Then, at T2, you have another review. Then, instead of calling updateRecall with just successes and total and optionally q0, you can also give it an optional argument 0 <= reconsolidation <= 1.

I'm not sure if this addresses the shortcoming you see currently. I think what you call the "inherent interval" refers to my twist of carefully choosing whether or not to update the flashcard's lastQuizTime column in the database.

Even if this does satisfy you, I'm not sure I like broadening the API this much, to include this very powerful number which can radically change things for this flashcard. Mayyybe we let reconsolidation be boolean, i.e., support only zero or full reconsolidation, which amount to leaving the flashcard's lastQuizTime as T1 vs updating it to T2 in the example above? I think this can support all the use cases we care about:

Do we want to imagine a nosy enough quiz app that asks a user for an arbitrary 0 < reconsolidation < 1? Maybe a teacher grading a student might want to assign this? Seems unnatural?

Let me know if this would support your use case properly, and if it'd be an improvement over the ad hoc algorithm you describe above ("just assume that whenever a review fails, a zero reconsolidation happens").

And let me know if I totally misunderstood you, with my apologies!

jasonsparc commented 3 years ago

Actually, I think it would help if I described my SRS app a little.

In my app, I'm planning to have support for passive reviews. People can create a "passive review" card, then on review time, the user would read, or listen, or watch a video, etc., (depending on the card's contents) then once they're done, the user must grade the passive review card. The possible grades are as follows:

Just ignore the "Interesting" grade for now: what really matters are "Feels New" and "Feels Old". The latter indicates successful recognition, i.e., the card's content (whether read, watched, or listened to) feels "generally" old; the former indicates the reverse.

Now, I said "generally" because I want the user to focus on the overall feeling of the content: a single word shouldn't determine whether something feels new or novel enough to be graded "Feels New". I forget where I learned this, but the surrounding text/content should already serve as a kind of "neural hook" that helps that single word be remembered/recognized next time. If the user deems that single word too important, they should instead make a separate "active recall" flashcard for it.

So the idea is that, a passive review card is some kind of recognition test (rather than a recall test).

The goal of my app should be to maximize the number of "Feels New" cards for study, while also minimizing the chance that any card has to be studied much earlier or reviewed more often due to forgetting. After all, a study session full of "Feels Old" cards will definitely feel boring (or like a chore to get through).

Now, this only works for very short snippets or small notes as passive review cards. What if we also support passive reviews of very long texts or larger notes? Some kind of graded "Incremental Reading" feature?

If you've ever used SuperMemo, its "Incremental Reading" feature helps the user gradually finish, for example, a very long article, but it lacks a "true" passive review algorithm. SuperMemo has "Topics" (which are like my app's passive review cards), but they're not graded; you have to manually tune how often they reappear for review.

So a few months ago, I spent a few weeks brainstorming ideas to support this feature. The first problem I ran into is that a large passive review card has to be broken up somehow, and each part has to be graded before all of that is merged into a single passive review grade. Since the parts wouldn't necessarily carry the same weight in the final result, there also has to be some measure of how much each part should contribute. I needed a way for my users to do all of this without it being tedious. On top of that, I also want to support audio and video presentations, not just static text for reading.

I then got an idea after I remembered my "stopwatch lap timer" app on my phone. So the idea is that, a single word or short passage may need more mental processing than a very long (perhaps also unimportant) passage. (That short passage could be a math equation for example, and the long passage is just its description in words.) So, what if we break a single presentation of a passive review card into "laps" (as in a lap timer)? Each lap will be graded, and each lap has some span of time, indicating how long it took the user to mentally process that lap.

Once the user finishes the passive review card in its entirety, they mark it as "complete" (via a dedicated button), and then each lap contributes its grade to the "general" grade of the passive review card. The span of time each lap takes is the measure of its share of the overall/general/final grade. Additionally, each lap has a different interval since the last review time (or since the last graded lap).
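The duration weighting can be sketched as a weighted mean. This is a minimal illustration, not the app's actual code: `lap_shares` and `overall_grade` are hypothetical names, the numeric mapping of "Feels Old" to 1.0 and "Feels New" to 0.0 is an assumption, and the differing per-lap intervals mentioned above are ignored here.

```python
from typing import List, Tuple


def lap_shares(lap_seconds: List[float]) -> List[float]:
    """Each lap's share of the session, proportional to its duration."""
    total = sum(lap_seconds)
    if total <= 0:
        raise ValueError("total lap time must be positive")
    return [t / total for t in lap_seconds]


def overall_grade(laps: List[Tuple[float, float]]) -> float:
    """Duration-weighted mean of per-lap grades.

    Each lap is (duration_seconds, grade), with grade in [0, 1];
    here 1.0 stands for "Feels Old" (recognized) and 0.0 for "Feels New".
    """
    shares = lap_shares([duration for duration, _ in laps])
    return sum(share * grade for share, (_, grade) in zip(shares, laps))
```

A 30-second "Feels Old" lap followed by a 90-second "Feels New" lap gives shares of 0.25 and 0.75 and an overall grade of 0.25. Each share is also the per-lap "reconsolidation" value suggested later in this thread.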

Now, I know this seems ad hoc as well, but these "lap times" are the best measure I could come up with that easily supports not just text but also video, audio, and even interactive media when it comes to breaking it all into multiple graded parts.

So as you've said,

Do we want to imagine a nosy enough quiz app that asks a user for an arbitrary 0 < reconsolidation < 1? Maybe a teacher grading a student might want to assign this? Seems unnatural?

That is indeed unnatural, especially if the student or the teacher had to provide it manually. But if the teacher is the app, that "reconsolidation" parameter could simply be the lap's duration divided by the total time spent completing the passive review card. So my app has those grade buttons, and pressing one creates a new lap: the user can read a passage or so, feel that it's new, then simply press the "Feels New" button to grade that lap. That's how quickly the user can make a graded lap.

In addition, a user may shelve a passive review, leaving it incomplete, and continue reading/studying it later. The passive review card could be an entire PDF book, for example, and it may take a year for the user to finish it. If the user took a year to finish the book, I think it's fair to say that the initial model's half-life should be an entire year as well (but that is a different issue, and I'll probably come up with another ad hoc solution for it first before telling you about my progress next time).


Now, I came up with the "inherent interval" so that the last quiz datetime could be updated on every review, i.e., the last quiz datetime literally means "the last quiz datetime", regardless of whether a reconsolidation took place and regardless of how partial or full it was.

Actually, let's not call it "inherent interval", but rather, let's call it the "leftover interval" instead. And for the sake of analogy, think of the "leftover interval" as the "fridge", as some kind of container for "leftover food" (or leftover interval) that must be consumed, and only once all food has been consumed can we truly say that a full reconsolidation took place.

Suppose the user first encounters an item: we then have an Ebisu model, a datetime of last encounter (or last quiz datetime), and a leftover interval of zero (an empty fridge). Now N days pass since the last encounter, and if the user reviews at this point, we serve them "fresh food" worth N days along with any leftover food from the fridge. For a full reconsolidation to happen, the fresh N days of food, along with any leftovers, must all be consumed.

If no reconsolidation took place, the fresh food (of N days) becomes part of the leftover food. If a partial reconsolidation happens instead, the fresh food may be only partially consumed; or the fresh food may be fully consumed, but it's still a partial reconsolidation if some leftover food remains unconsumed.
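The fridge analogy can be sketched with a single consumed fraction applied to everything served. This is a simplifying assumption: the comment allows fresh food and leftovers to be consumed separately, whereas here one hypothetical `consumed_fraction` (0.0 for zero reconsolidation, 1.0 for full) covers both. `serve_review` is a hypothetical name.

```python
from typing import Tuple


def serve_review(leftover: float, fresh: float,
                 consumed_fraction: float) -> Tuple[float, float]:
    """Return (interval the update sees, leftover interval after this review).

    consumed_fraction: 0.0 = zero reconsolidation, 1.0 = full reconsolidation.
    """
    if not 0.0 <= consumed_fraction <= 1.0:
        raise ValueError("consumed_fraction must be in [0, 1]")
    # Serve the fresh days plus whatever was waiting in the fridge.
    served = leftover + fresh
    # Whatever isn't consumed goes back into the fridge for the next review.
    new_leftover = (1.0 - consumed_fraction) * served
    return served, new_leftover
```

Restricting `consumed_fraction` to 0 or 1 recovers the inherent-interval rule from the first comment; values in between model partial reconsolidation.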


As for the failed "active recall" review, I like to think of it as having two parts, or two items: one is an active recall item, and the "displayed answer after failing" as a passive review item. Also, those two items are correlated: the passive review has a chance of boosting the probability of recall a little. Now, asking the user whether or not the displayed answer feels old or new could be an option, but I'm leaning towards avoiding that route (if I can), for the sake of UX.

So, what about model correlations? I think it might be better if I make a separate issue about that, but as a summary:

Suppose we have items A and B. Successfully recalling item B might have a 20% chance of subconsciously triggering recall of item A as well. (You could say that's a "strengthening" correlation!) This could happen if a part of item B's content is (at least mentally) 20% of item A's content; therefore, recalling item B may recall item A or a subset of it. Now, that "20%" could be a parameter, initially provided by the user, but the algorithm might tune it as more data are gathered. (I'm not sure how that should actually be done though.)

It could also be that, successfully recalling item B might have a 20% chance of causing item A to fail when quizzed on afterwards. Also, the correlation may be one-way only (e.g., recalling item B affects item A, but not the other way around). Additionally, the reconsolidation caused by the correlation could be partial or none at all.
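The one-way, signed correlations described above could be stored as a directed graph. A minimal sketch of that data structure only: `Correlation` and `affected_by_recall` are hypothetical names, and how a strength like 0.2 should turn into an actual model update (fuzzy quiz, partial reconsolidation) is left open, as the comment itself leaves it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Correlation:
    strength: float  # e.g. 0.2 for "a 20% chance"
    kind: str        # "strengthen" or "interfere"


# Directed edges (source, target) -> Correlation.
# One-way unless the reverse edge is added explicitly.
Graph = Dict[Tuple[str, str], Correlation]


def affected_by_recall(graph: Graph, recalled: str) -> List[Tuple[str, Correlation]]:
    """Items whose models might be touched when `recalled` is recalled successfully."""
    return [(target, corr)
            for (source, target), corr in graph.items()
            if source == recalled]
```

For example, an edge `("B", "A")` with strength 0.2 makes recalling B relevant to A, while recalling A does not touch B unless an `("A", "B")` edge is added.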

I do have something more to say about this "model correlation" feature, but I'll leave it at that for now, as this comment is getting big. (I'll make sure to instead make a separate issue or comment on an existing one about this.)


I know this has been a very long comment 😅 so my apologies for the long read. I just hope all of this is interesting enough to help guide the design of the next version of Ebisu.

jasonsparc commented 3 years ago

Posted a comment on another issue regarding model correlations: https://github.com/fasiha/ebisu/issues/27#issuecomment-947903991