fasiha / ebisu

Public-domain Python library for flashcard quiz scheduling using Bayesian statistics. (JavaScript, Java, Dart, and other ports available!)
https://fasiha.github.io/ebisu
The Unlicense

How to select the initial parameters? #40

Closed ernesto-butto closed 3 years ago

ernesto-butto commented 3 years ago

Hello,

Thank you for writing ebisu. I find it incredibly useful and almost magical that it can be used to prioritize studying without needing to track the student's full review history 🙂

It's also amazing how it adapts to a student's performance while only needing to store so few parameters.

I'm working on a language learning app that allows users to import content they care about and create flashcards as they read, which they can later review in a separate area of the app to build their vocabulary.

I included the python ebisu, tested it a bit and decided we want to use it to help users optimize their time 🎉

However, I need some help: even though I have coding experience, my statistics background is weak.

My first question would be, how should I go about deciding on a good alpha, beta, and initial time?

Right now I'm using:

(4., 4., 24. * 3600.) # alpha, beta, 24 hours in seconds

But I am wondering whether this would be a better idea for this use case:

(2., 2., 48. * 3600.) # alpha, beta, 48 hours in seconds

(screenshot)

Since these are single words that users are learning, most of the time the back of the flashcard will be another word, but sometimes a longer mnemonic note, e.g. Front: "alarm", Back: "he sounded the alarm".

This one is a test for the word "dispense": (screenshot)

Based on this information, what advice would you give me to help set the parameters?

I realize you wrote extensive documentation of how it works, and I'm sorry if my weak statistics background prevents me from finding the answer myself.

I do have some other questions about useful ways of using the model, as well as some documentation-enhancement proposals for Ebisu users like myself; however, I'm not sure if this is the place to ask.

Thank you again for writing ebisu!

fasiha commented 3 years ago

Short answer to your question:

  1. Pick an initial halflife: the time after which you think the memory of a newly-learned fact will decay to 50%. This totally depends on your facts and your users, but 1–2 days seems totally reasonable.
  2. Set initial model to (2, 2, <that initial halflife you picked in step 1>) (see the sketch after this list). You're done 🥳
  3. (If you are so inclined, play with the Ebisu simulator that another contributor created to see if you want to tweak initial a=b=2. I like 2 though!)
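
For concreteness, here's a minimal sketch of steps 1 and 2 in Python (assuming you track elapsed time in hours; any unit works as long as you use it consistently):

import ebisu

# Step 1: initial halflife, e.g. 24 hours.
initial_halflife = 24.0

# Step 2: initial model with alpha = beta = 2.
model = (2.0, 2.0, initial_halflife)

# Sanity check: one halflife after learning, predicted recall should be ~50%.
print(ebisu.predictRecall(model, initial_halflife, exact=True))  # ≈ 0.5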

Does this help?

I do have some other questions about useful ways of using the model, as well as some documentation-enhancement proposals for Ebisu users like myself; however, I'm not sure if this is the place to ask.

Yes, please ask here!

And thank you for your kind words! We are working on a new version of Ebisu that we hope will more accurately model how we learn and forget, so please watch for that in the coming months!

ernesto-butto commented 3 years ago

Hello! Yes, your answer (steps 1 and 2) helps! I'll launch a beta with (2, 2, 24 hours) and see users' feedback.

OK, another question about the following screenshot:

(screenshot)

The recall probability is what I expected it to be; however, I expected modelToPercentileDecay to be closer to 0h, since all 3 words have around 50% recall, and that's supposed to happen around the first 24h... Am I missing something related to modelToPercentileDecay?

Here is the relevant code to those results:

import ebisu

def get_recall(model):
    """Compute the predicted recall and halflife for a stored card.

    :param model: dict holding the Ebisu parameters 'modelAlpha', 'modelBeta',
        'modelTimeNow', plus 'timeElapsed' (time since the last review, in the
        same units as 'modelTimeNow').
    :return: dict with 'ebisuRecall' and 'meanHalfLife'
    """
    ebisu_model = (model['modelAlpha'], model['modelBeta'], model['modelTimeNow'])  # alpha, beta, and half-life
    recall = ebisu.predictRecall(ebisu_model,
                                 model['timeElapsed'],
                                 exact=True)

    mean_half_life = ebisu.modelToPercentileDecay(ebisu_model)

    new_model_response = {'ebisuRecall': recall, 'meanHalfLife': mean_half_life}

    return new_model_response

I think I'm using the method wrong, but I'm not really sure how to use it then... and I see that in the simulator, the method receives 2 arguments, not one like I'm using:

let timeToQuiz = ebisu.modelToPercentileDecay(model, pQuiz *  1e-2);

I was thinking that maybe users would benefit from knowing when their words are in danger of being forgotten, and the app could even propose an optimal weekly study schedule based on this indicator.

Thank you very much for your time and for making this library

fasiha commented 3 years ago

The recall probability is what I expected it to be; however, I expected modelToPercentileDecay to be closer to 0h, since all 3 words have around 50% recall, and that's supposed to happen around the first 24h... Am I missing something related to modelToPercentileDecay?

Note that the only argument passed to modelToPercentileDecay is the model: this function doesn't know how much time has elapsed. It just converts a model to halflife (in this case, 24 and 19 hours).

(As you found, you can provide an optional argument to modelToPercentileDecay to change the percentile from 50% (halflife) to any other probability of recall, like 90% or 10%. But again, the output of that function is just the time that the model predicts it takes for memory to decay to that level. It doesn't know how much time has actually elapsed.)
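
For example, a small sketch (assuming the model's time unit is hours):

import ebisu

model = (2.0, 2.0, 24.0)  # alpha, beta, and a 24-hour halflife

ebisu.modelToPercentileDecay(model)       # ≈ 24: time for recall to decay to 50%
ebisu.modelToPercentileDecay(model, 0.9)  # much less than 24: time to decay to 90%
ebisu.modelToPercentileDecay(model, 0.1)  # much more than 24: time to decay to 10%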

Hope that makes sense? It might not be necessary to show the halflife in the app, since the user already sees the recall prediction?

I was thinking that maybe users would benefit from knowing when their words are in danger of being forgotten, and the app could even propose an optimal weekly study schedule based on this indicator.

Oh interesting, yes, you can use modelToPercentileDecay to get the halflife (or "quarterlife" if you use 25-percentile recall, which will be longer than the halflife) of a fact, then use that to schedule the next review: review time = previous study time + modelToPercentileDecay(model).
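
A rough sketch of that rule (the function and field names here are hypothetical, not part of Ebisu; times are in whatever unit your models use):

import ebisu

def next_review_time(model, last_reviewed_at, percentile=0.5):
    # Hypothetical helper: schedule the next review for when the model predicts
    # recall will have decayed to `percentile` (0.5 = halflife, 0.25 = the
    # longer "quarterlife").
    return last_reviewed_at + ebisu.modelToPercentileDecay(model, percentile)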

--

I've never done this because my apps tend to be much less classroom-like: my schedule is very hectic so sometimes I don't have much time to study, sometimes I have lots of time, so I just like to review the flashcards that are most likely to be forgotten.

I personally don't show the probability of recall—as you can see from other issues like #35, the probability of recall is usually very, very conservative, i.e., even if Ebisu predicts I have 1% recall, I usually get it right. This happens because

  1. the initial model isn't exact (some flashcards are easier than others), but also because
  2. Ebisu's current model assumes a fixed memory halflife for the card, and the quizzes are used to help you estimate that. In statistics terms, Ebisu simulates a weighted coin and estimates its probability of coming up "heads" based on a few flips—it doesn't yet handle the case where the weighting of the coin changes because of the flips. Of course this is a major limitation in how it models memory, and we're working on making it dynamic, but I still find that Ebisu nicely produces a ranking between cards: if predictRecall(a) < predictRecall(b), then a is usually harder for me to remember than b (see the small ranking sketch after this list).
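
For example, a minimal sketch of "review whatever is most at risk" (the card structure here is hypothetical):

import ebisu

def pick_card_to_review(cards, now):
    # `cards` is a hypothetical list of dicts, each with an Ebisu 'model' tuple
    # and the time of its 'lastReviewed'. Pick the card with the lowest
    # predicted recall. (Without exact=True, predictRecall returns a
    # log-probability, which is monotonic, so the ranking is unchanged.)
    return min(cards,
               key=lambda c: ebisu.predictRecall(c['model'], now - c['lastReviewed']))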

I say all this just as extra background, hopefully it helped and didn't confuse! Feel free to ask as many questions as needed!

ernesto-butto commented 3 years ago

Thanks for the extra background ☺️, the coin analogy helped me understand better.

Questions:

  1. I am curious: how many cards do you usually deal with when using Ebisu yourself?

the initial model isn't exact (some flashcards are easier than others)

  2. I thought that maybe it would be viable to adjust the starting point based on the difficulty of a word or text, calculated from the number of syllables, reading difficulty, or some similar approach... What do you think? Or is it too much? Is there an approach that could be used, or does it not really matter given the model's nature?

fasiha commented 3 years ago

I was thinking that for this app, I will change the % of recall to a very general visual indicator, say a power bar of 4 levels

I love this! As mentioned before, we've learned that the recall predictions by Ebisu tend to be very conservative, so using four bars sounds great. Maybe I can suggest using a logarithmic scale to convert predicted recall to a bar count:

[.5 ** (i+1) for i in range(4)]
# [0.5, 0.25, 0.125, 0.0625]

I.e., if predictRecall (with the exact flag true) is above 0.5, show four bars; between 0.25 and 0.5, three bars; between 0.125 and 0.25, two bars; and below that, one bar.
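
A sketch of that mapping (recall_to_bars is just an illustrative helper, not part of Ebisu):

def recall_to_bars(recall, thresholds=(0.5, 0.25, 0.125, 0.0625)):
    # 4 bars when recall >= 0.5, 3 when >= 0.25, 2 when >= 0.125, else 1.
    return max(sum(recall >= t for t in thresholds), 1)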

And note, there's nothing special about 0.5 here: 0.333 or 0.25 should be good too.

I am curious: how many cards do you usually deal with when using Ebisu yourself?

This might be like the case of the shoemaker's children going barefoot… Probably the biggest deck I've used Ebisu with had maybe ~a hundred cards (for when I was studying place-names before a trip to Japan)? Hopefully this admission isn't seen as me not trusting Ebisu (I think we understand its limitations), because I don't use any other flashcard software either—no Anki, no Memrise, etc.—mainly because I'm very picky about flashcards and I really love learning organically (reading books and looking up unfamiliar words in a dictionary), so I don't feel it's a big loss to not do flashcards much.

(Side note, I have been working on a big app that parses Japanese text with advanced NLP tools, and it does create a database of dictionary entries that could go right into SRS. I think I am close to finally making a flashcard app I like (inside the bigger app), so maybe in six months I'll be able to say that my biggest deck has two thousand entries or something 😅)

I thought that maybe it would be viable to adjust the starting point based on the difficulty of a word or text, calculated from the number of syllables, reading difficulty, or some similar approach

I like this too!

In my apps, I let the user "rescale" the difficulty of a card after they review it: they can say "this card was way too easy, scale its halflife by 2x" or "this card was really hard, you waited too long! Scale its halflife by 0.2!", because in reality some cards are easier than others. (I have added a function to the Ebisu API in a branch that I need to release to make this rescaling super-easy, let me know if you're interested and I'll prioritize releasing it.)
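
Until that function is released, a rough manual version is simply to scale the model's third element: time only enters Ebisu's predictions as a ratio against it, so this stretches the whole decay curve (note this is just a sketch; the released API function may do something more refined):

def rescale_halflife_naive(model, scale):
    # Sketch only: scale=2.0 doubles the halflife, scale=0.2 shrinks it to a fifth.
    alpha, beta, t = model
    return (alpha, beta, t * scale)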

But I really like the idea of using outside data to initialize the halflife correctly. I like your idea of making initial halflife inversely-proportional to the difficulty, whether that's by looking at the length of the word or looking at an outside database of word frequency or cognates or whatever. Linguists who study second-language acquisition see a well-ordered progression of grammar mastery: native-English speakers learning Spanish master this first, then this, and this comes last—you might also use this.
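
For instance, a heavily hedged sketch of that idea (the difficulty score and its scale are entirely hypothetical; you might derive it from word length, corpus frequency, etc.):

def initial_model(difficulty, base_halflife_hours=24.0):
    # Hypothetical: harder words (difficulty > 1) get a shorter initial
    # halflife, easier ones (difficulty < 1) a longer one.
    return (2.0, 2.0, base_halflife_hours / difficulty)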

And the dream of course is to use the data from many users to infer flashcard difficulty. Duolingo does some of this with machine learning.

I like all these ideas of using outside data to personalize each flashcard for each user. But… it's not clear to me that these tricks will help 😇—I can totally imagine in five years some scholars publishing a paper showing that this kind of personalization doesn't improve course outcomes in any statistically-significant way.

Ideally though, if we had an SRS that quickly and accurately estimated the forgetting rate of a fact, and we combined that with how frequently the user will encounter that fact outside of SRS, then I think that would maximize memory.

So there's a lot of room to experiment in this space! I am hopeful you can do some studies and blog about them 😄.

ernesto-butto commented 3 years ago

Great feedback!

  1. I think I will implement the logarithmic approach with 0.5 as the first threshold and get some beta users' feedback before adjusting. It would be an honor if you are among the testers 🧐.

  2. I think that 100 cards are a great starting point 😀. I'm excited to try Ebisu for myself with Italian and to empower users in their language journey. I will certainly blog results and insights.

  3. Your insights about using outside data were very interesting; you gave me an idea for how to bring back a game module (fill in the blanks) we had discarded 💡 (I would love to elaborate, but I think it's a bit too much for this thread.)

I think that the way you rescale each card is the most practical and realistic way to adjust them, and it's valuable to know that you adjust the halflife for this.

I would love to integrate and test this!

ernesto-butto commented 3 years ago

For me, this issue was resolved with your 2nd response:

Short answer to your question:

  1. Pick an initial halflife: the time after which you think the memory of a newly-learned fact will decay to 50%. This totally depends on your facts and your users, but 1–2 days seems totally reasonable.
  2. Set initial model to (2, 2, <that initial halflife you picked in step 1>). You're done 🥳
  3. (If you are so inclined, play with the Ebisu simulator that another contributor created to see if you want to tweak initial a=b=2. I like 2 though!)

Additionally, a way to "rescale" cards looks promising.

Note: I think I wrote too much about my app in the previous response, so I deleted all the irrelevant comments; I hope this thread is more useful to others this way.

fasiha commented 3 years ago

Ah, no worries, I was happy to read about your app, those details helped me confirm that I was answering the right questions 😄 please don't hesitate to ask follow-up questions!

ernesto-butto commented 3 years ago

Hello @fasiha !

I have been testing the implementation, and I have a question... is there a moment when Ebisu's probability of recall almost stops decreasing over time? E.g., when a word is finally committed to memory?

fasiha commented 3 years ago

I have been testing the implementation, and I have a question... is there a moment when Ebisu's probability of recall almost stops decreasing over time? E.g., when a word is finally committed to memory?

The model doesn't have anything like this built-in. Memory is always expected to decay, if only very, very slowly (a halflife of a year? Ten years?).
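
A small sketch of what that slow decay looks like (assuming days as the unit):

import ebisu

# Even with a halflife of a year, predicted recall keeps decaying, just slowly.
model = (2.0, 2.0, 365.0)  # alpha, beta, halflife in days
for days in (30, 365, 3650):
    print(days, ebisu.predictRecall(model, days, exact=True))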

I know some quiz apps (like Wanikani) "burn" cards like this, meaning they don't review them any more, but others like Anki/Supermemo are happy to keep long-lived cards for review every few years. Hope this helps!

ernesto-butto commented 3 years ago

Thank you, your answer helps a lot! I'll try a hybrid approach: "burn" cards, but still test the user on "burned" cards occasionally.