
Keep previous FSRS parameters if they get worse when optimizing #2972

Closed · dae closed 9 months ago

dae commented 10 months ago

https://forums.ankiweb.net/t/measures-to-prevent-new-fsrs-parameters-from-being-worse-than-the-old-ones/38957

Expertium commented 10 months ago

Previously, you said, "Before we can implement this, we’ll need to know how we should handle RMSE/log-loss moving in different directions." I think the right thing is "Only look at RMSE." The problem with log-loss is that it depends on retention. RMSE (bins), on the other hand, does not depend on retention.

dae commented 10 months ago

Possible approach we could use:

Expertium commented 10 months ago
  • and return an error if they're worse

That would just confuse the user. I think instead it's better to automatically run "Evaluate" with both the old and the new parameters, and then keep whichever are better. All of that should happen without notifying the user. Even if we notified the user, it wouldn't really help them make any kind of good decision; it would just make them wonder, "Huh? Is the optimizer broken? Is there a problem with my settings?" @L-M-Sherlock what do you think?
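
For illustration, a minimal sketch of that flow, assuming a hypothetical `evaluate` callback that returns RMSE(bins) for a parameter set on the user's full review history (the names here are illustrative, not Anki's actual API):

```python
from typing import Callable, Sequence

Params = Sequence[float]

def pick_better_params(
    current: Params,
    candidate: Params,
    evaluate: Callable[[Params], float],  # hypothetical: scores a parameter set on the full revlog
) -> Params:
    """Silently keep whichever parameter set evaluates better on the same data."""
    return candidate if evaluate(candidate) < evaluate(current) else current

# Usage with a dummy evaluator and made-up numbers:
scores = {(0.4, 1.2): 0.051, (0.5, 1.1): 0.047}
best = pick_better_params(
    current=(0.4, 1.2),
    candidate=(0.5, 1.1),
    evaluate=lambda p: scores[tuple(p)],
)
print(best)  # (0.5, 1.1): the candidate wins because its RMSE(bins) is lower
```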

dae commented 10 months ago

Sorry, I should have been more explicit. I was expecting the frontend to catch the returned error, and skip updating the existing weights in that case.

Expertium commented 10 months ago

By the way, according to this comment by LMSherlock, it doesn't matter whether the default parameters or the previous ones are used when initializing the optimizer.

L-M-Sherlock commented 10 months ago

The problem with log-loss is that it depends on retention

It doesn't matter in this case, because we evaluate the parameters in the same dataset.

user1823 commented 10 months ago

@L-M-Sherlock, do you think that we should only look at the RMSE (bins)? If so, why?

I want to add that I re-optimized my parameters today and got the following result:

In this case, the relative change in RMSE is larger. So, I think that the parameters are worse overall.

But, if the relative change in log loss is larger in magnitude, should we still consider only RMSE to decide whether to update the parameters or not?

L-M-Sherlock commented 10 months ago

According to my previous experiment, the RMSE(bins) can be decreased even when the parameters are changed very slightly: https://github.com/open-spaced-repetition/spaced-repetition-algorithm-metric/blob/main/ICILoss.ipynb
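
For readers unfamiliar with the metric, a simplified sketch of a binned calibration RMSE (the actual FSRS RMSE(bins) uses a more elaborate binning scheme than this equal-width version):

```python
import numpy as np

def rmse_bins(predicted_r: np.ndarray, recalled: np.ndarray, n_bins: int = 20) -> float:
    """Simplified binned calibration RMSE: group reviews by predicted retrievability,
    compare the mean prediction with the actual recall rate in each bin, and take a
    size-weighted root mean square of the differences."""
    bin_idx = np.minimum((predicted_r * n_bins).astype(int), n_bins - 1)
    squared_error = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            gap = predicted_r[mask].mean() - recalled[mask].mean()
            squared_error += mask.sum() * gap * gap
    return float(np.sqrt(squared_error / len(predicted_r)))
```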

Expertium commented 10 months ago

It doesn't matter in this case, because we evaluate the parameters in the same dataset.

It does. If the user's retention consistently goes down, log-loss will keep getting worse after each optimization, but RMSE (bins) won't necessarily keep getting worse, or at least not as much.

L-M-Sherlock commented 10 months ago

It does. If the user's retention consistently goes down, log-loss will keep getting worse after each optimization, but RMSE (bins) won't necessarily keep getting worse, or at least not as much.

The log-loss of the old parameters is also calculated on the current dataset, regardless of whether its retention goes up or down.

Expertium commented 10 months ago

I don't understand what you're trying to say.

user1823 commented 10 months ago

No reviews are done between the two evaluations. So, the retention must be the same.

Expertium commented 10 months ago

Ah. Yeah, that's true. I was talking about a more complicated scenario, where the parameters are re-optimized many times and each time the new log-loss is greater. But it's possible that the new RMSE wouldn't be greater each time.

user1823 commented 10 months ago

Well, I didn't get what you are saying. But you don't need to explain. (Chances are high that I still won't be able to draw any conclusions from it.)

Can there be any case where the log loss decreases and the RMSE increases but the parameters are more optimal? If not, we can simply use RMSE as the only criterion to decide whether the new parameters are better or worse.

L-M-Sherlock commented 10 months ago

Can there be any case where the log loss decreases and the RMSE increases but the parameters are more optimal?

Theoretically, there is a case where the log loss increases and the RMSE decreases and the parameters are worse: https://github.com/open-spaced-repetition/spaced-repetition-algorithm-metric/blob/main/metrics_research.ipynb

Edit: the image at the bottom.

[image]

Its RMSE(bins) is zero, but its log loss is worse.
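
To make that degenerate case concrete, here is a small simulation (simplified equal-width binning, not the notebook's exact metric): a model that predicts the average retention for every review gets a binned RMSE of roughly zero but a higher log loss than a model that actually discriminates between reviews.

```python
import numpy as np

rng = np.random.default_rng(0)
true_r = rng.uniform(0.6, 1.0, size=10_000)         # simulated per-review recall probabilities
recalled = rng.uniform(size=true_r.size) < true_r   # simulated review outcomes

def log_loss(p, y):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def rmse_bins(p, y, n_bins=20):
    idx = np.minimum((p * n_bins).astype(int), n_bins - 1)
    se = 0.0
    for b in range(n_bins):
        m = idx == b
        if m.any():
            se += m.sum() * (p[m].mean() - y[m].mean()) ** 2
    return float(np.sqrt(se / len(p)))

flat = np.full_like(true_r, recalled.mean())  # degenerate model: same prediction for every review
good = true_r                                 # well-calibrated model that discriminates

print(rmse_bins(flat, recalled), log_loss(flat, recalled))  # binned RMSE ~0, but higher log loss
print(rmse_bins(good, recalled), log_loss(good, recalled))  # small binned RMSE, lower log loss
```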

By the way, I found a practical one (but I don't know which parameters are better):

[image] [image]

Source: https://github.com/open-spaced-repetition/spaced-repetition-algorithm-metric/blob/main/ICILoss.ipynb

The first one's log loss is worse than the second one's, but its RMSE is better than the second one's.

user1823 commented 10 months ago

Can you point out which one? Just edit your previous comment to include the example instead of creating a new comment. I don't want to pollute the comments list.

user1823 commented 10 months ago

The case of an algorithm giving the same prediction for all the reviews doesn't seem to be practical. Even if there were such a case, where a very large number of reviews end up in the same bin, it would be difficult to detect.

Of course, we can change the criteria later if we have more insights. But for now, I think that only using RMSE(bins) is the way to go.

user1823 commented 10 months ago

All of that should happen without notifying the user.

I think that we can show a message similar to "Your parameters are already optimal." This way,

Expertium commented 10 months ago

Well, I didn't get what you are saying. But, you don't need to explain. (Chances are high that I still won't be able to draw any conclusions from that)

I meant a situation like this. As n reviews increases:

1) Retention decreases
2) Log-loss increases
3) RMSE (bins) stays the same

[image]

Here's a toy example. The numbers aren't shown because the exact values don't matter; what matters is the slope of each curve. Retention goes down, log-loss goes up, RMSE (bins) stays the same. This seems like a plausible scenario to me, which is why I recommend focusing only on RMSE (bins).
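
A small numeric stand-in for that toy example (made-up numbers, not the values behind the screenshot): a perfectly calibrated model keeps its binned calibration error near zero at any retention level, yet its log loss still climbs as retention falls, because lower retention makes the outcomes intrinsically harder to predict.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_loss(p, y):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

for retention in (0.95, 0.90, 0.85, 0.80):
    predicted = np.full(50_000, retention)                  # perfectly calibrated prediction
    recalled = rng.uniform(size=predicted.size) < retention
    calibration_gap = abs(retention - recalled.mean())      # with a single bin, this is the binned RMSE
    print(retention, round(log_loss(predicted, recalled), 3), round(calibration_gap, 4))
# Log loss climbs as retention falls, even though calibration (and hence RMSE bins) stays ~0.
```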

user1823 commented 10 months ago

As n reviews increases:

But in the current problem, n_reviews remains the same but the parameters change. (The two evaluations run on exactly the same data, just with different parameters)

So, what you have described doesn't seem to be applicable in the current situation.

Expertium commented 10 months ago

I think that we can show a message similar to "Your parameters are already optimal."

I like that idea. So @L-M-Sherlock, user1823 and I agree that only RMSE (and not log-loss) should be used to decide which parameters to keep, so if you don't have any objections, you should probably start working on implementing it.

Also, about this:

[image]

In the latter case, not only RMSE but every other metric is also worse: R-squared is worse, MAE is worse, all ICI-related metrics are worse, and the intercept and slope are further from 0 and 1 (the values that correspond to a perfect algorithm). So I would say that the latter parameters are worse.
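
For context, the intercept and slope here refer to a calibration line fitted to predicted vs. actual recall rates, with 0 and 1 as the ideal values. A rough sketch of one way such a line could be estimated (a plain least-squares fit over bins; the linked notebook uses a more sophisticated lowess/ICI-based analysis):

```python
import numpy as np

def calibration_line(predicted_r: np.ndarray, recalled: np.ndarray, n_bins: int = 20):
    """Fit actual recall rate against mean predicted retrievability across bins.
    A perfectly calibrated algorithm gives an intercept near 0 and a slope near 1."""
    bin_idx = np.minimum((predicted_r * n_bins).astype(int), n_bins - 1)
    xs, ys = [], []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            xs.append(predicted_r[mask].mean())  # mean prediction in this bin
            ys.append(recalled[mask].mean())     # actual recall rate in this bin
    slope, intercept = np.polyfit(xs, ys, 1)     # least-squares line through the bin points
    return intercept, slope
```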

L-M-Sherlock commented 10 months ago

Fine. I don't have any objections.

user1823 commented 10 months ago

SuperMemo says that the universal metric is the best way to compare two spaced-repetition algorithms.

@L-M-Sherlock, can the UM be calculated as quickly as calculating RMSE twice? If so, we can use UM to compare the old and the new weights.

If not, RMSE(bins) would be fine.

L-M-Sherlock commented 10 months ago

can the UM be calculated as quickly as calculating RMSE twice

Of course. I have implemented it in the Python Optimizer. I plan to port it into fsrs-rs.

dae commented 9 months ago

When implementing this, we'll want to update to fsrs 0.3: https://github.com/open-spaced-repetition/fsrs-rs/pull/154

Xemorr commented 9 months ago

I'm not sure why you'd want to keep your previous parameters?

Consider the scenario where you've done 1 review and you generate your parameters. The parameters fit your 1 review perfectly, so the RMSE is 0. You do 10000 more reviews and then regenerate the parameters. The new parameters have a worse RMSE than the previous parameters had when they were generated, so it keeps the parameters generated after 1 review.

Or does the optimizer not always generate the parameters with the lowest loss for the dataset? I imagine there's some confusion somewhere.

Expertium commented 9 months ago

The new parameters have worse RMSE than the RMSE of the previous parameters when they were generated.

No, both sets of parameters are evaluated on the full dataset. In your example, both sets of parameters would be evaluated on the 10000 reviews dataset. We do not compare RMSE(old parameters, 1 review) to RMSE(new parameters, 10000 reviews). We compare RMSE(old parameters, 10000 reviews) to RMSE(new parameters, 10000 reviews).

Xemorr commented 9 months ago

The new parameters have worse RMSE than the RMSE of the previous parameters when they were generated.

No, both sets of parameters are evaluated on the full dataset. In your example, both sets of parameters would be evaluated on the 10000 reviews dataset. We do not compare RMSE(old parameters, 1 review) to RMSE(new parameters, 10000 reviews). We compare RMSE(old parameters, 10000 reviews) to RMSE(new parameters, 10000 reviews).

Thank you for the clarification. I was having a conversation with that prolific r/anki redditor, and he didn't have a proper answer for me. So this means it's the case where the optimiser does not always give the optimal parameters for the training data. Why is that (is there a training/validation split)?

Expertium commented 9 months ago

I was having a conversation with that prolific r/anki redditor

u/ClarityInMadness? That's me. Sorry if I missed your comment.

So this means it's the case where the optimiser does not always give the optimal parameters for the training data. Why is that?

Honestly, ask @L-M-Sherlock. I don't fully understand why myself.

Xemorr commented 9 months ago

I was having a conversation with that prolific r/anki redditor

u/ClarityInMadness? That's me. Sorry if I missed your comment.

So this means it's the case where the optimiser does not always give the optimal parameters for the training data. Why is that?

Honestly, ask @L-M-Sherlock. I don't fully understand why myself.

Oh, this was a while ago; I don't think you knew why at the time. I've just been wondering since then why we keep the previous FSRS parameters.

L-M-Sherlock commented 9 months ago

So this means it's the case where the optimiser does not always give the optimal parameters for the training data. Why is that

Gradient descent-based methods are not guaranteed to find the global optimal parameters.
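
A tiny illustration of that point, unrelated to FSRS itself: plain gradient descent on a one-dimensional loss with two minima converges to whichever basin it starts in, so an unlucky starting point ends up with a worse final loss.

```python
def loss(w: float) -> float:
    # A quartic with two basins: a local minimum near w ≈ -1 and the global
    # minimum near w ≈ 2 (the -0.5*w tilt makes the right basin deeper).
    return (w + 1) ** 2 * (w - 2) ** 2 - 0.5 * w

def grad(w: float, eps: float = 1e-6) -> float:
    return (loss(w + eps) - loss(w - eps)) / (2 * eps)  # central-difference derivative

def descend(w: float, lr: float = 0.01, steps: int = 2000) -> float:
    for _ in range(steps):
        w -= lr * grad(w)
    return w

for start in (-2.0, 3.0):
    w = descend(start)
    print(f"start={start:+.1f} -> w={w:+.3f}, loss={loss(w):+.3f}")
# Starting at -2.0 lands in the shallower local minimum (higher final loss);
# starting at 3.0 finds the deeper one. Same optimizer, different results.
```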

user1823 commented 9 months ago

@L-M-Sherlock, I agree that this hack was needed to deal with the issue that you just mentioned.

However, there is a high probability that the latest version has another issue that makes it more likely to get worse parameters after optimization. For instance, I have never been able to get better parameters than those I got at the end of December (when 23.12.1 was released).

In my opinion, the current issue is not the right place to discuss this further. So, if you want more details and/or want to discuss something, I recommend creating a new issue in the FSRS repo.

Xemorr commented 9 months ago

So this means it's the case where the optimiser does not always give the optimal parameters for the training data. Why is that

Gradient descent-based methods are not guaranteed to find the global optimal parameters.

Thanks Jarrett, I wasn't sure what kind of optimisation was being done. That makes sense.