ttencate closed this issue 4 years ago.
Tried this law to compute E[h], but my integration skills are sorely lacking, and Wolfram Alpha just gives me timeouts. It's of the form integral of e^(ax^2 + bx + c) dx.
Nice work! Explicitly tracking the halflife is what Duolingo's halflife regression does! Read their paper here: https://github.com/duolingo/halflife-regression/blob/master/settles.acl16.pdf and they also have a blog post about it. It needs machine learning to update halflives (and you'll see in that repository's issues where I, along with others, have described serious concerns with their training algorithm to do that).
When I began work on Ebisu I wanted to see if we could do something like halflife regression purely Bayesianly (I talk a bit about this in my blog post about Ebisu—I mention considering the Gamma distribution for halflife), and it turned out that it was easy enough to just put a prior on recall at some time, and then deterministically put that through the exponential to get the prior at any given time.
Now I'm intrigued though: h, when log(h) is normally distributed, has a well-known distribution: https://en.wikipedia.org/wiki/Log-normal_distribution. I'd have to sit down with some pens and paper to see what the posterior (after a quiz) would look like if you model the Bernoulli probability of recall as exp(-tnow/h); that might turn out to be tractable—and even if not, we should make a note of it somewhere so others know we looked 😊.
A concern here is, would different variances on h lead to different predictRecall? That is, if I was confident in h for one quiz but tentative for another, would the uncertainty change my prediction of recall for the two quizzes? (It does in Ebisu, which is kind of nice—compare predictRecall((5, 5, 1), 5.) vs predictRecall((2, 2, 1), 5.).)
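(A quick numerical check of that comparison, done with Monte Carlo over Ebisu's Beta-on-recall model instead of the library call, so the only assumption is the model itself: recall probability at time t is Beta(alpha, beta)-distributed, and recall at another time tnow is that probability raised to the power tnow/t:)

```python
import numpy as np
from scipy.stats import beta as betarv

def predictRecallMC(model, tnow, size=1_000_000, seed=0):
  "Monte Carlo estimate of recall probability at tnow for an (alpha, beta, t) model."
  alpha, beta, t = model
  p = betarv.rvs(alpha, beta, size=size, random_state=seed)  # recall probability at time t
  return np.mean(p**(tnow / t))  # Ebbinghaus decay: recall at tnow is p**(tnow / t)

print(predictRecallMC((5., 5., 1.), 5.))  # confident prior: ≈ 0.063
print(predictRecallMC((2., 2., 1.), 5.))  # tentative prior: ≈ 0.107, noticeably different
```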
A side note though. I do somewhat disparage this in the Ebisu doc, but if we store the model's halflife, then you could indeed replace predictRecall with exp as you suggested; that's likely a totally reasonable approximation for your true goal, which is to rank the quizzes. You could also do something related: calculate the output of predictRecall at ten logarithmically-spaced times into the future (one hour, 6 hours, one day, 2 days, one week, two weeks, one month, etc.) and then use SQL to interpolate between them.
I'll look more later.
And here I am, discussing statistics while I could (and should) be working on my quiz app instead... but it's just way too dang interesting :smile:
Thank you for (yet another) in-depth reply, and especially the link to your old blog post, which I hadn't seen before. So this ground has indeed been trodden before... although not with a log-normal distribution perhaps? I played around with a gamma distribution just now too; at first I thought we couldn't position and scale it the way we needed, but we can; the usual parameters are just a bit unintuitive to work with.
Also I didn't even know that a log-normal distribution was an official thing, with a Wikipedia page and all. The value of the mean, which I failed to derive, has kindly been provided already (and is simple to compute), and maybe some of the other equations are useful too. A log-normal would be nice because it doesn't need a complicated, expensive and numerically tricky gamma (or gammaln) function.
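(For reference, the mean in question is E[h] = exp(mu + sigma^2 / 2) when log(h) ~ N(mu, sigma^2); a quick Monte Carlo sanity check:)

```python
import numpy as np

mu, sigma = 0.5, 0.3
h = np.exp(np.random.default_rng(0).normal(mu, sigma, size=1_000_000))
print(h.mean(), np.exp(mu + sigma**2 / 2))  # both ≈ 1.72
```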
> A concern here is, would different variances on h lead to different predictRecall? That is, if I was confident in h for one quiz but tentative for another, would the uncertainty change my prediction of recall for the two quizzes? (It does in Ebisu, which is kind of nice—compare predictRecall((5, 5, 1), 5.) vs predictRecall((2, 2, 1), 5.).)
I can see how this arises, but it's an inconvenient property because it makes predictRecall more expensive. My intuition says that there might be a probability distribution on the halflife that doesn't have this property, i.e. whose predictRecall would depend only on mu and not on sigma. Of course now I'm hoping that it'll turn out to be the gamma or log-normal... but probably not. Maybe one could work backwards from this requirement and derive what such a distribution would have to look like?
A comment—Monte Carlo can help you quickly get a sense of how different priors on different variables behave. For example, here's a snippet that simulates a log-normal prior on the halflife and shows you what happens for various parameters, tnow, and quiz results:
```python
import numpy as np
from scipy.stats import norm


def weighted(samples, likelihood=[1]):
  sum = np.sum(likelihood)
  mean = np.sum(likelihood * samples) / sum
  var = np.sum(likelihood * (samples - mean)**2) / sum
  return dict(mean=mean, std=np.sqrt(var))


def monteCarlo(priorHalflifeMean,
               priorLogHalflifeStd,
               tnow,
               result,
               size=1_000_000,
               random_state=1):
  # generate some Monte Carlo samples
  logHalflife1 = norm.rvs(
      np.log(priorHalflifeMean), priorLogHalflifeStd, size=size, random_state=random_state)
  halflife1 = np.exp(logHalflife1)  # also Monte Carlo
  prior = 2**(-tnow / halflife1)  # also Monte Carlo! Probability of recall at `tnow`
  # This is the Bernoulli likelihood: a *probability*
  likelihood = prior if result else (1 - prior)
  print('### prior halflife mean={}, prior log-halflife std={}, tnow={}, result={}'.format(
      priorHalflifeMean, priorLogHalflifeStd, tnow, result))
  print('- probability of recall before quiz', np.mean(prior))
  # print('- prior halflife', np.mean(halflife1))
  # print('- posterior halflife moments', weighted(halflife1, likelihood))
  print('- prior LOG-halflife', np.mean(logHalflife1))
  print('- posterior LOG-halflife moments', weighted(logHalflife1, likelihood))


priorHalflifeMean = 1.0
priorLogHalflifeStd = 0.5

monteCarlo(priorHalflifeMean, priorLogHalflifeStd, 1, True)
monteCarlo(priorHalflifeMean, priorLogHalflifeStd, 10., True)
monteCarlo(priorHalflifeMean, priorLogHalflifeStd, 10., False)
monteCarlo(priorHalflifeMean, priorLogHalflifeStd * 3, 10., False)
```
Running this prints output showing that the scheme has nice properties: it responds as expected to tnow much higher (and presumably lower) than the expected halflife, the uncertainty about the halflife changes things in the correct way, etc.
> it's an inconvenient property because it makes predictRecall more expensive. My intuition says that there might be a probability distribution on the halflife that doesn't have this property, i.e. whose predictRecall would depend only on mu and not on sigma
Hmm, if this is the goal, I'm not sure if a Bayesian approach would be fruitful. The expected recall probability E[f(h)] = integral[f(h) * p(h), all h], where f is exponential and p is the probability density, will under normal circumstances be a function of the width of the density p(h). Though I haven't proved it, it seems unlikely that you could find a general p(h) that allows the expectation to be the same even as you widen or narrow the density.
You can use the above Monte Carlo snippet to check that indeed, as you change the sigma of log(h), you get a different recall probability.
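For example, keeping the same central halflife but widening the prior on log-halflife (this reuses the monteCarlo function above; with tnow much larger than the typical halflife, the wider prior reports a higher pre-quiz recall probability because of its fatter right tail):

```python
monteCarlo(1.0, 0.5, 10., True)  # narrow prior on log-halflife
monteCarlo(1.0, 1.5, 10., True)  # wide prior: higher "probability of recall before quiz"
```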
However, what you mentioned in your earlier issue,

> I was trying a simpler model: exponential decay where right answers multiply the halflife by a hardcoded factor >1, like 1.5, and wrong answers by a different factor <1, like 0.5. But I was unable to find a combination of factors that (subjectively) did a better job than Ebisu
it seems like this should work without bringing in Bayesian machinery. You could store (halflife, # of quizzes) as your model; probability of recall is just 2**(-tnow / halflife); then the update could aggressively scale the halflife if # of quizzes is low, and lightly if high. In this scheme, # of quizzes takes the place that the prior would have had in updating.
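(A minimal sketch of what such a non-Bayesian scheme might look like; the base factors and the shrinkage schedule below are made-up illustrations, not tuned recommendations:)

```python
def predict(halflife: float, tnow: float) -> float:
  "Probability of recall tnow time-units after the last review."
  return 2**(-tnow / halflife)

def update(halflife: float, nQuizzes: int, result: bool) -> tuple[float, int]:
  "Scale the halflife aggressively when nQuizzes is low, gently when it is high."
  base = 2.0 if result else 0.5          # hypothetical base scaling factors
  weight = 1.0 / (1.0 + 0.5 * nQuizzes)  # decays toward 0 as quizzes accumulate
  return halflife * base**weight, nQuizzes + 1
```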
If the goal is just to have blazing fast predictRecall, then I've always thought that the way to do this would be to store the output of predictRecall at ten or so future timestamps (either during update, or periodically), and then when you want to find the probability of recall now, just find the nearest timestamp to now and use that one's probability.

If you're concerned about accuracy,

- you can use this simple scheme to reduce the number of quizzes to run a full predictRecall on, and
- furthermore, I usually don't quiz on the item with the absolute lowest recall, because it kind of leads to predictability—if you review once a day, say, sometimes you can tell what the first quiz is going to be (because it was the most recent one learned), and that's annoying. I often either (a) find the 5% of quizzes with lowest recall probability and pick one at random to quiz or (b) find all quizzes below a threshold (.01%, 0.1%, 1%, etc.) and pick one at random.

Nevertheless, I do think it's worth digging into how to do updateRecall after putting a prior on halflife!
Edit: Oops, I stopped too soon in the derivation of the posterior when you model log-halflife as normal—here's a corrected version (attached as an image). It's still not clear how to analytically find the mean and variance of the posterior on log-halflife in this case, but maybe someone can point out a solution.
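For reference, the integral that resists a closed form looks roughly like this (a reconstruction rather than the attached derivation: write $x = \log h$ with prior $x \sim N(\mu, \sigma^2)$, and a successful quiz at time $t$ with Bernoulli likelihood $e^{-t/h}$):

$$
P(\text{recall at } t) = E\left[e^{-t/h}\right]
= \int_{-\infty}^{\infty} \exp\!\left(-t\,e^{-x}\right)\,
\frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)dx,
$$

and the posterior on $x$ after a success is proportional to that integrand, so its mean and variance require the same kind of exponential-of-an-exponential integrals.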
It's probably worth checking with a few other priors on the halflife or log-halflife, to see if the posterior's form is less fearsome.
Oops, my previous post was incomplete; I updated it to be more correct.
Thank you for looking into this! Pity it didn't work out as nicely as I hoped. Exponentials-in-exponentials... that looks daunting indeed, although I was never much good at integration.
> The expected recall probability E[f(h)] = integral[f(h) * p(h), all h], where f is exponential and p is the probability density, will under normal circumstances be a function of the width of the density p(h). Though I haven't proved it, it seems unlikely that you could find a general p(h) that allows the expectation to be the same even as you widen or narrow the density.
I'm not at all convinced that this is impossible. In fact I feel that the overall shape of p(h) might be completely determined by this restriction. The degenerate case where the distribution has zero width already meets the requirement, so all we need to do is widen that to be not infinitesimally small :wink: If I have some time, I might take a shot at it.
Good news! I was able to simplify aggressively and found that with a Gamma prior on halflife directly, the posterior turns out to have tractable moments, and can be fit back to a Gamma! It uses the modified Bessel function of the 2nd kind, which thank goodness Scipy provides and will be fun to find JS/Java/Dart versions of 😇.
I'm putting the finishing touches on the negative boolean update and will post an update. I haven't looked at how noisy-binary and binomial quizzes will work with it, though I'm tentatively confident they should be tractable too.
There's an elegant way to reframe the posterior when you're working with halflife and log-halflife, which could help us evaluate other priors (I tried log-normal, logistic, chi-square, and Gamma distributions).
That's good news indeed!
> ... turns out to have tractable moments, and can be fit back to a Gamma!
Does that mean that the result is actually only an approximation of the posterior, and not the real thing? In fact I've been wondering the same about the original Ebisu algorithm. Not that it really matters, as the whole Ebbinghaus curve is already a very simplified approximation anyway :smile:
> It uses the modified Bessel function of the 2nd kind, which thank goodness Scipy provides and will be fun to find JS/Java/Dart versions of :innocent:.
There are beta, gamma and Bessel of the first kind functions in Apache Commons, if that helps. In fact their gammaln was initially the basis for my port... but it was pretty complicated and didn't pass one of the unit tests. Whether that was an issue with that implementation, with yours, or with my port, I daren't say; I ended up just porting your much simpler version.
So what are your thoughts on this model compared to the original beta distribution?
> Does that mean that the result is actually only an approximation of the posterior, and not the real thing? In fact I've been wondering the same about the original Ebisu algorithm
Correct, in general none of these posteriors are conjugate, meaning they're some different distribution than the prior after the prior goes through the binary update. For both Beta-on-recall and Gamma-on-halflife, we have an analytical expression for the posterior, but in order to return a new model that's still Beta or Gamma, we have to moment-match, which is the one and only approximation.
(The one exception was, with Beta-on-recall, the time-traveled prior (a GB1 distribution) is conjugate with the binary update if the quiz was successful, but not otherwise. So not really super-useful.)
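(For concreteness, "moment-match" here means: compute the posterior's mean and variance analytically, then pick the Beta or Gamma with those same two moments. A generic method-of-moments sketch, not Ebisu's actual code:)

```python
def gammaFromMeanVar(mean: float, var: float) -> tuple[float, float]:
  "Gamma(shape a, rate b) with the given mean and variance: mean = a/b, var = a/b**2."
  b = mean / var
  return mean * b, b

def betaFromMeanVar(mean: float, var: float) -> tuple[float, float]:
  "Beta(a, b) with the given mean and variance (requires var < mean * (1 - mean))."
  nu = mean * (1 - mean) / var - 1
  return mean * nu, (1 - mean) * nu
```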
Edit: corrected the code below after initially posting.
I think we may have to give up on the Gamma prior on halflife. While my attention was distracted making sure that updates after binary quizzes are doable, I failed to notice that predictRecall is actually more complicated than with the Beta-on-probability prior 😢! To see this, consider—
```python
import numpy as np
from scipy.stats import gamma as gammarv
from scipy.special import gamma
from scipy.special import kv

a = 2.
b = 1.5
t = (a / b)  # mean halflife!

halflifes = gammarv.rvs(a, scale=1 / b, size=1_000_000)
prob = np.exp(-t / halflifes)

predictRecall = lambda a, b, t: 2 * (t * b)**(a / 2) * kv(a, 2 * np.sqrt(t * b)) / gamma(a)
predicted = predictRecall(a, b, t)

print('predictRecall @ time={}: analytical={}, Monte Carlo={}'.format(t, predicted, np.mean(prob)))
# predictRecall @ time=1.3333333333333333: analytical=0.3092345700088991, Monte Carlo=0.30936420184691676
```
(I use exp(-time / halflife) instead of 2**(-time / halflife) because the math was more straightforward. So although I refer to it as "halflife" in the code snippet, it's actually just a "time constant". But this doesn't change anything. Also, I use the alpha/beta parameterization of the Gamma distribution.)
Anyway, the point is, it's not analytically straightforward to calculate what the predicted recall is for a given t, nor which t gives rise to a predicted recall of, say, 0.5. t equal to the mean a/b or the mode (a-1)/b doesn't result in any special predicted recall. This is a case of Jensen's inequality, which I allude to in the README—basically, E[f(x)] ≥ f(E[x]) when f is convex, with equality only when f is linear. In our case, x is the halflife and f(x) = exp(-t / x), which is convex, so predictRecall, i.e., E[f(x)], which we want, will be greater than the straightforward application of the Ebbinghaus exponential decay function on the mean halflife. Which is bad because usually we don't know how big this difference is.
Actually, it's helpful to see this visually:
```python
import matplotlib.pylab as plt

ts = np.linspace(0, 10)

plt.ion()
plt.figure()
plt.plot(ts, np.exp(-ts / (a / b)), label='approx', linewidth=4)
plt.plot(ts, np.vectorize(lambda t: predictRecall(a, b, t))(ts), label='pred1', linewidth=3)
plt.plot(ts, np.vectorize(lambda t: predictRecall(a * 3, b * 3, t))(ts), label='pred2', linewidth=2)
plt.plot(ts, np.vectorize(lambda t: predictRecall(a / 3, b / 3, t))(ts), label='pred3', linewidth=1)
plt.legend()
plt.xlabel('time')
plt.ylabel('recall probability')
plt.savefig('gamma.png', dpi=300)
```
This produces the plot saved to gamma.png. And while this disproves what I said about Jensen's inequality and the convexity of exp(), I think it's helpful to see the range of values that predictRecall can output for Gamma distributions with the same mean (all three "pred1"–"pred3" have the same mean).
So since the Gamma-on-halflife doesn't really buy us anything over the Beta-on-probability (other than possibly simplifying the update step by getting rid of rebalancing), and might make it more complicated to do noisy-binary and binomial quizzes, I'm going to try and focus on quantifying the risk of just replacing predictRecall in the current Ebisu case with the approximation 2**(-t / halflife), or equivalently -t / halflife (since 2**x is monotonic in x, both will sort the same set of models the same way); halflife would be computed in the previous update, per #31.

If we can show that that approximation is reasonable, then quiz apps can benefit from having a super-fast alternative to predictRecall.
What do you think?
> I use exp(-time / halflife) instead of 2**(-time / halflife) because the math was more straightforward.
Good call. Maybe we should call it "elflife"? It's fitting, because elves do have a longer lifespan than halflings ;)
Thinking about Jensen's inequality: this seems to dash my hopes of finding any distribution d(h, alpha) on the elflife, with parameters h (elflife) and alpha (uncertainty), satisfying E[d(h, alpha)] = h, and such that predictRecall is a simple exponential, in other words

E[exp(-t / d(h, alpha))] = exp(-t / h)

because this is essentially (setting f(h) = exp(-t / h)):

E[f(d(h, alpha))] = f(h)

but Jensen gives, for nonlinear convex f,

E[f(d(h, alpha))] > f(E[d(h, alpha)]) = f(h)

meaning it can never be equal. That seems to be pretty much what you said above as well.
I also thought, at first glance, that your plot does seem to contradict that conclusion: approx, which represents f(E[x]), goes both above and below the "real" value pred1, which represents E[f(x)]. But that's not what Jensen's equation says: it's only about the expected value of these distributions, not about the distributions themselves. And the expected value of approx is clearly lower (more to the left) than that of pred1.
But maybe we can quantify the effect that changes to alpha / beta have on the shape of this curve? I was hoping it might turn out to be something simple, but a first experiment doesn't seem encouraging. Building upon your code, I plotted the quotient of approxRecall over predictRecall, the former being our simple exponential:
```python
def approxRecall(a, b, t):
  h = a / b
  return np.exp(-t / h)

ts = np.linspace(0, 5, 100)

plt.ion()
plt.figure(figsize=(20, 10))
for i in (0.1, 0.3, 1, 3, 10, 30):
  # plt.plot(ts, np.vectorize(lambda t: predictRecall(a * i, b * i, t))(ts), label='pred%.1f' % i, linewidth=4)
  # plt.plot(ts, np.vectorize(lambda t: approxRecall(a * i, b * i, t))(ts), label='approx%.1f' % i, linewidth=2)
  plt.plot(ts, np.vectorize(lambda t: approxRecall(a * i, b * i, t) / predictRecall(a * i, b * i, t))(ts), label='error%.1f' % i, linewidth=2)
plt.legend()
plt.xlabel('time')
plt.ylabel('error in approx recall probability')
plt.savefig('error.png', dpi=96)
```
Those don't look like any basic function that I know of. I thought that they would at least agree at t = e, but apparently not! Or would that be a numerical issue?

Edit: inverted the graph because it makes more sense this way.
> I also thought, at first glance, that your plot does seem to contradict that conclusion: approx, which represents f(E[x]), goes both above and below the "real" value pred1, which represents E[f(x)]. But that's not what Jensen's equation says: it's only about the expected value of these distributions, not about the distributions themselves. And the expected value of approx is clearly lower (more to the left) than that of pred1.
I'm not following—as you say, the biggest blue "approx" curve represents f(E[x]), i.e., exp(-t / E[x]), whereas the other curves represent E[f(x_i)] = E[exp(-t / x_i)] for various x_i. The latter seem to be less than the former on the left part of the plot but greater than the former on the right part of the plot, which seems like it contradicts Jensen, since exp is strictly convex, right?
We can show the same thing with the Monte Carlo samples—after running my first code snippet above (the one that defines halflifes), I can do the following in IPython:

```
In [7]: jensen = lambda t: [np.mean(np.exp(-t / halflifes)), np.exp(-t / np.mean(halflifes))]

In [8]: jensen(0.5)
Out[8]: [0.5855561840096577, 0.6874954822547535]

In [9]: jensen(10)
Out[9]: [0.007355087627151486, 0.0005564126211527937]
```

which shows [E[f(x)], f(E[x])] for two values of t. I'm confused, but am sure it's something very stupid I'm doing…
> Those don't look like any basic function that I know of.
We know analytically what those are though, because we know analytically what E[f(x)] truly is (the scary predictRecall expression with the kv Bessel function) 😆.
I'll write up the math for predictRecall and updateRecall for a Gamma prior on the halflife (or time constant); it's not too hard since the heavy lifting is done via Wolfram Alpha symbolic integration—it could help someone investigate other possible choices of prior for the halflife (Rice distribution? Weibull?).
But as I mentioned earlier, since we don't have a predictRecall that's simpler than the Beta-on-recall case, I think we're going to be better-served by just evaluating #31, and people who want fast predictRecall over lots of models should use the approximation -tnow / halflife, where tnow = (now - lastReviewed) and halflife comes from rebalancing the model.
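(A sketch of that ranking approximation, with made-up field names; halflife here is whatever the last update stored, in the same units used for tnow:)

```python
import time

# hypothetical per-card records: halflife in hours, lastReviewed as a Unix timestamp
cards = [
    dict(id=1, halflife=24.0, lastReviewed=time.time() - 30 * 3600),
    dict(id=2, halflife=100.0, lastReviewed=time.time() - 50 * 3600),
]

now = time.time()

def logRecallApprox(card) -> float:
  "Monotonic in the approximate recall probability 2**(-tnow / halflife)."
  tnow = (now - card['lastReviewed']) / 3600  # hours since last review
  return -tnow / card['halflife']

# quiz the card with the lowest approximate (log-)recall first
mostAtRisk = min(cards, key=logRecallApprox)
```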
Errata: in the below, the following corrections have to be made (I don't want to edit the LaTeX and upload a new image, forgive me):

- the n in h^n below should be h^N;
- in predictRecall, t and b appear combined as (t * b)^(alpha/2), whereas in the posterior moments, we need to split up t and b. If there's confusion about this step, let me know and I'll update the details.

Original:
Here's a quick writeup of the math for Gamma-on-time-constant (attached as LaTeX images). I implemented this and verified it via numerical integration in https://github.com/fasiha/ebisu/blob/889588a5774a24a0820b46517c22ea2fbfa1ca07/gam.py.
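For reference, here is a reconstruction of the two key results (not the original writeup, but consistent with the predictRecall code and the errata above): for a Gamma prior with shape $\alpha$ and rate $\beta$ on the time constant $h$, and a quiz at time $t$,

$$
E\left[e^{-t/h}\right] = \frac{2\,(t\beta)^{\alpha/2}}{\Gamma(\alpha)}\,K_\alpha\!\left(2\sqrt{t\beta}\right),
\qquad
E\left[h^N \mid \text{success}\right] = \left(\frac{t}{\beta}\right)^{N/2}\frac{K_{\alpha+N}\!\left(2\sqrt{t\beta}\right)}{K_\alpha\!\left(2\sqrt{t\beta}\right)},
$$

where $K_\nu$ is the modified Bessel function of the second kind. A failed quiz replaces the likelihood $e^{-t/h}$ with $1 - e^{-t/h}$ and yields differences of the same kinds of terms, and the posterior Gamma fit comes from moment-matching $N = 1, 2$.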
Nice work! But I agree that it seems to have no advantage over the beta-on-recall model with rebalancing. Still, it was interesting and educational. Thank you for investigating!
Hi, thank you very much for the amazing library, @fasiha.
> If the goal is just to have blazing fast predictRecall, then I've always thought that the way to do this would be to store the output of predictRecall at ten or so future timestamps (either during update, or periodically), and then when you want to find the probability of recall now, just find the nearest timestamp to now and use that one's probability.
>
> If you're concerned about accuracy,
>
> - you can use this simple scheme to reduce the number of quizzes to run a full predictRecall on, and
> - furthermore, I usually don't quiz on the item with the absolute lowest recall, because it kind of leads to predictability—if you review once a day, say, sometimes you can tell what the first quiz is going to be (because it was the most recent one learned), and that's annoying. I often either (a) find the 5% of quizzes with lowest recall probability and pick one at random to quiz or (b) find all quizzes below a threshold (.01%, 0.1%, 1%, etc.) and pick one at random.
>
> Nevertheless, I do think it's worth digging into how to do updateRecall after putting a prior on halflife!
Could you please explain a little bit more about the technique of calculating 10 predicted recalls at different time intervals (are they fixed timestamps for all terms, like 1/1/21, 2/1/21, etc., or of the form review_time + delta(t = 1 day, 1 week, etc.)?), and then, when we need to select the terms to study, how should we go about it?

From my understanding of your reply: for each term, after a review we would update the term's model with predictRecall at 10 different time intervals (review_time + delta(t)); then at time T0, when we want to find which terms to review, we would sort terms by (predict_recall_timestamp, probability), take say the lowest 10%, and pick a random term to review?

Thanks
@jwtnb thanks for the questions!
First thing I should say is, as I mention in #53, I'm working on a rewrite of Ebisu and it's going to embrace @ttencate's idea in this issue: the Bayesian prior is on halflife, and even though it's an approximation, predictRecall is going to be a very simple expression: -(now - lastSeen) / halflife (that's the log-probability-of-recall; take the exp() of that to get the actual recall probability). You can do this in SQL 😄!
The really simple scheme I proposed,

> store the output of predictRecall at ten or so future timestamps (either during update, or periodically), and then when you want to find the probability of recall now, just find the nearest timestamp to now and use that one's probability

works just like you describe. The idea is, you run predictRecall at an exponentially-increasing list of future times and store those in a database. Then at some point in the future, you look for the two (maybe one) time slots that bracket the current time, and do a linear interpolation. … That is probably very confusing, sorry, hopefully this code snippet is more clear 😅:
```python
import ebisu


# Courtesy of https://stackoverflow.com/a/62314804
def lerp(x1: float, x2: float, y1: float, y2: float, x: float):
  """Perform linear interpolation for x between (x1,y1) and (x2,y2) """
  return ((y2 - y1) * x + x2 * y1 - x1 * y2) / (x2 - x1)


# at quiz time, put these in the database
halflife = 1.0
model = (3., 3., halflife)
# equivalent to 4**numpy.arange(-1, 3.6, 0.5)
futures = [0] + [4**(-1 + i * 0.5) for i in range(10)]
pRecalls = [ebisu.predictRecall(model, future) for future in futures]


# now let's figure out the lerp step using these lists
def predictRecallCached(model, tnow: float, futures: list[float], pRecalls: list[float]) -> float:
  """Try to linearly interpolate `predictRecall`"""
  rightidx = next(filter(lambda idx: futures[idx] > tnow, range(len(futures))), None)
  if rightidx is not None:
    # tnow <= futures[-1], we can do linear interpolation!
    leftidx = rightidx - 1
    # We know leftidx >= 0 because `futures[0]==0`!
    return lerp(futures[leftidx], futures[rightidx], pRecalls[leftidx], pRecalls[rightidx], tnow)
  else:
    # tnow > futures[-1]
    return ebisu.predictRecall(model, tnow)


# some time later
tnow = 50  # time since last quiz
pRecall = predictRecallCached(model, tnow, futures, pRecalls)
print(f'pRecall = {pRecall:0.2g}')
```
Hopefully that makes sense! You can tweak how many samples of predictRecall you want to cache; maybe you don't even bother with lerp and just pick the closest one, etc.
(Ebisu v3 that I mentioned above is going to remove all this headache because everything will be in terms of halflife.)
Regarding the other tweak I mentioned, you asked:
> we would sort terms by (predict_recall_timestamp, probability), take say the lowest 10%, and pick a random term to review
this is an app-level decision beyond Ebisu but yes, I think it's better to (slightly) randomize quizzes with low probability of recall. So you can, for example, find the handful of quizzes with the lowest recall probability and pick one of those at random, or pick at random among all quizzes below some recall threshold, as sketched below.
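(A sketch of that selection logic, over a hypothetical list of (cardId, predictedRecall) pairs:)

```python
import random

predicted = [(1, 0.02), (2, 0.50), (3, 0.01), (4, 0.03), (5, 0.90)]  # (cardId, recall probability)

# option (a): pick at random among the lowest-recall few percent of cards
lowest = sorted(predicted, key=lambda pair: pair[1])[:max(1, len(predicted) // 20)]
# option (b): pick at random among everything below a fixed threshold
belowThreshold = [pair for pair in predicted if pair[1] < 0.05] or lowest

nextCard = random.choice(belowThreshold)
```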
But again, these issues have more to do with your quiz app; Ebisu doesn't care 😄.
First off, thank you for reading my ramblings thus far! I'm learning something new every day.
This isn't really an issue with Ebisu, more of a discussion topic, but I'm opening it as an issue so it can be preserved for posterity and easily linked to if this topic comes up again.
My thought was: instead of assuming a beta distribution at a particular time t, could we assume a distribution on the halflife itself? I haven't yet tried to do the math, because maybe you've tried this already and it leads nowhere, so it seemed best to discuss it first.
The advantage would be that predictRecall could simply take the expected value of the halflife distribution and boil down to a very cheap return 0.5**(tnow / h), which can even be done in SQL if needed.

What would such a distribution look like? We need something whose "mass peak" can be shifted around arbitrarily to represent the expected halflife, and whose "width" can be tuned to represent the certainty. We can do both of these things with a normal (Gaussian) distribution, as in h ~ N(mu, sigma), but the problem is that it has a nonzero tail to the left of the y axis. Really our distribution should only exist for h >= 0 and be zero at h = 0.

So how about log(h) ~ N(log(mu), sigma)? Technically it's undefined for h = 0 but it converges to 0 when approached from h > 0, so I can live with that. It seems to have the desired properties.

This is how far I got. Here are the things I'm concerned about, which an experienced statistician could probably answer easily:
- Should we be working with h instead of log(h)?
- The expected value of log(h) is obviously log(mu), but the expected value of h is probably not mu, as the tail is skewed to the right. So maybe predictRecall still needs to do some more work to compute the actual 0.5**(tnow / h) from the parameters mu, sigma. Or maybe we can re-parameterize the distribution so that one of the parameters is equal to the expected value.
- How to compute updateRecall given some observation (quiz result) using Bayes' rule. For all I know, the reason people use all these well-known distributions is that they're the only ones that can be dealt with analytically.
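(On the re-parameterization idea in the second point: since the mean of the log-normal is exp(mu + sigma^2 / 2), one option is to store the desired expected halflife m directly and derive mu from it; a sketch with a hypothetical helper name:)

```python
import numpy as np

def lognormalParamsFromMean(m: float, sigma: float) -> tuple[float, float]:
  "Hypothetical helper: (mu, sigma) such that exp(N(mu, sigma^2)) has mean exactly m."
  return np.log(m) - sigma**2 / 2, sigma

mu, sigma = lognormalParamsFromMean(7.0, 0.5)  # e.g., an expected halflife of 7 time-units
```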