What happens if we constrain the degree of the polynomial based on the length of the x axis? i.e. how much time is in the viewport?
By default, that's the whole history, of course. A new goal could get a straight swathe (degree 1); once you've got more than N datapoints you get a quadratic, then add a degree every quarter or year?
@pjjh, smart, I think that would improve some cases but may be fussy to get right in general. Like it might be hard to get a Pareto improvement from that trick? I'm personally more excited to get it Really Right with one of the ideas in the above list but it obviously hasn't been the highest priority for me so anyone should feel free to take a crack at it in the meantime!
I don't know the right stats terminology, but what about "cross-validation"? Divide the data into 4 buckets, fit polynomials of degree 1...N on 75% of the dataset, and evaluate the error on the other 25%. Repeat this using each bucket in turn for validation, then use the polynomial of degree K with the lowest sum of validation errors (computed only on the 25% you didn't fit each time). The error could be the absolute or squared difference from the points to the line.
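For concreteness, here's a minimal sketch of that idea in Python/numpy. The function name is made up, and the 4 folds, the degree cap of 10, and absolute loss are just the choices described above, not from any actual Beeminder code:

```python
# Minimal sketch: k-fold cross-validation to choose a polynomial degree.
# 4 folds, absolute loss, and the degree cap are illustrative choices.
import numpy as np

def cv_poly_degree(x, y, max_degree=10, n_folds=4, seed=0):
    """Return the degree with the lowest summed held-out error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    best_degree, best_err = 1, np.inf
    for degree in range(1, max_degree + 1):
        err = 0.0
        for k, val in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != k])
            coeffs = np.polyfit(x[train], y[train], degree)
            err += np.abs(np.polyval(coeffs, x[val]) - y[val]).sum()
        if err < best_err:
            best_degree, best_err = degree, err
    return best_degree
```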
This is an example implementation of the above idea. I've run it once with absolute loss and once with squared loss; nothing's cherrypicked. Indeed, 6.png with absolute loss doesn't look great. I think that's because I don't allow polynomial degrees >10, but as you'll see when running this script in the terminal, when high-degree polynomials overfit, the errors become gigantic, so it's probably safe to set the limit way higher. There are four magic constants, which I'm not sure would work super great on real data, but who knows: MIN_POLY_DEGREE (which is lowered if there are fewer than MIN_POLY_DEGREE / 3 datapoints) and MAX_POLY_DEGREE (which is capped if there are fewer than MAX_POLY_DEGREE / 10 datapoints), with 3 and 10 being the other two magic numbers. The rest is all as straightforward as you could implement it.
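For what it's worth, here is one plausible reading of that heuristic in code: require roughly 3 datapoints per degree at the low end and 10 per degree at the high end. Only the constant names and the divisors 3 and 10 come from the comment above; the default values and the exact interpretation are guesses, not necessarily what the script does.

```python
# One possible reading of the degree-cap heuristic. The default values and
# this interpretation are guesses; only the names and the divisors 3 and 10
# come from the description above.
MIN_POLY_DEGREE = 3    # assumed default
MAX_POLY_DEGREE = 10   # "I don't allow polynomial degrees > 10"

def degree_bounds(n_datapoints):
    """Lower/cap the allowed degree range when there are few datapoints."""
    lo = max(1, min(MIN_POLY_DEGREE, n_datapoints // 3))
    hi = max(lo, min(MAX_POLY_DEGREE, n_datapoints // 10))
    return lo, hi
```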
Thanks Eugenio! You mean doing cross-validation offline, right? To find smoothing parameters that work well for real-world Beeminder graphs? Reasonable!
As for the actual smoothing algorithm, quick recap from offline discussion:
> Thanks Eugenio! You mean doing cross-validation offline, right? To find smoothing parameters that work well for real-world Beeminder graphs? Reasonable!
I actually thought more about doing it "online", as in doing it every time a datapoint is added. It's not that computationally expensive, it's just a bunch more fits to do instead of just one.
> Polynomial fit is fundamentally the wrong approach -- it's always going to introduce unjustified wiggles
Actually, I kind of see the point but I'm not sure I agree... well, I do agree, but I'm not sure the fundamentally right approach is the right approach in practice. Yes,
> Non-causal filters are 👍
this might be a better option, and filtfilt is just a subset of those, so one could even do fancier stuff. But the polynomial approach does have the advantage of being really, really simple. Figuring out a good cutoff for, e.g., the EMA filter is tricky: there are many more choices for that value than for the degree of a polynomial, and the parameter space is not as straightforward either. Maybe there is a simple way to do it that works as well as my examples above, but I don't know enough signal processing to know what it would be.
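For comparison, a rough sketch of the filtfilt route with scipy; the filter order and cutoff below are exactly the kind of magic numbers being discussed and are only placeholders:

```python
# Zero-phase (non-causal) smoothing: a Butterworth low-pass run forward and
# backward with filtfilt, so the smoothed line doesn't lag the data.
# The order and cutoff are placeholder values, not tuned ones.
import numpy as np
from scipy.signal import butter, filtfilt

def acausal_smooth(y, cutoff=0.05, order=2):
    b, a = butter(order, cutoff)   # cutoff as a fraction of the Nyquist rate
    return filtfilt(b, a, y)

# e.g. noisy daily data:
y = 80 + 0.01 * np.arange(200) + np.random.normal(0, 0.5, 200)
smoothed = acausal_smooth(y)
```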
Here I'm using cross-validation to find the best number of knots to fit for each individual data series. The spline code is mostly copypasted from the internet, so it definitely breaks a couple thousand licenses; use it just to determine whether the spline method is a good fit (haha, nice pun, me!).
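To make that concrete without the copypasted code, here is a rough equivalent using scipy's LSQUnivariateSpline; evenly spaced knots, 4 folds, and absolute loss are arbitrary choices, and it assumes x is sorted and strictly increasing:

```python
# Rough sketch: pick the number of interior spline knots by cross-validation.
# Assumes x is sorted and strictly increasing; knot placement, fold count,
# and absolute loss are arbitrary choices.
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

def cv_spline_knots(x, y, max_knots=8, n_folds=4, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    best_n, best_err = 1, np.inf
    for n_knots in range(1, max_knots + 1):
        err = 0.0
        for k, val in enumerate(folds):
            train = np.sort(np.concatenate(
                [f for j, f in enumerate(folds) if j != k]))
            xt, yt = x[train], y[train]
            # Evenly spaced interior knots strictly inside the training range.
            knots = np.linspace(xt[0], xt[-1], n_knots + 2)[1:-1]
            spline = LSQUnivariateSpline(xt, yt, knots)
            err += np.abs(spline(x[val]) - y[val]).sum()
        if err < best_err:
            best_n, best_err = n_knots, err
    return best_n
```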
Also, while I don't have access to your internal anonymized data, I do have my own weight (courtesy of a Beeminder goal reminding me to weigh myself!). Using the exact same spline method from above, I get this (cross-validation chooses 4 knots):
There seem to be two separate features to update in Beebrain, which I'm not sure the above discussion distinguishes: 1. the moving average and 2. the aura. Since most of the discussion centered around polynomials, I was assuming people were referring to the aura, but I think both need to be updated.
I think the two features differ in their goals:
1. The moving average essentially wants to be a "descriptive" tool that shows what happened with past data once daily fluctuations are cleaned up.
2. The aura (I think) wants to be both a predictive model, showing what one should expect future data to look like, possibly to help plan ahead, and a descriptive visual showing an "envelope" of how the data varied in the past. I am not sure it can do both of these things well.
Here are some technical thoughts on both:
1. The purple moving average is at the moment a simple exponential filter, which is why it always introduces a delay and seems to lag behind the datapoints. Some of the suggestions above, including spline interpolation, could be used to replace it, but it also seems to me that a properly designed low-pass filter applied non-causally (i.e., once forward and once backwards) would address most of the issues with the purple averaging filter. It also seems like we could augment the output of this filter with a variance envelope to show how the spread of data around this average changed over time, somewhat like what the aura does right now.
So, the following are among the things one might do:
a. Replace the moving average filter with a properly designed low-pass, non-causal filter that does not introduce delays.
b. Augment the averaging filter in (a) with local variance values to show an "envelope" similar to the aura, except without any predictive claims (a rough sketch of this appears below).
c. Replace the polynomial fit for the aura with a different model: a chain of polynomials segmented based on monotonic regions of the average curve in (a), neural nets, etc.
d. Get rid of the aura altogether, since (b) now provides the "envelope" visualization and it is not reasonable to expect that we can predict anything about people's data behavior :)
e. Limit the aura to just the past few weeks to predict the next week, using something similar to the polynomial fit we have right now, augmented with the regularization ideas from above.
It seems to me that (a) and (b) are doable and should be done. (c) needs ideas and experimentation on actual data. (d) is probably just me being lazy. (e) probably is a good compromise?
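A minimal sketch of what (a) plus (b) could look like, assuming a zero-phase Butterworth filter like the one sketched earlier plus a simple rolling standard deviation for the envelope; the 14-sample window and the 2-sigma band are arbitrary choices:

```python
# Sketch of (a) + (b): a non-causal smoothed line plus a local-variance
# envelope around it. Window size and band width are arbitrary choices.
import numpy as np
from scipy.signal import butter, filtfilt

def smooth_with_envelope(y, cutoff=0.05, order=2, window=14, n_sigma=2.0):
    b, a = butter(order, cutoff)
    center = filtfilt(b, a, y)             # zero-phase low-pass: no lag
    resid = y - center
    half = window // 2
    spread = np.array([resid[max(0, i - half):i + half + 1].std()
                       for i in range(len(y))])
    return center, center - n_sigma * spread, center + n_sigma * spread
```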
Love these thoughts and great point about the distinction between the moving average line and the aura.
You mentioned https://github.com/markert/fili.js which looks smart.
I totally agree about prediction being hopeless here.
So, yeah, let's turn the moving average into a nice smooth acausal filter, kill the polyfit, and what we now call the aura can instead just be an option to turn on an envelope around that fit line.
Some before/after shots of Uluc's new acausal moving average smoothing:
We occasionally get complaints about things like this, where the aura does silly things like curving up at the end despite the data trending pretty unambiguously down:
It would be really fun to find something better!
Notes / Ideas
Verbata: math nerdery, data smoothing, polyfit, graph aesthetics, causal vs acausal filters, moving average,