arielf / weight-loss

Machine Learning meets ketosis: how to effectively lose weight

Deriving importance factors on a per-week basis #16

Closed by mourner 8 years ago

mourner commented 8 years ago

Daily weight changes are considered a very unreliable metric, prone to unpredictable fluctuations. Additionally, some factors may not be reflected immediately within 24h, but have a more long-term effect.

It would be great to take the same data and aggregate both factors and weight delta on a per-week basis, and then see whether the resulting factors are different from the existing daily results. Using the data you already have, it may give some new insights.
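A minimal sketch of what such a per-week aggregation could look like, for illustration only. It assumes each day is stored as a `(weight_delta, factors)` pair, where `factors` maps a factor name to an amount; that layout, the function name `aggregate_weekly`, and the choice to drop an incomplete trailing week are all assumptions, not the project's actual data format.

```python
from collections import Counter


def aggregate_weekly(days, week_len=7):
    """Collapse daily records into weekly ones.

    days: chronological list of (weight_delta, {factor: amount}) pairs
          (a hypothetical layout chosen for this sketch).
    Returns a list of (total_delta, {factor: total_amount}) per full week.
    """
    weeks = []
    # Step through the data one full week at a time; a partial last week is dropped.
    for start in range(0, len(days) - week_len + 1, week_len):
        chunk = days[start:start + week_len]
        total_delta = sum(delta for delta, _ in chunk)
        factors = Counter()
        for _, day_factors in chunk:
            factors.update(day_factors)  # adds amounts per factor name
        weeks.append((total_delta, dict(factors)))
    return weeks
```

The weekly records can then be fed through the same training pipeline as the daily ones, so the two sets of importance factors can be compared side by side.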

cc @arielf

arielf commented 8 years ago

Thanks, yes this is a good idea. It should increase accuracy.

Pretty easy to implement: in the pre-processing stage we can generate all deltas up to N days and include all the input features spanning those N days.

However, I think some decay should be applied to the weight of older features: once a factor's effect has peaked, the further back in time you go, the less relevant that input should become (just a hunch).
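One way the decay idea might look, as a rough sketch. The comment above only says "some decay"; the exponential form, the default rate of 0.5 per day, and the name `decayed_features` are assumptions made for illustration, and the sketch ignores the point about an effect peaking some time after the input.

```python
def decayed_features(history, decay=0.5):
    """Combine several days of factors, down-weighting older days.

    history: list of per-day {factor: amount} dicts, most recent day last
             (a hypothetical layout chosen for this sketch).
    decay:   multiplier applied once per day of age (assumed exponential form).
    """
    combined = {}
    # age 0 is the most recent day, age 1 the day before, and so on.
    for age, day_factors in enumerate(reversed(history)):
        scale = decay ** age
        for name, amount in day_factors.items():
            combined[name] = combined.get(name, 0.0) + amount * scale
    return combined
```

With `decay=0.5`, a factor logged two days ago contributes a quarter of its original amount, while today's factors are kept at full weight.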

Also, I think accuracy could benefit from randomly shuffling the order of the samples and stacking them. In online learning, early examples have an advantage because the learning rate decays over time. Currently I sort by abs(delta), which makes the examples with big deltas more important.
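The two orderings can be contrasted with a small sketch. It assumes each example is a tuple whose first element is the weight delta, that "sort by abs(delta)" means largest deltas first (inferred from the remark that early examples have an advantage), and that "stacking" means concatenating several independently shuffled copies of the training set. The function names are hypothetical.

```python
import random


def sorted_by_abs_delta(examples):
    """Put the largest absolute deltas first, so they are seen while the
    online learner's learning rate is still high (assumed interpretation)."""
    return sorted(examples, key=lambda ex: abs(ex[0]), reverse=True)


def stacked_shuffles(examples, passes=3, seed=0):
    """Concatenate several independently shuffled copies of the examples,
    so no single example is always early or always late."""
    rng = random.Random(seed)  # seeded for reproducible ordering
    stacked = []
    for _ in range(passes):
        copy = list(examples)
        rng.shuffle(copy)
        stacked.extend(copy)
    return stacked
```

The stacked variant trades a longer training file for less dependence on any one presentation order.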

I'll try to tackle this when I get some more free time.

oskarizu commented 8 years ago

; porque solo propio pollino ooo looo oooooooiio oi ooooooiio lo ido;

arielf commented 8 years ago

Added support for an arbitrary number of days of history. This increases the number of data-points to train on, and hopefully reduces variance and random daily noise.

Currently the default history is set to 3 days (NDAYS variable in Makefile), meaning that for every day of user-entered data, 3 data-points will be created for training: last-day, last-two-days, and last-3-days. Any "last N days" period includes the net weight change over those N days, as well as the factors for all N days combined (day1, day2, ..., dayN). I felt that more than 3 days is excessive, since we also run over all overlaps using an N-day sliding window.
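The expansion described above can be sketched roughly as follows. This is not the repository's actual pre-processing code: the `(weight_delta, factors)` record layout, the function name `training_points`, and combining factors by summing their amounts are assumptions made for illustration. Only the `ndays=3` default mirrors the NDAYS setting mentioned above.

```python
def training_points(days, ndays=3):
    """Expand daily records into "last N days" data-points.

    days: chronological list of (weight_delta, {factor: amount}) pairs
          (a hypothetical layout chosen for this sketch).
    Returns a list of (span, net_delta, combined_factors) tuples.
    """
    points = []
    for end in range(len(days)):
        # For each day, emit last-1-day, last-2-days, ..., last-ndays windows.
        for span in range(1, ndays + 1):
            start = end - span + 1
            if start < 0:
                break  # not enough history yet for this span
            window = days[start:end + 1]
            net_delta = sum(delta for delta, _ in window)
            factors = {}
            for _, day_factors in window:
                for name, amount in day_factors.items():
                    factors[name] = factors.get(name, 0) + amount
            points.append((span, net_delta, factors))
    return points
```

Because the window slides one day at a time, a given day appears in up to `ndays` overlapping windows of each length, which is where the extra data-points come from.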

I also restored the --bootstrap N parameter to vowpal-wabbit since I found it helpful for decreasing variance. If your vw doesn't support --bootstrap N you may either: