Closed mourner closed 8 years ago
Thanks, yes this is a good idea. It should increase accuracy.
Pretty easy to implement: in the pre-processing stage we can generate all deltas up to N days and include all the input features spanning N-days.
However, I think some decay should be applied to the weight of older features: once an effect peaks, the further back in time an input lies, the less relevant it should become (just a hunch).
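To make the decay idea concrete, here is a minimal sketch. It assumes a simple exponential decay by age in days; the function name `decayed_factors`, the `decay` value, and the dict-per-day input shape are all made up for illustration and are not taken from the actual pre-processing code.

```python
def decayed_factors(window, decay=0.5):
    """Combine per-day factor dicts, down-weighting older days.

    window: list of {factor_name: amount} dicts, oldest day first.
    decay:  hypothetical per-day multiplier (an assumption, not a tuned value).
    """
    combined = {}
    last = len(window) - 1
    for i, factors in enumerate(window):
        age = last - i              # 0 for the most recent day
        weight = decay ** age       # older days contribute less
        for name, amount in factors.items():
            combined[name] = combined.get(name, 0.0) + weight * amount
    return combined


if __name__ == "__main__":
    two_days = [{"pizza": 2}, {"pizza": 1, "sleep": 8}]
    print(decayed_factors(two_days))
```

Whether exponential decay is the right shape (as opposed to something that peaks after a day or two) is an open question; this only shows where such a weight would plug in.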
Also: I think the accuracy can benefit from some random shuffling of sample-orders and stacking them. In online learning, early examples have an advantage because learning rate decays with time. Currently I sort by abs(delta) which makes the big delta examples more important.
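A rough sketch of the shuffle-and-stack idea, assuming the training examples are already available as a list of lines. The name `shuffled_stack` and the `copies` and `seed` parameters are hypothetical.

```python
import random


def shuffled_stack(examples, copies=3, seed=0):
    """Return `copies` independently shuffled copies of `examples`, concatenated.

    Each example then appears early in some copies and late in others, so no
    single example benefits systematically from the higher early learning rate.
    """
    rng = random.Random(seed)   # fixed seed keeps runs reproducible
    stacked = []
    for _ in range(copies):
        order = list(examples)  # copy so the caller's list is left untouched
        rng.shuffle(order)
        stacked.extend(order)
    return stacked


if __name__ == "__main__":
    for line in shuffled_stack(["ex1", "ex2", "ex3"], copies=2):
        print(line)
```

This would replace the sort by abs(delta), so it trades the extra emphasis on big-delta examples for a more even treatment of all of them.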
I'll try to tackle this when I get some more free time.
Added support for any number of days history. This increases the number of data-points to train on, and hopefully reduces variance and random daily-noise.
Currently the default history is set to 3 days (NDAYS variable in Makefile), meaning that for every day of user-entered data, 3 data-points will be created for training: last-day, last-two-days, and last-3-days. Any "last N days" period includes the net weight change over those N days, as well as the factors for all N days combined (day1, day2, ..., dayN). I felt more than 3 days is excessive since we also run over all overlaps using an N-day sliding window.
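For readers who want to see the windowing spelled out, the following is an illustrative sketch only, not the actual pre-processing script. It assumes each day is a `(weight_delta, {factor: amount})` pair; the function name `window_examples` and that input shape are assumptions.

```python
from collections import Counter


def window_examples(days, ndays=3):
    """Build one example per (end day, window length n) for n = 1..ndays.

    days: list of (weight_delta, {factor: amount}) tuples, oldest first.
    Returns a list of (n, net_delta, combined_factors) tuples.
    """
    examples = []
    for end in range(len(days)):
        for n in range(1, ndays + 1):
            start = end - n + 1
            if start < 0:
                break  # not enough history yet for a window this long
            window = days[start:end + 1]
            net_delta = sum(delta for delta, _ in window)
            factors = Counter()
            for _, day_factors in window:
                factors.update(day_factors)  # adds amounts per factor
            examples.append((n, net_delta, dict(factors)))
    return examples


if __name__ == "__main__":
    days = [(0.2, {"pizza": 1}), (-0.4, {"sleep": 8}), (0.1, {"pizza": 2})]
    for example in window_examples(days):
        print(example)
```

With 3 days of input this yields 1 + 2 + 3 = 6 examples, because the first two days do not yet have enough history for the longer windows.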
I also restored the --bootstrap N parameter to vowpal-wabbit since I found it helpful for decreasing variance. If your vw doesn't support --bootstrap N you may either upgrade vw to a more recent version OR remove --bootstrap N from VW_ARGS in the Makefile.
Daily weight changes are considered a very unreliable metric, prone to unpredictable fluctuations. Additionally, some factors may not be reflected immediately within 24h, but have a more long-term effect.
It would be great to take the same data and aggregate both factors and weight delta on a per-week basis, and then see whether the resulting factors are different from the existing daily results. Using the data you already have, it may give some new insights.
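A minimal sketch of what the per-week aggregation could look like, assuming the daily data is a list of `(weight_delta, {factor: amount})` pairs in date order with no missing days. The function name `weekly_aggregate` is hypothetical, and dropping an incomplete trailing week is a choice made here for simplicity.

```python
from collections import Counter


def weekly_aggregate(days, week_len=7):
    """Sum weight deltas and factor amounts over consecutive full weeks.

    days: list of (weight_delta, {factor: amount}) tuples, oldest first.
    An incomplete trailing week is dropped (an assumption of this sketch).
    """
    weeks = []
    for start in range(0, len(days) - week_len + 1, week_len):
        chunk = days[start:start + week_len]
        net_delta = sum(delta for delta, _ in chunk)
        factors = Counter()
        for _, day_factors in chunk:
            factors.update(day_factors)  # adds amounts per factor
        weeks.append((net_delta, dict(factors)))
    return weeks


if __name__ == "__main__":
    days = [(1, {"walk": 1})] * 15
    print(weekly_aggregate(days))
```

The weekly rows could then be fed through the same training step as the daily rows, which would allow the two sets of resulting factor weights to be compared side by side.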
cc @arielf