confidence levels - Githubissues

bgits commented 8 years ago

Given the size of the data set and the outputted attributions/predictions it would be helpful to know what the confidence levels are.

arielf commented 8 years ago

Confidence levels for the toy data-set in the repository are very low. I don't have exact numbers though.

More accurate data, and more of it, is needed to make the results more solid. Note the disclaimers in the README.md as far as the data-set is concerned. Please consider this data just a toy data-set in order to demo the code.

However, the 4 high-level conclusions are based on a much longer period (not in the toy data-set) during which weight loss was consistently achieved (a few years later, and totally out of sample) by applying the conclusions as they appear in the summary (end of README.md).

Some observers have dismissed the whole study by saying I simply got myself into a regime of caloric restriction. This may be true, but the fact remains that I wasn't able to achieve significant and lasting results before I got into the habit of 1) longer fasting periods, and 2) substituting carbs with fat (LCHF regime).

Some ideas were suggested for increasing accuracy & confidence. Among them: augmenting the data with aggregations and data shufflings/bootstrapping. I will tackle these when I get some more free time.

In the end, this project was about sharing code, ideas and a story of experimentation, exploration & discovery. Please feel free to use your own data with the code in this repository and improve on the code. Pull requests are highly appreciated.

mourner commented 8 years ago

Don't get discouraged by the negative comments! While I don't fully agree with the conclusions either, I absolutely LOVED the idea of applying machine learning to a personal fitness journey like this, and the way it was presented. Keep up the great work!

I would also highly encourage you to try strength training — either weight lifting (with heavy compound movements routine like squat + deadlift + bench press), or progressive calisthenics (on pull-up/parallel bars), whichever is more fun / fits your schedule. It can literally transform your life. This can be by far the biggest contributor to preserving muscle mass while on a caloric deficit, much more than a low-carb diet or intermittent fasting.

Another way to significantly increase the quality of the data would be to add other metrics in addition to weight changes, like fat measurements (e.g. approximations with calipers), waist line length, etc. Although you would probably have to start gathering data from scratch if you were to add them.

PeterMTaylor commented 8 years ago

Could Yoga help?

arielf commented 8 years ago

Added preliminary "confidence levels" per item (items aren't equal, some appear in the data more than others for instance) in the form of N random draws from a Poisson(1) distribution to jitter the given label. This random drawing establishes [min..max] ranges for all (averaged out) predictions. The bootstrapping parameter is currently set to 7 which seems to push the [min..max] range almost as far as it can go. It can be changed in the Makefile.

Why Poisson(1)?

So all estimates are derived from averaging repeated bootstrapping. The [min max] range per item as found in all rounds of bootstrapping are now recorded in the *.range file.
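
For readers who want to see the idea in isolation, here is a minimal sketch of Poisson(1) bootstrapping in Python. It is not the code in this repository. It assumes each observation is reweighted by an independent Poisson(1) draw in every round, and it uses a plain weighted mean as a stand-in for the model's per-item prediction. The names `poisson_draw` and `bootstrap_range` are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam=1.0):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def bootstrap_range(values, rounds=7, seed=0):
    """Return (min, mean, max) of a weighted-mean estimate over bootstrap rounds.

    Assumption: each round reweights every observation by a Poisson(1) draw.
    The weighted mean is only a stand-in for a real model's prediction.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(rounds):
        weights = [poisson_draw(rng) for _ in values]
        total = sum(weights)
        if total == 0:
            continue  # every observation got weight 0 in this round; skip it
        estimates.append(sum(w * v for w, v in zip(weights, values)) / total)
    if not estimates:
        raise ValueError("no usable bootstrap rounds; add data or rounds")
    mean = sum(estimates) / len(estimates)
    return min(estimates), mean, max(estimates)


if __name__ == "__main__":
    # Made-up daily weight changes for one item, purely for illustration.
    deltas = [0.4, -0.2, 0.1, 0.3, -0.1, 0.2]
    low, mean, high = bootstrap_range(deltas, rounds=7)
    print(f"[{low:.3f} .. {high:.3f}] mean={mean:.3f}")
```

Raising `rounds` can only keep the [min..max] range the same or widen it for a fixed seed, since each extra round adds one more estimate to the pool.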

This still needs more work: documentation and plotting, but the main/core work to establish confidence levels is done.

Best reference for method used is: https://arxiv.org/abs/1312.5021

A discussion of how to estimate the 95% confidence interval (for each line in the .range output) can be found on stackexchange: "how to calculate a confidence level for a poisson distribution".

The short summary is that the 95% confidence range equals:

λ ± 1.96 * sqrt(λ/n)

where λ is both the mean and the variance of the Poisson distribution and n is our bootstrap constant (7 by current default).
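
As a worked example of that formula, the sketch below plugs in λ = 1 and n = 7. The function name `poisson_ci95` is hypothetical and not part of the repository. Note that the formula relies on a normal approximation, so with n as small as 7 the interval is only a rough guide.

```python
import math


def poisson_ci95(lam, n, z=1.96):
    """Approximate 95% interval for a Poisson mean: lam +/- z * sqrt(lam / n)."""
    half_width = z * math.sqrt(lam / n)
    return lam - half_width, lam + half_width


if __name__ == "__main__":
    # lam = 1 matches Poisson(1); n = 7 is the current bootstrap default.
    low, high = poisson_ci95(lam=1.0, n=7)
    print(f"{low:.3f} .. {high:.3f}")  # prints "0.259 .. 1.741"
```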

arielf / weight-loss

confidence levels #15