biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis
BSD 3-Clause "New" or "Revised" License

Add explanations for what exactly "Intercept" differentials mean #83

Open fedarko opened 4 years ago

fedarko commented 4 years ago

This has come up before, but I'm making it an issue here so it's officially written down somewhere.

From discussion with @antgonza and many other people :) Relates to biocore/qurro#229.

mortonjt commented 4 years ago

Yea, I need to write up a blog post on this - that'll be up within the next 3 weeks

fedarko commented 4 years ago

Was chatting with @antgonza today about formula stuff, and I found this video from one of the Patsy devs -- it does a super good job explaining both categorical encodings and intercept stuff.

A few relevant timestamps:

For "normal" uses of Patsy (a categorical variable with the default Treatment coding), the intercept is the mean of whatever the "reference" group is, and every other coefficient represents a difference from that mean. For example, in the OLS example data on screen at around 6:40, the Intercept coefficient (group 1 as reference) is 46.4583 and the group 2 coefficient is 11.5417. When you set group 2 as the reference instead, the group 1 coefficient becomes -11.5417 (because the comparison has been flipped), and the Intercept, now the group 2 mean, is 58 (i.e. 46.4583 + 11.5417).
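To make the reference-group behaviour concrete, here is a minimal sketch using only NumPy. It builds an intercept + treatment-coded design matrix by hand, which is roughly what Patsy's default Treatment coding does for a single categorical predictor, and fits it with ordinary least squares. The data, the `treatment_design` / `fit_ols` helpers, and the `group[T.…]` column names are all hypothetical illustrations, not the data from the video and not Songbird or Patsy code.

```python
import numpy as np

# Hypothetical toy data (NOT the data from the video): one response value per
# sample and a two-level categorical "group" label.
y = np.array([40.0, 44.0, 48.0, 55.0, 58.0, 61.0])  # g1 mean = 44, g2 mean = 58
group = np.array(["g1", "g1", "g1", "g2", "g2", "g2"])


def treatment_design(labels, reference):
    """Hypothetical helper: intercept + treatment-coded (dummy) design matrix.

    One all-ones "Intercept" column, plus one 0/1 indicator column for every
    level other than the reference level (mimicking Patsy-style naming).
    """
    levels = [lvl for lvl in sorted(set(labels.tolist())) if lvl != reference]
    columns = [np.ones(len(labels))]
    names = ["Intercept"]
    for lvl in levels:
        columns.append((labels == lvl).astype(float))
        names.append(f"group[T.{lvl}]")
    return np.column_stack(columns), names


def fit_ols(response, labels, reference):
    """Hypothetical helper: OLS fit via least squares, coefficients keyed by column name."""
    X, names = treatment_design(labels, reference)
    coefs, *_ = np.linalg.lstsq(X, response, rcond=None)
    return {name: round(float(c), 4) for name, c in zip(names, coefs)}


ref_g1 = fit_ols(y, group, reference="g1")  # Intercept = mean of g1
ref_g2 = fit_ols(y, group, reference="g2")  # Intercept = mean of g2
print(ref_g1)
print(ref_g2)
```

With `g1` as the reference, the Intercept is the `g1` mean (44) and `group[T.g2]` is the difference 58 - 44 = 14; switching the reference to `g2` makes the Intercept the `g2` mean (58) and flips the sign of the other coefficient to -14, the same pattern as the 46.4583 / 11.5417 / 58 numbers from the video.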

I'm not quite sure how this translates to an interpretation of the Intercept differentials you get, but at the very least it'd be good to add a link to this video to the README in the future.

senaj commented 4 years ago

Thanks for raising this issue, fedarko! I had the same question.

fedarko commented 4 years ago

For reference, @mortonjt has written a blog post here explaining this in the context of Songbird. We may want to add a link to it from the README.