biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis
BSD 3-Clause "New" or "Revised" License

Examples and/or more precise information on "good enough" data re: model fitting #112

Open fedarko opened 4 years ago

fedarko commented 4 years ago

Talked about this with @cameronmartino. Essentially, the Red Sea dataset used in the README is a really "nice" example of model fitting. This is great, but it can be confusing to people with noisier data, where you will still get some sort of model fit but it isn't nearly as nice (e.g. your pseudo-Q2 score is less than 0.73...).

Having more precise information about when you're "done" would be beneficial to users.

forum xref

mortonjt commented 4 years ago

It's important to take the Q2 value with a grain of salt: we know that it is using the wrong distance metric. And the stats behind R2 for multinomial regression are still an outstanding problem in the statistical community. I doubt we'd be able to make headway on that (it'll be a major feat).

So here, any Q2 value above 0 can potentially be considered reasonable.
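
To make the "above 0" threshold concrete, here is a minimal sketch. It assumes the pseudo-Q2 has the form 1 - (model cross-validation error / baseline cross-validation error); that formula is an assumption and is not stated in this thread. The function name `pseudo_q2` and all error values are hypothetical, chosen only for illustration.

```python
def pseudo_q2(model_cv_error: float, baseline_cv_error: float) -> float:
    """Assumed form of the pseudo-Q2 score: 1 - model error / baseline error.

    Positive: the model predicts held-out samples better than the baseline.
    Zero or negative: the model is no better (or worse) than the baseline.
    """
    if baseline_cv_error <= 0:
        raise ValueError("baseline_cv_error must be positive")
    return 1.0 - model_cv_error / baseline_cv_error


# Hypothetical cross-validation errors, not taken from any real dataset.
examples = {
    "clean data": (27.0, 100.0),     # model error far below baseline
    "noisy data": (95.0, 100.0),     # model only slightly beats baseline
    "likely overfit": (120.0, 100.0) # model is worse than baseline
}

for label, (model_err, baseline_err) in examples.items():
    score = pseudo_q2(model_err, baseline_err)
    verdict = "beats baseline" if score > 0 else "does not beat baseline"
    print(f"{label}: pseudo-Q2 = {score:.2f} ({verdict})")
```

Under that assumption, a small positive score on noisy data still means the formula beats the baseline on held-out samples, which is all the "above 0" rule of thumb asks for.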

fedarko commented 4 years ago

For reference, the README section on Q^2 values has been updated to be less strict. I guess we could still add example(s) with less perfect data :), so I'm going to leave this issue open for now.