biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis
BSD 3-Clause "New" or "Revised" License

Cleaning up q2 value #87

Closed: mortonjt closed this 4 years ago

mortonjt commented 4 years ago

Addresses #82. Also removes the table arguments, as suggested by @fedarko.

We're simplifying the computation of the Q2 value. Namely, Q2 = 1 - model / baseline

where

model = average error of the fitted model

baseline = average error of the baseline model

It's similar to the classical Q2 statistic, but here we use mean absolute error instead of the sum of squared errors. See the references below:

https://stats.stackexchange.com/questions/292673/validation-metrics-r2-and-q2-for-partial-least-squares-pls-regression
https://inf.ethz.ch/personal/mcbrian/pdfs/press_paper_V2.pdf
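As a rough illustration of the formula above, here is a minimal sketch (not songbird's actual code) that computes Q2 = 1 - model / baseline from per-sample predictions of the fitted model and of a baseline model. The function names and toy numbers are hypothetical. The sum-of-squares variant is included only to contrast with the mean-absolute-error version used in this PR. Classical Q2 is computed from held-out predictions (PRESS), so both error functions are meant to be applied to held-out samples.

```python
def mean_absolute_error(observed, predicted):
    """Average |y - y_hat| over samples (the error used in this PR)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)


def sum_squared_error(observed, predicted):
    """Sum of (y - y_hat)^2, as in the classical PRESS-based Q2."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((y - p) ** 2 for y, p in zip(observed, predicted))


def pseudo_q2(observed, model_pred, baseline_pred, error=mean_absolute_error):
    """Hypothetical helper: Q2 = 1 - model / baseline for a chosen error.

    Q2 is 1 for a perfect model, 0 when the model does no better than the
    baseline, and negative when it does worse.
    """
    model = error(observed, model_pred)
    baseline = error(observed, baseline_pred)
    if baseline == 0:
        raise ValueError("baseline error is zero; Q2 is undefined")
    return 1.0 - model / baseline


# Toy data: the baseline predicts the mean of the observations everywhere.
observed = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [2.5] * 4
model_pred = [1.5, 2.0, 3.0, 3.5]

print(round(pseudo_q2(observed, model_pred, baseline_pred), 3))  # 0.75
print(round(pseudo_q2(observed, model_pred, baseline_pred,
                      error=sum_squared_error), 3))              # 0.9
```

With the same predictions, the two error functions give different values (0.75 vs 0.9), since squaring weights large residuals more heavily. That is why the value here is described as only "similar to" the classical Q2.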

mortonjt commented 4 years ago

@fedarko / @lisa55asil, thank you for your earlier comments. This is a minor change, but it could help with diagnosing model fit. Feel free to comment; I'm hoping to merge this in tomorrow.

fedarko commented 4 years ago

Having an explanation for this is nice! I know you already merged this in (sorry for the response delay), but here are my thoughts:

  1. This value is technically called "q squared", right? I would suggest using markdown syntax like Q² (that's uh `Q<sup>2</sup>`) when referring to this value, since otherwise I think people might read "q2" as QIIME 2 (e.g. "Explaining q2" as "Explaining QIIME 2")...
  2. It might be worth moving the section that explains this value further down in the README, to the QIIME 2 output FAQs, in particular the part of this section that discusses the paired summaries. (People using standalone Songbird + TensorFlow may be confused when they see this.)
  3. This is low priority, but you could totally add a link from the paired visualization output to your explanation in the README, similar to the existing README links in the QIIME 2 summaries:

https://github.com/biocore/songbird/blob/0f71927b95c84c9ea96f105ef450f9ca2ebb081b/songbird/q2/_summary.py#L109-L112

could become something like this (depending on where you end up putting the Q2 explanation in the README):

        if q2 is not None:
            index_f.write(
                '<p><strong><a href="https://github.com/biocore/songbird#43-explaining-q2">Pseudo Q-squared:</a></strong> %f</p>\n' % q2
            )
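To check what the suggested line renders to, here is a self-contained sketch that wraps it in a hypothetical `write_q2` helper and writes to an in-memory buffer instead of the real `index.html` handle. The README anchor is copied from the suggestion above and may change depending on where the explanation ends up.

```python
import io

# Anchor copied from the suggestion; the final README section may differ.
Q2_DOC_URL = "https://github.com/biocore/songbird#43-explaining-q2"


def write_q2(index_f, q2):
    """Hypothetical helper: write the linked pseudo Q-squared line, if any."""
    if q2 is not None:
        index_f.write(
            '<p><strong><a href="%s">Pseudo Q-squared:</a></strong> %f</p>\n'
            % (Q2_DOC_URL, q2)
        )


buf = io.StringIO()
write_q2(buf, None)   # nothing is written when q2 is None
write_q2(buf, 0.75)   # %f formats the value with six decimal places
print(buf.getvalue(), end="")
```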

None of this is super critical, but I think these would be useful complementary changes. I can write this up in a GitHub issue or something if you'd like. (I don't really have time at present to dive back into editing the docs for this, sorry...)

mortonjt commented 4 years ago

These are good ideas; let me wrap up a PR for this tomorrow.