biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis
BSD 3-Clause "New" or "Revised" License

Getting Q-squared values for standalone Songbird? #113

Open fedarko opened 4 years ago

fedarko commented 4 years ago

As far as I'm aware, these values are only computed using the summarize-paired QIIME 2 visualizer. It'd be ideal to support this for standalone Songbird, if possible (from going over the README it doesn't seem like the HPARAMS stuff is a perfect substitute).

gibsramen commented 4 years ago

In my recent forum post about a related issue, it was suggested that an intermediate artifact/structure might be broadly useful for things like this that involve null/baseline model comparisons.

fedarko commented 4 years ago

Ah shoot I completely forgot about that post! Yes, if we can output that then that'd be perfect -- I know the QIIME 2 summarize-paired visualizer can take in SampleData[SongbirdStats] artifacts and produce a Q-squared value, and we could theoretically spin that code off into its own module within Songbird that takes in two TSV files and outputs a file / QZA containing the Q-squared value.

I think the hardest part here might be figuring out how to extract the SampleData[SongbirdStats] stuff from a standalone run of Songbird, then... or maybe it'd be easier to add an option to standalone Songbird that saves these stats to an analogous filetype to the SampleData[SongbirdStats] artifacts, say run-stats.tsv or whatever, and then add a simple script that just computes the Q-squared value from two of those TSV files. Whatever would be easy to develop and not too terrible for end users is ok with me :)
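A minimal sketch of what that "simple script" might look like, assuming the README's description of Q-squared as 1 - m1/m2, where m1 is the model's average absolute error and m2 is the baseline model's. Everything else here is hypothetical: the `run-stats.tsv` layout, the `cross-validation` column name, the function names, and the choice to average over the last few recorded values are illustrative assumptions, not Songbird's actual output format or code.

```python
import csv


def read_cv_errors(path, column="cross-validation"):
    """Read one numeric column from a TSV file.

    The file layout and the column name are assumptions made for this
    sketch; standalone Songbird does not currently write such a file.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Skip rows where the column is missing or empty.
        return [float(row[column]) for row in reader if row.get(column)]


def q_squared(model_errors, baseline_errors, tail=10):
    """Compute Q-squared = 1 - m1 / m2.

    m1 and m2 are the mean cross-validation errors of the model and the
    baseline over the last `tail` recorded values (the window size is an
    assumption of this sketch).
    """
    if not model_errors or not baseline_errors:
        raise ValueError("both runs need at least one cross-validation value")
    n = min(tail, len(model_errors), len(baseline_errors))
    m1 = sum(model_errors[-n:]) / n
    m2 = sum(baseline_errors[-n:]) / n
    if m2 == 0:
        raise ValueError("baseline error is zero; Q-squared is undefined")
    return 1.0 - m1 / m2
```

With two such files in hand, usage would be something like `q_squared(read_cv_errors("model/run-stats.tsv"), read_cv_errors("baseline/run-stats.tsv"))`. A value at or below zero would mean the model does no better than the baseline on held-out samples.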

mortonjt commented 4 years ago

Note that we do save checkpoints of the model in the standalone version, so all of those stats can be retrieved by directly utilizing the built model.

But I don't think it is a good idea to add in major changes until we switch everything over to PyTorch.