clemente-lab / mmeds-meta

A database for storing and analyzing omics data
https://mmeds.org

Improved summaries #322

Open cleme opened 2 years ago

cleme commented 2 years ago

Is your feature request related to a problem? Please describe. Current summaries are bare-bones: they provide results with little text to tell the user what the plots represent. While some users are familiar with microbiome analysis (alpha and beta diversity, taxa plots, LEfSe), most likely are not.

Describe the solution you'd like Summaries that resemble the methods & results section of a paper.

Describe alternatives you've considered Metabolon provides good example summaries generated from their MetaboAnalyst pipeline. Some examples attached as screenshots.

Example 1 date, ID of user, maybe include study name/study ID.

Screen Shot 2021-10-14 at 1 59 54 PM

Example 2 a table with description of features. This could be based on output of taxa table summary.

Screen Shot 2021-10-14 at 2 00 40 PM

Example 3 plot with an automatically generated legend that can be used directly in a paper

Screen Shot 2021-10-14 at 2 01 06 PM

Example 4 analysis description and resulting plot

Screen Shot 2021-10-14 at 2 05 39 PM

cleme commented 2 years ago

To better organize PRs and help with overall structure, I will create more specific issues that link to this one. Once the smaller issues are all closed, we will close this one.

adamcantor22 commented 2 years ago

Closed issue #247 may still have remaining work related to this.

adamcantor22 commented 2 years ago

Goal: improve summaries to the point where we can run them on Ryan Walker's 'BreastMilk' study and the result is at all helpful. This is the current output: https://www.dropbox.com/s/ec4m0kkxw48zkj5/Walker_BreastMilk.20220810.pdf?dl=0

adamcantor22 commented 2 years ago

I'm considering making a new sub-issue of this meta issue: 'improved summaries: handling high number of samples'. We've discussed in the past that it's acceptable for summaries not to be completely clear in these cases, but I feel there must be something we can do to improve them. Take these examples from the BreastMilk study, showing sample read retention and taxa at L6:

Screenshot from 2022-08-10 11-12-04

Screenshot from 2022-08-10 11-12-36

It's impossible to even read the sample IDs in the legend. What if, in cases with more than n samples, we selected a subset of samples that represents the overall spread well (with an acknowledgement of what was left out of the plot)? Would appreciate any thoughts.
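To make the subset idea concrete, here is a minimal sketch of one way it could work. It is not existing MMEDS code: the function name `pick_representative_samples`, the `max_shown` parameter, and the choice to rank samples by a single per-sample metric (for example read retention) are all assumptions made for illustration. It picks samples at evenly spaced ranks, so the lowest and highest samples are always included, and it reports how many samples were omitted so the plot caption can say so.

```python
def pick_representative_samples(metric_by_sample, max_shown):
    """Pick up to max_shown sample IDs spread evenly across the sorted metric.

    metric_by_sample: dict mapping sample ID -> numeric value (hypothetical
    input, e.g. fraction of reads retained per sample).
    Returns (shown_ids, n_omitted).
    """
    if max_shown < 2:
        raise ValueError("max_shown must be at least 2")

    # Sort by metric value; break ties by ID so the result is deterministic.
    ranked = sorted(metric_by_sample, key=lambda s: (metric_by_sample[s], s))
    n = len(ranked)
    if n <= max_shown:
        return ranked, 0

    # Evenly spaced rank positions, always including the lowest and highest.
    positions = [round(i * (n - 1) / (max_shown - 1)) for i in range(max_shown)]
    shown = [ranked[p] for p in positions]
    return shown, n - len(shown)


if __name__ == "__main__":
    # Made-up data: 101 samples with retention values 0.00 .. 1.00.
    retention = {f"S{i:03d}": i / 100 for i in range(101)}
    shown, omitted = pick_representative_samples(retention, max_shown=5)
    print(f"Showing {len(shown)} of {len(retention)} samples ({omitted} omitted)")
    print(shown)
```

Rank-based selection only reflects the spread of one metric. For taxa plots, something like clustering samples by composition and showing one sample per cluster might represent the data better, at the cost of more complexity.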

cleme commented 2 years ago

With a large number of samples it will be hard to do this in a PDF. Q2 offers a dynamic plot where you can zoom in and out to see details, which makes it more feasible. I suggest we don't make a sub-issue until we decide to go that route (i.e. generating interactive plots instead of static PDFs), which I don't think will happen soon.