Summary Overhaul - Githubissues

clemente-lab / mmeds-meta

A database for storing and analyzing omics data

https://mmeds.org


Summary Overhaul #247

Closed DSWallach closed 2 years ago

DSWallach commented 3 years ago

The default for upper is 'cor', which calculates the correlations between groups, but for some datasets this will cause errors if there are not enough data points for a certain group. Modify to use a safer default, e.g.

ggpairs(df[,c(1:3)],
        upper = list(continuous = "points", combo = "box_no_facet"),
        lower = list(continuous = "points", combo = "dot_no_facet"),
        aes(color = df$GroupID, label = rownames(df), alpha=0.5)) +
    theme_bw() +
    theme(legend.position = 'none', plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) +
    labs(title = 'PCA plot', subtitle = 'Colored by SpecimenTimepoint')

DSWallach commented 3 years ago

This should be done if there are not enough samples in a particular group to produce the default statistics
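
One way to decide when to fall back is to count the samples per group before building the plot call. The sketch below is only an illustration: the function name `choose_upper_panel` and the threshold of three samples per group are assumptions, not values taken from the mmeds code.

```python
from collections import Counter


def choose_upper_panel(group_ids, min_per_group=3):
    """Return the ggpairs 'continuous' setting to use for the upper panels.

    min_per_group is an assumed threshold; tune it to whatever the
    correlation statistic actually requires.
    """
    counts = Counter(group_ids)
    # Only keep the correlation panels when every group is large enough.
    if counts and min(counts.values()) >= min_per_group:
        return 'cor'
    # Otherwise fall back to plain scatter points, which need no statistics.
    return 'points'
```

The returned string could then be substituted into the `upper = list(continuous = ...)` argument when the plotting cell is generated.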

DSWallach commented 3 years ago

https://github.com/jupyter/nbconvert/issues/1451 Related to broken summary functionality

DSWallach commented 3 years ago

Use this: https://github.com/t-makaro/nb_pdf_template

Also fix the formatting for the level value in the taxa summary

DSWallach commented 3 years ago

Also add automatic prefixes to column names so that names starting with numbers can be used in Python
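
A minimal sketch of such a prefixing step, assuming the column names are available as a list of strings. The helper name `prefix_numeric_columns` and the `col_` prefix are hypothetical choices for illustration.

```python
def prefix_numeric_columns(columns, prefix='col_'):
    """Prepend a prefix to every column name that starts with a digit."""
    renamed = []
    for name in columns:
        name = str(name)
        # Names such as '16S_reads' cannot be used as Python attributes,
        # so they get the prefix; all other names pass through unchanged.
        if name[:1].isdigit():
            renamed.append(prefix + name)
        else:
            renamed.append(name)
    return renamed
```

With a pandas dataframe the result could be assigned back with `df.columns = prefix_numeric_columns(df.columns)`.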

DSWallach commented 3 years ago

Taxa_bar_plot not appearing in file index of analyses

DSWallach commented 3 years ago

Skip continuous variables for beta group significance. Or automatically create bins
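
If binning is chosen, equal-width bins are one simple option. The sketch below uses only the standard library; the function name `bin_continuous`, the default of four bins, and the `bin_N` labels are assumptions. It does not handle missing values, and `pandas.cut` offers similar behavior for dataframe columns.

```python
def bin_continuous(values, n_bins=4):
    """Assign each numeric value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return ['bin_1'] * len(values)
    width = (hi - lo) / n_bins
    labels = []
    for value in values:
        # Clamp the maximum value into the last bin.
        idx = min(int((value - lo) / width), n_bins - 1)
        labels.append('bin_{}'.format(idx + 1))
    return labels
```

The resulting labels can be treated as a categorical grouping for the significance test.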

DSWallach commented 3 years ago

Cast all df columns when running summary for alpha diversity files, as in

```python
# Stack all the different groups into a single dataframe
df = pd.concat(group_means, axis=0, sort=False)
df.SamplingDepth = df.SamplingDepth.astype(float)
df.Error = df.Error.astype(float)
df.AverageValue = df.AverageValue.astype(float)
df.Grouping = df.Grouping.astype(str)
df.GroupID = df.GroupID.astype(str)
df.GroupName = df.GroupName.astype(str)
```

DSWallach commented 3 years ago

Replace all numbers with their word equivalent for summaries, e.g. 5 -> 'Five'
Remove taxa plots for continuous variables
Sort the values before assigning them to continuous colors
Change the summary naming to reflect the study_name and date
Change the color palette for continuous variables to YlOrRd and use the max_colors value when defining it
Figure out how to have load_config allow nans in continuous variable columns
Have the summary plots all go into a sub-directory rather than the main summary directory
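
The sorting and palette items above could be sketched roughly as follows. This is a standard-library stand-in, not the project's implementation: the function name `assign_continuous_colors`, the default `max_colors` of 10, and the two RGB endpoints (a linear yellow-to-red ramp rather than the real YlOrRd palette) are all assumptions.

```python
def assign_continuous_colors(values, max_colors=10,
                             low=(255, 255, 178), high=(189, 0, 38)):
    """Map each distinct value to a hex color, lightest for the smallest."""
    # Sort first so that color intensity follows the numeric order.
    ordered = sorted(set(values))
    n_steps = min(len(ordered), max_colors)
    palette = []
    for step in range(n_steps):
        frac = step / (n_steps - 1) if n_steps > 1 else 0.0
        rgb = tuple(round(a + frac * (b - a)) for a, b in zip(low, high))
        palette.append('#{:02x}{:02x}{:02x}'.format(*rgb))
    colors = {}
    for rank, value in enumerate(ordered):
        # When there are more values than colors, neighbors share a step.
        colors[value] = palette[rank * n_steps // len(ordered)]
    return colors
```

Capping the palette at `max_colors` means neighboring values share a color once there are more distinct values than palette steps.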

DSWallach commented 3 years ago

For taxa tables in Francesca's data the id column was labeled 'level_0' rather than 'index' because a different column was already named 'index'. Add checks to automatically catch this situation
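
This matches how pandas `reset_index` behaves: when a column called 'index' already exists, the new column is named 'level_0' instead. A possible check, with the hypothetical helper name `find_id_column`, might look like this:

```python
def find_id_column(columns):
    """Return the name of the id column produced by resetting the index."""
    columns = list(columns)
    # 'level_0' only appears when 'index' was already taken, so it wins.
    if 'level_0' in columns:
        return 'level_0'
    if 'index' in columns:
        return 'index'
    raise KeyError("No 'index' or 'level_0' column found")
```

The summary code could then refer to `find_id_column(df.columns)` rather than a hard-coded 'index'.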

DSWallach commented 3 years ago

The colors are defined starting from color0 but are used starting from color1. This can cause problems

adamcantor22 commented 2 years ago

Closing, linking in #322