david-barnett / microViz

R package for microbiome data visualization and statistics. Uses phyloseq, vegan and the tidyverse. Docker image available.
https://david-barnett.github.io/microViz/
GNU General Public License v3.0

Errors making correlation heatmaps #46

Closed BrigittevdG closed 2 years ago

BrigittevdG commented 2 years ago

Hi David,

I want to make correlation heatmaps for my thesis, relating the sample data (age, gender, BMI, etc.) to the microbiome data. First I made the sample data numeric (object ps.minzeros.compositional.num). I tried to make the correlation heatmap according to your tutorial, but I got several errors.

This code:

cor_heatmap(
  data = ps.minzeros.compositional.num,
  taxa = tax_top(ps.minzeros.compositional.num, 15, by = max, rank = "Genus"),
  vars = c('female', 'older', 'highWeight', 'obesitas', 'highRatio', 'highBodyFat', 'highTG', 'highScore'),
  cor = 'spearman'
)

gives this error:

Error in as(x, "matrix")[i, j, drop = FALSE] : subscript out of bounds
In addition: Warning message:
In ps_counts(data, warn = TRUE) :
  otu_table of counts is NOT available!
Available otu_table contains non-zero values that are less than 1

And another attempt:

ps.minzeros.compositional.num %>%
  tax_agg("Genus") %>%
  cor_heatmap(vars = c('female', 'older', 'highWeight', 'obesitas', 'highRatio', 'highBodyFat', 'highTG', 'highScore'))

gives this error:

Registered S3 method overwritten by 'seriation':
  method         from
  reorder.hclust vegan
Error in if (!all(x >= 0)) stop("Negative distances not supported!") :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In stats::cor(x = meta_mat, y = otu_mat, use = cor_use, method = cor) :
  the standard deviation is zero

Can you help me? I cannot share my data because it is confidential, so I hope this is enough information for you to help me.

Kind regards, Brigitte

david-barnett commented 2 years ago

Hi Brigitte, do any of the variables work correctly? You can try them one at a time (although then you have to suppress the clustering, so try the code below)

ps.minzeros.compositional.num %>% tax_agg("Genus") %>% cor_heatmap(vars = 'female', seriation_method = "Identity")

Change the variable (in vars = ...) until you find which one(s) cause an error. Let me know if that variable contains NAs, or if they are all the same value (variance of zero)?

# easy way to inspect your sample data by eye (in RStudio)
View(samdat_tbl(ps.minzeros.compositional.num))
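
The same check can also be done programmatically. The following is only a minimal base R sketch: the data frame `sample_df` and its values are made up for illustration, and with a real phyloseq object the sample data would first need to be extracted (for example with samdat_tbl() as above).

```r
# Hypothetical sample data: 'highRatio' is constant, 'highTG' contains an NA
sample_df <- data.frame(
  female    = c(1, 0, 1, 0),
  highRatio = c(0, 0, 0, 0),
  highTG    = c(1, NA, 0, 1)
)

# TRUE for columns where every non-missing value is identical (zero variance)
is_constant <- vapply(
  sample_df,
  function(x) length(unique(x[!is.na(x)])) <= 1,
  logical(1)
)

# Number of missing values per column
n_missing <- colSums(is.na(sample_df))

# Variables that are constant or contain NAs
problem_vars <- names(sample_df)[is_constant | n_missing > 0]
print(problem_vars)
```

A constant variable matters because stats::cor() returns NA for it, with the warning "the standard deviation is zero" seen in the first post.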

BrigittevdG commented 2 years ago

Hi David, thanks for the quick, informative response!

Yes, I can see now which variables are not working:

Is that warning something to worry about? I still get a visualization.

I have no missing values (no NAs). I see now that for each of those four problem variables, all samples have the same value. But that is incorrect, because I recoded the values like this:

ps.minzeros.compositional.num <- ps.minzeros.compositional %>%
  ps_mutate(
    female = if_else(Gender == "Female", true = 1, false = 0),
    older = if_else(AgeGroup== 'Group 61-65 years', true = 1, false = 0),
    highWeight = if_else(WeightGroup== 'Group > 82 kg', true = 1, false = 0),
    obesitas= if_else(BMIGroup== 'Obesitas: BMI >=30', true = 1, false = 0),
    highRatio=if_else(WeightGroup== 'Ratio >= 0.92', true = 1, false = 0),
    highBodyFat=if_else(WeightGroup== 'Body Fat >= 34.70', true = 1, false = 0),
    highTG=if_else(WeightGroup== 'TG > 1.170', true = 1, false = 0),
    highScore=if_else(WeightGroup== 'Score > 1.682', true = 1, false = 0),
    InterventionGroup=if_else(Group== 'I', true = 1, false = 0),
  )

It looks like this mutation was not done correctly for the four variables highRatio, highBodyFat, highTG and highScore, although it worked for the other variables. Do you have any idea why?

david-barnett commented 2 years ago

My guess is typos in the variable names: you have used WeightGroup multiple times, including for highRatio, highBodyFat, highTG and highScore.
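
To illustrate why such a slip produces a constant variable, here is a small base R sketch. Everything in it is hypothetical: the column name `RatioGroup`, the level "Ratio < 0.92" and the level "Group <= 82 kg" are invented for the example, and base `ifelse()` stands in for `if_else()` so that no packages are needed.

```r
# Hypothetical metadata; 'RatioGroup' is an invented column name
meta <- data.frame(
  WeightGroup = c("Group > 82 kg", "Group <= 82 kg", "Group > 82 kg"),
  RatioGroup  = c("Ratio >= 0.92", "Ratio < 0.92", "Ratio >= 0.92")
)

# Copy-paste slip: the level belongs to RatioGroup, but WeightGroup is tested,
# so the comparison is never TRUE and every sample gets 0
wrong <- ifelse(meta$WeightGroup == "Ratio >= 0.92", 1, 0)

# Intended comparison against the matching column
right <- ifelse(meta$RatioGroup == "Ratio >= 0.92", 1, 0)

# Simple guard: confirm the level exists in the column before recoding
level_found <- "Ratio >= 0.92" %in% meta$WeightGroup

print(wrong)
print(right)
print(level_found)
```

Checking each level with `%in%` (or looking at `table()` of the column) before recoding would catch this kind of mismatch early.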

david-barnett commented 2 years ago

Regarding the warnings, these occur because cor_heatmap expects count data to be available in the input object, in order to draw prevalence/abundance annotation plots. If you have transformed the data to compositional using tax_transform, then the count data will be stored too, for use by the heatmap functions etc. But if you transformed it some other way, or extracted the phyloseq from the ps_extra object after transformation, you will lose those counts.

BrigittevdG commented 2 years ago

Thanks!! That was indeed the case... I forgot to change the names after copying some code. Sorry for this stupid mistake!

About the warning: I transformed the data to relative abundances with this code: ps.compositional <- microbiome::transform(ps, "compositional")

I guess that is the correct way? In my correlation heatmap I now see both the prevalence and the abundance annotations.

david-barnett commented 2 years ago

No worries, easy mistake to make 🙂 You can also do the same transformation with microViz tax_transform(), which keeps the counts needed for the heatmap. The way you have done it is probably fine, but do check that the plotted values make sense for your data.
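
As a rough sketch of that alternative, assuming the microViz package is installed and that `ps` is a phyloseq object whose otu_table still contains raw counts (the helper name `heatmap_with_counts` is made up for this example):

```r
# Sketch only: requires microViz; `ps` must still hold raw counts
heatmap_with_counts <- function(ps, vars) {
  # Aggregate taxa at Genus level, as in the attempts earlier in this thread
  agg <- microViz::tax_agg(ps, rank = "Genus")
  # Compositional transform done by microViz, so the counts are kept alongside
  comp <- microViz::tax_transform(agg, "compositional")
  # Correlate the chosen numeric sample variables with the taxa
  microViz::cor_heatmap(comp, vars = vars)
}
```

It could then be called as, for example, heatmap_with_counts(ps, vars = c("female", "older")), provided those numeric variables have already been added to the sample data.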

BrigittevdG commented 2 years ago

Thanks a lot for your help today!!