david-barnett / microViz

R package for microbiome data visualization and statistics. Uses phyloseq, vegan and the tidyverse. Docker image available.
https://david-barnett.github.io/microViz/
GNU General Public License v3.0

Can I make a correlation heatmap between taxa and categorical variable? #161

Closed timz0605 closed 5 months ago

timz0605 commented 5 months ago

Hello David!

First of all, thank you for this amazing package!

I am trying to make some heatmaps to help visualize my data, and I am wondering if there is any way to create correlation heatmaps between taxa and categorical variables. For example, my samples come from 4 sites (in the metadata they are under the column locality), and each of the sites could be classified as high, medium, or low impact. Is there a way to create correlation heatmaps like that?

As a reference, below is the code I use to generate a composition heatmap:

taxa_heatmap <- ps_microviz %>%
  tax_agg("order") %>% 
  tax_transform(trans = "hellinger", rank = "order") %>%
  tax_filter(min_prevalence = 0.01, use_counts = TRUE) %>%
  comp_heatmap(
    colors = heat_palette(sym = TRUE), grid_col = NA,
    sample_side = "top", name = "Abd.",
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm"))),
    sample_anno = sampleAnnotation(
      Location = anno_sample("locality"),
      col = list(Location = cols), border = FALSE))

david-barnett commented 5 months ago

Glad you're finding microViz useful.

To do this, you need to recode your categorical data as numeric indicator variables. This is easy with ps_mutate and if_else or case_when.

For nominal categorical variables like your locality variable:

# if_else() comes from dplyr; each new variable is 1 for one site and 0 otherwise
ps_microviz <- ps_microviz %>%
  ps_mutate(
    loc1 = if_else(locality == "placeA", true = 1L, false = 0L),
    loc2 = if_else(locality == "placeB", true = 1L, false = 0L),
    loc3 = if_else(locality == "placeC", true = 1L, false = 0L),
    loc4 = if_else(locality == "placeD", true = 1L, false = 0L)
  )

For ordinal categorical variables like the high, medium, and low impact you mentioned, you could either do the same thing or alternatively create a single numeric variable like this:

# case_when() comes from dplyr; samples matching none of the levels become NA
ps_microviz <- ps_microviz %>%
  ps_mutate(impact_num = case_when(
    impact == "high" ~ 3,
    impact == "medium" ~ 2,
    impact == "low" ~ 1
  ))

Then just use these new variables for the correlation heatmap.
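As a rough sketch of that last step, the recoded variables can be passed by name to the vars argument of cor_heatmap. This assumes ps_microviz already contains loc1 to loc4 and impact_num from the ps_mutate calls above, and it reuses the aggregation, transformation, and filtering from the composition heatmap in the question. The choice of Spearman correlation is an assumption made here because impact_num is a rank-like score; it is not something the thread specifies.

```r
library(microViz)
library(dplyr) # provides %>%, if_else() and case_when() used in the recoding steps

# Assumption: ps_microviz already has the numeric variables loc1..loc4 and
# impact_num added with ps_mutate(); adjust the names to match your own data.
cor_hm <- ps_microviz %>%
  tax_agg("order") %>%
  tax_transform(trans = "hellinger", rank = "order") %>%
  tax_filter(min_prevalence = 0.01, use_counts = TRUE) %>%
  cor_heatmap(
    vars = c("loc1", "loc2", "loc3", "loc4", "impact_num"),
    cor = "spearman" # rank-based; "pearson" and "kendall" are the other options
  )

cor_hm # printing the object draws the heatmap
```

Each indicator column then shows which taxa tend to be more or less abundant at that site relative to all the others, so the four locality columns are best read together rather than as independent results.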