adw96 / breakaway

Species richness with high diversity
68 stars 18 forks source link

Made plotting function enhancements #65

Closed jvhagey closed 4 years ago

jvhagey commented 5 years ago

Hi @bryandmartin and @adw96,

I made some enhancements to the plotting function for my own use and thought others might like to use them. I edited your currently plotting function because when running divnet with a covariate you get redundant output in the graph. For example,

#Loading data
library(phyloseq)
library(DivNet)
library(tidyverse)
data("GlobalPatterns")

# reducing data
water <- GlobalPatterns %>%
  subset_samples(SampleType %in% c("Freshwater",  "Freshwater (creek)", 
                   "Ocean", "Sediment (estuary)")) %>%   tax_glom("Order")

#running divnet without covariate
dv_water <- divnet(water, ncores = 3)

#running with a covariate
dv_water_st <- water %>%  divnet(X = "SampleType", ncores = 3)

#previous graphing function
plot(dv_water_st$shannon, water, color = "SampleType", trim_plot = TRUE)

old_plot

#new graphing
plot_alpha_estimates_custom(dv_water_st$shannon, 
                            water, color = "SampleType", 
                            shrink = "SampleType", trim_plot = TRUE)

new_plot

Also, this plotting function allows you to add a new arguments for faceting on the x or y axis if that's what someone might want to do.

Since the same plotting function is used for breakaway and divnet it seemed that allowing the "shrink" argument for breakaway wouldn't be good as it returns individual estimates for richness and you can't have covariates, correct? Thus, I have the function check to make sure you aren't plotting a breakaway estimate with the shrink argument. Another example:

#you can use this with breakaway objects too
ba <- breakaway(water)

#previous plotting
plot(ba, water, color = "SampleType")

#new plotting
plot_alpha_estimates_custom(ba, water, color = "SampleType")

#new plotting if you try to shrink on a covariate
plot_alpha_estimates_custom(ba, water, color = "SampleType", shrink="SampleType")

Error in plot_alpha_estimates_custom(ba, water, color = "SampleType",  : 
  You can't shrink a plot with breakaway estimates

The only minor hiccup still not addressed is that you shouldn't use the "shrink" argument without having a covariate in your model. So it currently will always print a warning to remind you of this, but won't stop you. However, the points overlap and look weird in the graph so helpfully that is also a hint something went wrong.

plot_alpha_estimates_custom(dv_water$shannon, 
                            water, color = "SampleType", 
                            shrink = "SampleType", trim_plot = TRUE)

bad_plot

In plot_alpha_estimates_custom(dv_water$shannon, water, color = "SampleType",  :
  Warning you should not shrink your graph unless you have used a covariate in the model.
Additionally, only shrink on the covariate

Here is the current code. There are probably cleaner ways to do this and I'm open to suggestions to tidy it up. If you think this whole "shrink" idea shouldn't be implemented for some reason let me know. Let me know your thoughts.

#`@param facet.y (Optional). Default NULL. Character string. The sample variable to facet on the y axis.
#`@param facet.x (Optional). Default NULL. Character string. The sample variable to facet on the x axis.
#`@param shrink (Optional). Default NULL. Character string. The sample variable to facet on the x axis.\n Note: You can't use this with a breakaway object and on an object that does not have a covariate.

plot_alpha_estimates_custom <- function(x, physeq = NULL, measure = NULL, facet.y=NULL, facet.x=NULL, shrink=NULL,
                                        color = NULL, shape = NULL, title = NULL, trim_plot = FALSE, ...) {

  if (!is.null(shrink)) { #checks to make sure x isn't a breakaway object 
    name_check <- x %>% lapply(function(x) x$name) %>% unlist %>% unique
    if(name_check=="breakaway") {
      stop("You can't shrink a plot with breakaway estimates")
    }
  }

  if (is.null(measure)) {
    all_measures <- x %>% lapply(function(x) x$name) %>% unlist %>% unique
    measure <- all_measures[1]
  }

  df <- summary(x, physeq) 

  if (all(is.na(df$estimate))) {
    stop("There are no estimates in this alpha_estimates object!")
  }
  if (!is.null(facet.x)) { #new
    if (facet.x %in% phyloseq::sample_variables(physeq)) {
      df[["facet.x"]] <- phyloseq::get_variable(physeq, facet.x)
    } else if (length(facet.x) == nrow(df)) {
      df[["facet.x"]] <- facet.x
    } else {
      stop("facet must either match a variable or be a custom vector of correct length!")
    }
  }
  if (!is.null(facet.y)) { #new
    if (facet.y %in% phyloseq::sample_variables(physeq)) {
      df[["facet.y"]] <- phyloseq::get_variable(physeq, facet.y)
    } else if (length(facet.y) == nrow(df)) {
      df[["facet.y"]] <- facet.y
    } else {
      stop("facet.y must either match a variable or be a custom vector of correct length!")
    }
  }
  if (!is.null(color)) {
    if (color %in% phyloseq::sample_variables(physeq)) {
      df[["color"]] <- phyloseq::get_variable(physeq, color)
    } else if (length(color) == nrow(df)) {
      df[["color"]] <- color
    } else {
      stop("color must either match a variable or be a custom vector of correct length!")
    }
  } 
  if (!is.null(shape)) {
    if (shape %in% phyloseq::sample_variables(physeq)) {
      df[["shape"]] <- phyloseq::get_variable(physeq, shape)
    } else if (length(shape) == nrow(df)) {
      df[["shape"]] <- shape
    } else {
      stop("shape must either match a variable or be a custom vector of correct length!")
    }
  } 

  yname1 <- measure
  yname2 <- x[[1]]$estimand
  if (is.null(physeq) & !is.null(rownames(df))) {
    df$sample_names <- rownames(df)
  }

  if (!is.null(shrink)){ #new
    warning(paste("Warning you should not shrink your graph unless you have used a covariate in the model.\nAdditionally, only shrink on the covariate"))
    ps.data <- as.data.frame(sample_data(physeq)) #pull sample data from physeq object into dataframe
    ps.data$sample_names <- sample_names(physeq) #make a column of sample names
    df[,shrink] <- ps.data[,shrink][match(ps.data$sample_names, df$sample_names)] #add metadata column to divnet df
    df <- df[!duplicated(df$estimate),] #remove duplicates
  }

  if (is.null(shape) & is.null(color)) {
    my_gg <- ggplot2::ggplot(df)
  } else if (is.null(shape) & !is.null(color)) {
    aes_map <- ggplot2::aes_string(color = "color")
    my_gg <- ggplot2::ggplot(df, aes_map)
  } else if (!is.null(shape) & is.null(color)) {
    aes_map <- ggplot2::aes_string(shape = "shape")
    my_gg <- ggplot2::ggplot(df, aes_map)
  } else if (!is.null(shape) & !is.null(color)) {
    aes_map <- ggplot2::aes_string(color = "color", shape = "shape")
    my_gg <- ggplot2::ggplot(df, aes_map)
  }

  if (!is.null(shrink)){ #new 
    my_gg <- my_gg +
      ggplot2::geom_point(ggplot2::aes_string(x = shrink, y = "estimate"))+
      ggplot2::xlab("")+
      ggplot2::ylab(paste(yname1, "estimate of", yname2)) +
      ggplot2::labs(title = title, color=color) +
      ggplot2::theme_bw() +
      ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))

    if (!(all(is.na(df$lower)) || all(is.na(df$upper)))) {
      my_gg <- my_gg + 
        ggplot2::geom_segment(ggplot2::aes_string(x = shrink, xend = shrink, y = "lower", yend = "upper"))
    }
  } else if (is.null(shrink)) { #new
    my_gg <- my_gg +
      ggplot2::geom_point(ggplot2::aes_string(x = "sample_names", y = "estimate")) +
      ggplot2::ylab(paste(yname1, "estimate of", yname2)) +
      ggplot2::xlab("") +
      ggplot2::labs(title = title, color=color) + #added color so it will have appropriate legend title
      ggplot2::theme_bw() +
      ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))

  }
  if (!is.null(facet.y)) { #new
    my_gg <- my_gg +
      ggplot2::facet_grid( facet.y ~ ., scales="free", space="free_x")
  }

  if (!is.null(facet.x)) { #new
    my_gg <- my_gg +
      ggplot2::facet_grid( . ~ facet.x, scales="free", space="free_x")
  }

  if (is.null(shrink)) {
    if (!(all(is.na(df$lower)) || all(is.na(df$upper)))) {
      my_gg <- my_gg + 
        ggplot2::geom_segment(ggplot2::aes_string(x = "sample_names", xend = "sample_names", y = "lower", yend = "upper"))
    }                                            
  }

  if (!trim_plot) {
    fiven <- stats::fivenum(df$upper, na.rm = TRUE)
    iqr <- diff(fiven[c(2, 4)])
    if (!is.na(iqr)) {
      out <- df$upper < (fiven[2L] - 1.5 * iqr) | df$upper > (fiven[4L] + 1.5 * iqr)
      ylower <- min(0, 0.95*min(df$upper[!out]), na.rm = TRUE)
      yupper <- 1.05*max(df$upper[!out], na.rm = TRUE)

      my_gg <- my_gg +
        ggplot2::coord_cartesian(ylim = c(ylower,yupper)) 
    } 
  } 

  my_gg
}
adw96 commented 4 years ago

Thanks again for posting this, @jvhagey ! I'm going to close it as an issue but refer folks back here in future. My compliments on your plot again 😸