david-barnett / microViz

R package for microbiome data visualization and statistics. Uses phyloseq, vegan and the tidyverse. Docker image available.
https://david-barnett.github.io/microViz/
GNU General Public License v3.0
94 stars 10 forks source link

Use of custom transofmration in linear models. #160

Open SilasK opened 1 month ago

SilasK commented 1 month ago

I would like to use variations of CLR transformation.

e.g., non-zero or iqlr CLR, both using only a subset of the features to estimate the geometric mean.

I can transform a phyloseq object before the taxatree_models, but then it complains that the input contains non-zero values, which messes up the aggregation. I found no way to turn off the aggregation.

Ideally, one could add a custom function to tax_aggregate or directly implement these transformations in microViz.

Here is my function:

#' @title get otu table as a matrix with taxa are columns format
#' @description A wrapper to get otu table as a matrix with taxa are columns format
#' @param pseq The phyloseq object.
#' @return otu table as a matrix with taxa are columns format
#' @examples
#' get_otu_matrix(pseq)
#' @export
#' @import phyloseq
otu_matrix <- function(pseq) {
  OTU <- as(phyloseq::otu_table(pseq), "matrix")

  if (phyloseq::taxa_are_rows(pseq)) {
    OTU <- t(OTU)
  }

  OTU
}

#' @title robust (non zero) centered log transformation
#' @description A wrapper to perform Bayesian replacement of zeroes and CLR-transformation. It is rebust in the sense that it takes only values that are not zero in the calculation of the geometric mean.
#' @param pseq The phyloseq object.
#' @param log_fun The log function to use. Default is log2.
#' @return phyloseq object with CLR-transformed abundances.
#' @export
#'
rclr_transformation <- function(pseq, log_fun = log2) {
  if (class(pseq) != "phyloseq") {
    stop("pseq must be a phyloseq object.")
  }

  abundances <- otu_matrix(pseq)
  mdat <- microbiome::meta(pseq)
  if (!identical(rownames(abundances), rownames(mdat))) {
    stop("OTU table and metadata are not in order.")
  }

  # remove all columns with all zeros
  abundances <- abundances[, colSums(abundances) > 0]

  abundances.log <- zCompositions::cmultRepl(abundances, label = 0, method = "CZM",z.delete = FALSE) |>
    log_fun() |>
    as.matrix()

  abundances.log.nz <- abundances.log
  abundances.log.nz[abundances == 0] <- NaN

  geo_means <- rowMeans(abundances.log.nz, na.rm = TRUE)

  transformed <- abundances.log - geo_means

  ps <- phyloseq::phyloseq(
    phyloseq::otu_table(t(transformed), taxa_are_rows = TRUE),
    phyloseq::sample_data(pseq),
    phyloseq::phy_tree(pseq, errorIfNULL = FALSE),
    phyloseq::tax_table(pseq)
  )

  return(ps)
}
david-barnett commented 1 month ago

interesting suggestion Silas,

I could allow users to supply any transform as a function instead of a string,

the transform would have to take and return an otu_table, acting as a drop-in replacement for this line within tax_transform

otu <- otuTransform(otu = otu, trans = trans, ...)

it would need to relax argument type checks in a few places to allow this...

I won't have time to look at this for a few weeks probably, feel free to attempt a PR if its more urgent and that will speed things up

cheers David