Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray

Is there a way to get `cor` to work? #90

Open ekernf01 opened 3 years ago

ekernf01 commented 3 years ago

When I call `cor` on DelayedArray objects, I get an error like `Error in cor(DelayedArray(x), DelayedArray(y)) : 'x' must be numeric`. Am I missing something simple?

If it's not implemented, I have a little function for this and I could send a pull request.

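#' Drop-in replacement for cov or cor; tolerates DelayedArray input. Pearson correlation only.
#'
#' @param x,y Matrices or DelayedMatrix objects with the same number of rows.
#' @param do_cor If TRUE, return correlations instead of covariances.
delayedCov = function(x, y = x, do_cor = FALSE){
  stopifnot(nrow(x) == nrow(y))
  # Center each column
  mux = colMeans(x)
  muy = colMeans(y)
  zx = sweep(x, 2, mux, "-")
  zy = sweep(y, 2, muy, "-")
  # Cross-product of centered columns; divide by n - 1 to match stats::cov
  covxy = t(zx) %*% zy
  covxy = covxy / (nrow(x) - 1)
  if(do_cor){
    # Column standard deviations, using the same n - 1 denominator
    sx = sqrt(colSums(zx^2) / (nrow(x) - 1))
    sy = sqrt(colSums(zy^2) / (nrow(x) - 1))
    # Scale row i by 1/sx[i] and column j by 1/sy[j]
    covxy = Matrix::Diagonal(x = 1/sx) %*% covxy %*% Matrix::Diagonal(x = 1/sy)
  }
  covxy
}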
# tests: in each plot the points should fall on the line y = x
library(DelayedArray)
x = matrix(rnorm(1000), ncol = 10)
y = matrix(rnorm(1000), ncol = 10)
plot(delayedCov(x, y),  cov(x, y)); abline(a = 0, b = 1)
plot(delayedCov(x, y, do_cor = TRUE),  cor(x, y)); abline(a = 0, b = 1)
plot(delayedCov(DelayedArray(x), DelayedArray(y), do_cor = TRUE),  cor(x, y)); abline(a = 0, b = 1)
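
The plots only give a visual check, so here is a rough numeric version of the same comparison. It is a sketch in base R on ordinary matrices (no DelayedArray involved), and the names `cov_manual` and `cor_manual` are just illustrative. It spells out the centered cross-product identity the function relies on. Note that `cov` divides by `n - 1`; for the correlation the denominator cancels, so dividing by `n` or `n - 1` gives the same result there.

```r
set.seed(1)
x <- matrix(rnorm(1000), ncol = 10)
y <- matrix(rnorm(1000), ncol = 10)
n <- nrow(x)

# Center each column
zx <- sweep(x, 2, colMeans(x), "-")
zy <- sweep(y, 2, colMeans(y), "-")

# Covariance: cross-product of centered columns over n - 1
cov_manual <- crossprod(zx, zy) / (n - 1)

# Correlation: divide entry (i, j) by sd(x[, i]) * sd(y[, j])
sx <- sqrt(colSums(zx^2) / (n - 1))
sy <- sqrt(colSums(zy^2) / (n - 1))
cor_manual <- cov_manual / outer(sx, sy)

# Compare against the built-ins up to floating-point tolerance
print(all.equal(cov_manual, cov(x, y)))
print(all.equal(cor_manual, cor(x, y)))
```

The same `all.equal` comparison could replace the plots for the DelayedArray case, after converting the result with `as.matrix()`.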