Open shaorray opened 4 years ago
Often in correlation tests, people use % variance explained to describe the relationship between X and Y.
e.g. transcription explained 71% of gene expression variance.
So here are two functions that could be useful:
simple R2 concept
single_variance_explained <- function(X, Y, is.cor = F) {
  stopifnot(length(X) == length(Y))
  rm.idx = is.na(X) | is.infinite(X) | is.na(Y) | is.infinite(Y)
  X = X[!rm.idx]
  Y = Y[!rm.idx]
  if (is.cor) return(cor(X, Y))
  1 - var(Y - X) / var(Y)
}
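A quick check on simulated data may help show what the default branch measures. This is only a sketch: the data and the names `x`, `y`, `r2_identity`, `r2_fit` and `r2_cor` are made up for illustration. It compares the `1 - var(Y - X) / var(Y)` formula, which subtracts X from Y directly (slope fixed at 1), with the R2 of an ordinary least-squares fit.

```r
set.seed(1)
x <- rnorm(200)
y <- 2 * x + rnorm(200, sd = 0.5)   # simulated data; true slope is 2, not 1

# Same formula as the last line of single_variance_explained():
# variance of y left over after subtracting x directly (slope fixed at 1)
r2_identity <- 1 - var(y - x) / var(y)

# R2 of a least-squares fit with a free slope and intercept
r2_fit <- summary(lm(y ~ x))$r.squared

# For simple linear regression, R2 equals the squared Pearson correlation
r2_cor <- cor(x, y)^2

print(c(identity = r2_identity, fit = r2_fit, cor_squared = r2_cor))
```

The fixed-slope value can never exceed the fitted R2, so the two only agree when X and Y are already on the same scale. Also note that `is.cor = TRUE` returns the correlation itself, so it would need squaring to be read as a proportion of variance.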
R2 decomposition of regression through origin
X: n-row matrix with k features, Y: n responses
multi_variance_explained <- function(X, Y) {
  stopifnot(var(Y) > 0 & length(Y) == nrow(X))
  X_tilde = cbind(1, scale(X)) # add an intercept error term ε
  Y_tilde = scale(Y)
  # least-squares coefficients on the standardized data
  beta_hat = solve(t(X_tilde) %*% X_tilde) %*% t(X_tilde) %*% Y_tilde
  # per-feature contribution beta_j * cov(Y, X_j); drop the intercept entry
  c(beta_hat * t(cov(Y_tilde, X_tilde)))[-1]
}
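One way to sanity-check the decomposition is to confirm that the per-feature terms add up to the usual R2 from `lm()`. The sketch below uses simulated data, and the names `Xs`, `Ys`, `contrib` and `r2_lm` are hypothetical. As a simplification it leaves out the intercept column, which should not change the feature terms because the scaled columns are already centered.

```r
set.seed(1)
n <- 300
X <- cbind(a = rnorm(n), b = rnorm(n), c = rnorm(n))   # simulated features
Y <- 1.0 * X[, "a"] + 0.5 * X[, "b"] + rnorm(n)        # "c" has no true effect

Xs <- scale(X)   # center and scale each feature
Ys <- scale(Y)   # center and scale the response

# Standardized regression coefficients (regression through the origin)
beta <- solve(crossprod(Xs), crossprod(Xs, Ys))

# Per-feature term: beta_j * cov(Ys, Xs_j)
contrib <- c(beta * t(cov(Ys, Xs)))
names(contrib) <- colnames(X)

# Reference value: R2 of an ordinary lm() fit on the unscaled data
r2_lm <- summary(lm(Y ~ X))$r.squared

print(contrib)
print(c(sum_of_terms = sum(contrib), lm_r_squared = r2_lm))
```

The terms sum to R2, but a single term can be negative when features are correlated with each other, so each one is better read as a share of the fit than as a standalone variance.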