chrismckennan / CorrConf

Estimate latent covariates in high dimensional data when individual samples are correlated (i.e. related).
3 stars 2 forks source link

CorrConf

Estimate latent covariates in high dimensional data when individual samples are correlated (i.e. related). This package is also applicable when samples are independent.

Model description

The complete model for the data is Y = B X^T + Gamma Z^T + L C^T + E, where: Y is a p (number of units, i.e. genes) x n (number of individuals or samples) matrix X is a n x d matrix containing the covariates of interest Z is a n x r matrix of known nuisance covariates whose effects Gamma you don't necessarily care about (i.e. plate effects, sex differences) E is a p x n residual matrix, where the rows e_g have first and second moment (0, sigma_g^2 In + sigma{g,1}^2 B1 + ... + sigma{g,b}^2 B_b) for $g = 1,...,p$. The matrices $B_j$ are known n x n relatedness matrices (i.e. a kinship matrix, chip effect matrix, etc.) C is a n x K matrix of latent covariates to be estimated. One can then do inference for B once C is known. This package estimates K and then estimates C in the presence of correlated residuals.

Package functions

ChooseK: Estimate K, the number of columns in C. This function is also applicable for data with independent samples. See documentation for details. EstimateC_complete: Estimate the unobserved n x K covariate matrix C. This can be used in downstream software to do inference on B, the effects due to X. If "return.Bhat" is set to TRUE, the software will return the GLS estimate for B and p-values under the model Y ~ MN{ BX^T + Gamma Z^T + LC^T, diag(delta_1^2,...,delta_p^2), V}, where V is a linear combination of the matrices B_1,...,B_b. If samples are independent and K is known, we recommend the user use the package BCconf, which is available for download at https://github.com/chrismckennan/BCconf.