WWXkenmo / ENIGMA

A fast and accurate deconvolution algorithm based on regularized matrix completion algorithm (ENIGMA)
MIT License
26 stars 6 forks source link

CTS-DE Analysis #11

Open GUnit-Lab opened 1 year ago

GUnit-Lab commented 1 year ago

Thank you so much for designing this wonderful program. For the Cell Type Specific Differential Expression (CTS-DE) Analysis could you please offer more context for setting this up after normalizing the ENIGMA? In vignette the analysis is performed by: p <- 100 y <- as.numeric(gl(2, p/2)) - 1 DEG <- FindCSE_DEG(egm, y)

How are the values and formula for 'p' and 'y' determined? Doing 'Help?' for FindCSE_DEG does not define 'y'

Currently I cannot get this step to work and get error:

"CSE estimates and y have different length" Error in h(simpleError(msg, call)) : error in evaluating the argument 'object' in selecting a method for function 'coef': variable lengths differ (found for 'Epithelium')

WWXkenmo commented 1 year ago

Hey,

Thanks for using our tool, and sorry for my late response as I went to Italy for vacation in past two weeks.

The provided exampleDataset is used for only run for demo, it's a simulation dataset so we know that y is [1,1,1...,1,0,0,0,...,0] vector.

Here y is phenotype vector, in real case, suppose we have three samples with sample-1,2,3, and the phenotype you interest with is ['control', 'case', 'control']. Then we encoded the phenotype vector as [0,1,0]. In this case, the positive ExpressionDifference means gene is up-regulated in 'case' sample.

Following above way to provide your y vector, p is not important and only used for demo.

Note: I updated the package, when run ENIGMA_trace_norm, please use following function

egm = ENIGMA_trace_norm(egm, do_cpm = FALSE, solver = "proximalpoint",pos=FALSE,preprocess = "none",alpha = 0.1,verbose = TRUE)

the newest added parameter do_cpm means whether we need to perform cpm normalisation perform deconvolution. in our simulation dataset, because gene expression is sampled from normal distribution, so you could regard the input gene expression matrix as Z-score transformed matrix, then perform deconvolution. Therefore any preprocessing is unnecessary, so do_cpm = FALSE, pos = FALSE (convert the resulted gene expression matrix is positive in all entries) and preprocess = "none" (don't need log or sqrt transformation)

Best, Ken

GUnit-Lab commented 1 year ago

Hi Ken, Thank you for the added explanation. So in your new example above I would need to do: y <- as.numeric(gl(0,1,0)) DEG <- FindCSE_DEG(egm, y)

WWXkenmo commented 1 year ago

No, gl is a function used to generate factor level, what you need is just

y = c(0,1,0)

in my provided case

Best, Ken

GUnit-Lab commented 1 year ago

Yay thank you so much, this corrected the initial issue for me and now my 3 control and 3 case samples are entered correctly. But with:

y <- c(0,0,0,1,1,1)
DEG <- FindCSE_DEG(egm_tn, y)

I now have new issue I cannot figure out how to solve: Error in h(simpleError(msg, call)) : error in evaluating the argument 'object' in selecting a method for function 'coef': Argument mu must be a nonempty numeric vector

I looked at the traceback: _15. h(simpleError(msg, call)) 14. .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)), "Argument mu must be a nonempty numeric vector", base::quote(family$linkfun(mustart))) 13. family$linkfun(mustart) 12. etastart %||% { if (!is.null(start)) if (length(start) != nvars) stop(gettextf("length of 'start' should equal %d and correspond to initial coefs for %s", ... 11. glm.fit(x = structure(numeric(0), dim = c(0L, 14L), dimnames = list( NULL, c("(Intercept)", "Epithelium", "Glans.Mesenchyme", "Preputial.Mesenchyme", "Proliferating.Mesenchyme", "Glans.Distal.Dorsal", "Glans.Distal.Ventral.Late", "Glans.Proximal", "Glans.Proximal.Late", ... 10. eval(call(if (is.function(method)) "method" else method, x = X, y = Y, weights = weights, start = start, etastart = etastart, mustart = mustart, offset = offset, family = family, control = control, intercept = attr(mt, "intercept") > 0L, singular.ok = singular.ok)) 9. eval(call(if (is.function(method)) "method" else method, x = X, y = Y, weights = weights, start = start, etastart = etastart, mustart = mustart, offset = offset, family = family, control = control, intercept = attr(mt, "intercept") > 0L, singular.ok = singular.ok)) 8. glm(y ~ ., data = data.frame(t(x)), family = "binomial") 7. summary(glm(y ~ ., data = data.frame(t(x)), family = "binomial")) 6. coef(summary(glm(y ~ ., data = data.frame(t(x)), family = "binomial"))) 5. FUN(array(newX[, i], d.call, dn.call), ...) 4. apply(A, 1, function(x) { pval = coef(summary(glm(y ~ ., data = data.frame(t(x)), family = "binomial")))[, 4] return(get_pval(pval, cell_type, K)) ... 3. test(O, y, covariate) 2. DEG_test(Exp, y, covariate) 1. FindCSE_DEG(egmtn, y)

I cannot figure out how to correct this issue with mu. Do you have ideas what I have done wrong and suggestions how I can fix? Thank you kindly Victor