Winnie09 / Lamian


'chol2inv': not positive definite #16

Open chooliu opened 1 year ago

chooliu commented 1 year ago

Hi Dr. Hou!

I encountered the following Cholesky error when running lamian_test(). I suspect it's a dataset-dependent issue, but I wanted to reach out to see if you have any thoughts on troubleshooting, or whether you've seen this in any real-world Lamian applications.

Cheers!

Command (see traceback entry 1) and error:

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a
method for function 'chol2inv': the leading minor of order 1 is not positive definite
Traceback:

1. lamian_test(expr = expression, cellanno = cellannot, pseudotime = pt, 
 .     design = dm, test.type = "variable", testvar = 2, permuiter = 50, 
 .     ncores = 1)
2. lapply(seq_len(permuiter + 1), function(i) fitfunc(iter = i, 
 .     diffType = "overall", test.type = test.type, testvar = testvar, 
 .     EMmaxiter = EMmaxiter, EMitercutoff = EMitercutoff, verbose.output = verbose.output, 
 .     expr = expr, cellanno = cellanno, pseudotime = pseudotime, 
 .     design = design))
3. FUN(X[[i]], ...)
4. fitfunc(iter = i, diffType = "overall", test.type = test.type, 
 .     testvar = testvar, EMmaxiter = EMmaxiter, EMitercutoff = EMitercutoff, 
 .     verbose.output = verbose.output, expr = expr, cellanno = cellanno, 
 .     pseudotime = pseudotime, design = design)
5. fitpt(expr, cellanno, pseudotime, design, testvar = testvar, 
 .     maxknotallowed = 10, EMmaxiter = EMmaxiter, EMitercutoff = EMitercutoff, 
 .     ncores = 1, model = mod.full)
6. sapply(0:maxknot, bicfunc)
7. lapply(X = X, FUN = FUN, ...)
8. FUN(X[[i]], ...)
9. sapply(names(sname), function(ss) {
 .     phiss <- phi[sname[[ss]], , drop = F]
 .     dif2 <- sexpr[[ss]] - sexpr[[ss]] %*% (phiss %*% chol2inv(chol(crossprod(phiss)))) %*% 
 .         t(phiss)
 .     dif2 <- rowSums(dif2 * dif2)
 .     s2 <- dif2/(length(sname[[ss]]) - ncol(phi))
 .     log(2 * pi * s2) * nrow(phiss) + dif2/s2
 . })
10. lapply(X = X, FUN = FUN, ...)
11. FUN(X[[i]], ...)
12. chol2inv(chol(crossprod(phiss)))
13. chol(crossprod(phiss))
14. chol(crossprod(phiss))
15. chol.default(crossprod(phiss))
16. .handleSimpleError(function (cond) 
  . .Internal(C_tryCatchHelper(addr, 1L, cond)), "the leading minor of order 1 is not positive definite", 
  .     base::quote(chol.default(crossprod(phiss))))
17. h(simpleError(msg, call))
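
For context on where this dies: the failing call is chol(crossprod(phiss)), where phiss subsets the basis matrix phi to one sample's cells (traceback entry 9). If that per-sample basis is rank-deficient, e.g. a basis column is all zeros because none of a sample's cells fall in some pseudotime range, then crossprod(phiss) is singular and chol() throws exactly this error. A minimal sketch with toy data (hypothetical, not my actual matrices):

# hypothetical 5-cell basis whose first column is all zeros, mimicking a
# pseudotime range to which a sample contributes no cells
phiss <- cbind(rep(0, 5), rnorm(5))
xtx <- crossprod(phiss)  # Gram matrix; xtx[1, 1] == 0, so not positive definite
try(chol(xtx))           # Error: the leading minor of order 1 is not positive definite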

Attempts to investigate:

Dataset Info: 10X Multiome, n = 6 samples, n = 2/group x 3 groups; tried 100 to 3000 genes and different variations on design matrix

Software Versions: Lamian_0.99.2 (installed via GitHub commit 7d5a8ff); R version 4.1.0 (2021-05-18); Platform: x86_64-pc-linux-gnu (64-bit); Running under: CentOS Linux 7 (Core)

hoholee commented 1 year ago

Same issue here:

Warning message:
In mclapply(seq_len(permuiter + 1), function(i) { :
  all scheduled cores encountered errors in user code
[1] "The length of fit is ..."
[1] 1 1 1 1 1 1
   Mode   FALSE 
logical       6 
[1] "The length of fit after removing null is ..."
[1] 1 1 1 1 1 1
   Mode   FALSE 
logical       6 
Error in fit[[1]]$fitres.full : $ operator is invalid for atomic vectors

r$> fit[[1]]                                                                                                                                               
[1] "Error in h(simpleError(msg, call)) : \n  error in evaluating the argument 'x' in selecting a method for function 'chol2inv': the leading minor of order 1 is not positive definite\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'chol2inv': the leading minor of order 1 is not positive definite>

Maybe related to these issues: https://github.com/Winnie09/Lamian/issues/2 and https://github.com/Winnie09/Lamian/issues/11. Also, my expression matrix, cellanno, and pseudotime all cover the same number of cells (~150k), so I don't think a dimension mismatch is the cause of the error (a quick check along those lines is sketched below).
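
(For anyone else double-checking input alignment, a sketch using the object names from the original post's call; it assumes cellannot holds cell names in column 1 and sample IDs in column 2, so verify the layout against your own objects:)

# expression: genes x cells matrix; pt: named pseudotime vector;
# cellannot: cell annotation table -- all assumed layouts, not confirmed API
stopifnot(
  ncol(expression) == length(pt),
  all(colnames(expression) %in% cellannot[, 1]),
  all(names(pt) %in% cellannot[, 1])
)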

Winnie09 commented 1 year ago

Hi @chooliu, thank you for your strong interest in this software package! I am glad to help. It could be due to the pseudotime. Would you mind sharing your pseudotime and design matrix with me?

Winnie09 commented 1 year ago

It could be due to a skewed distribution of the pseudotime, or to ranges of the pseudotime that contain no cells; the sketch below is one way to check for that per sample.
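
(A per-sample diagnostic sketch, reusing pt and cellannot from the original post's call, with cellannot's columns assumed as above:)

# bin pseudotime into deciles and count each sample's cells per bin;
# zero counts flag pseudotime ranges where a sample contributes no cells
bins <- cut(pt, breaks = unique(quantile(pt, probs = seq(0, 1, 0.1))),
            include.lowest = TRUE)
samples <- cellannot[match(names(pt), cellannot[, 1]), 2]
table(samples, bins)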

chooliu commented 1 year ago

Thanks so much for the response & for offering to look at the data! I'll take a peek in the next week to (1) confirm how the distribution of pseudotime varies with treatment and sample, and (2) try alternative pseudotime-estimation algorithms. I'll email the pseudotime and design matrix if that doesn't give insight.

Re: skewed distribution, does Lamian expect any scaling or transformation of pseudotime values (e.g., how some algorithms output pseudotime over range 0 to 1)?

Winnie09 commented 1 year ago

@chooliu Sure! Lamian does not perform any transformation or scaling of pseudotime values. The default pseudotime values are integers from 1, 2, ..., up to the number of cells, which is the output of the default pseudotime inference method, TSCAN.
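
(So one way to get a continuous pseudotime into that integer form is simple ranking; a one-line sketch, with pt as above:)

# map continuous pseudotime onto integer ranks 1..n, the form TSCAN outputs;
# ties.method = "first" breaks ties arbitrarily (see the tie discussion below)
pt_rank <- rank(pt, ties.method = "first")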

mehtar1 commented 1 year ago

Hi! I've been encountering this same issue. All my cells have pseudotime values that are doubles ranging from 0.5 to ~25, and some cells share the same pseudotime value, so if I rank the pseudotime instead there may still be ties. Will that affect how the program runs or performs its calculations?

Winnie09 commented 1 year ago

Hi @mehtar1 Yes, pseudotime ties may affect some of the calculations. One way to get around it is to average the counts of the cells that share the same pseudotime, or to remove the duplicates (a sketch follows).
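
(A sketch of the second option, keeping one cell per distinct pseudotime value; pt, expression, and cellannot as above:)

# keep the first cell at each distinct pseudotime value, then subset the inputs
keep <- names(pt)[!duplicated(pt)]
pt_dedup <- pt[keep]
expression_dedup <- expression[, keep]
cellannot_dedup <- cellannot[cellannot[, 1] %in% keep, ]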

mehtar1 commented 1 year ago

To test this, I kept just one cell from each set of tied pseudotime scores, and it seems I can run lamian_test without error now. However, our larger dataset will have many more pseudotime ties. Is there no way to include cells with the same pseudotime? Averaging the counts won't accurately represent our dataset, while removing the duplicates will cause a large loss of data. Would love to hear if you have any potential fixes or workarounds!

kylagelev commented 1 year ago

Hi. Great package! I am running into a similar problem as above, and I've been looking into whether it has to do with the pseudotime distribution within samples and conditions, as you mentioned earlier. In that case, would the number of cells in each sample matter? For example, if a couple of my samples have few cells compared to the others (e.g., 500 vs. 1,000 cells, or more extremely 100 vs. 1,000), could that lead to the lamian_test error above? Could an imbalance in cells per condition (e.g., ~5,000 disease vs. ~9,000 control cells) also cause an issue?

chooliu commented 1 year ago

Sorry for the very delayed follow-up, but re: our discussion on Sept 17th: my pseudotime distributions look qualitatively quite similar across samples and conditions, with few ties, so it seems likely that there are multiple possible failure modes for lamian_test() / the Cholesky decomposition.

My nuclei counts (10X Multiome) are 2,000-4,000 per sample, with 2 samples per condition, so perhaps more balanced than the situation Kyla describes.

I haven't had the chance yet to read the final Lamian publication (congrats!) to see how these counts compare to the demonstrated use cases.

Winnie09 commented 11 months ago

Hi, would it help if you disable the knot selection? For example, set the number of knots to 3.
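
(Argument names can differ across package versions, so before relying on a specific knot argument it's worth checking what your installed copy actually exposes; the traceback above shows the internal fitpt() being called with maxknotallowed = 10:)

# list lamian_test()'s formal arguments, then grep the fitpt() source for
# knot-related parameters (accessed via :::, since fitpt may not be exported)
names(formals(Lamian::lamian_test))
grep("knot", deparse(body(Lamian:::fitpt)), value = TRUE)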

leehs96 commented 10 months ago

I had been hitting the same issues, both 'chol2inv': the leading minor of order 1 is not positive definite and Error in fit[[1]]$fitres.full : $ operator is invalid for atomic vectors,

but they disappeared after I filtered out samples with low cell numbers (< 10). I think it may have something to do with the cell number being too low to calculate some of the statistics; a sketch of that filtering step follows.
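
(Using the object names from the original post's call, with cellannot's columns as assumed above; it also assumes the design matrix dm has sample IDs as its rownames:)

# drop samples contributing fewer than 10 cells, then subset all inputs to match
tab <- table(cellannot[, 2])
keep_samples <- names(tab)[tab >= 10]
keep_cells <- cellannot[cellannot[, 2] %in% keep_samples, 1]
expression_f <- expression[, colnames(expression) %in% keep_cells]
pt_f <- pt[names(pt) %in% keep_cells]
cellannot_f <- cellannot[cellannot[, 1] %in% keep_cells, ]
dm_f <- dm[rownames(dm) %in% keep_samples, , drop = FALSE]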