LTLA / batchelor

Clone of the Bioconductor repository for the batchelor package.
https://bioconductor.org/packages/devel/bioc/html/batchelor.html

reducedMNN Error: C stack usage 7978148 is too close to the limit. #47

Open wanghao98 opened 9 months ago

wanghao98 commented 9 months ago

Hello, thank you for the amazing package! However, we ran into an issue when running reducedMNN() with our own PCA embeddings. We hit an error saying: Error: C stack usage 7978148 is too close to the limit. When we looked into the function, we realized that reducedMNN() ultimately calls .create_tree_predefined, which in turn calls .binarize_tree and .fill_tree. Both of those functions are recursive. Any thoughts on fixing this problem? Thanks!
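For anyone hitting the same message, a quick way to see how much C stack the current R session has is base R's Cstack_info(). This is only a diagnostic sketch and not something from the batchelor code. The reported usage of 7978148 bytes is just under 8 MiB, which is a common default stack limit on Linux, though the limit on any given machine may differ.

```r
# Inspect the C stack available to this R session (base R, no packages needed).
info <- Cstack_info()
print(info)

# 'size' is the stack limit in bytes (NA on platforms where it is unknown).
cat(sprintf("C stack limit: %.1f MiB\n", info[["size"]] / 1024^2))

# R also caps the nesting depth of R-level expressions separately.
cat("expression nesting limit:", getOption("expressions"), "\n")
```

If the limit turns out to be unusually small, that would point towards the environment rather than the recursion itself.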

LTLA commented 9 months ago

Do you have a minimal reproducible example? Looking at the code, the recursion seems fine to me; I can't think of what you might provide in merge.order= that causes an infinite recursion.

wanghao98 commented 9 months ago

Hi, thank you for the response. Here is the code I used: mnn_sample <- reducedMNN(pca_embedding_sample), where pca_embedding_sample is a list of 336 PCA matrices. We didn't specify any other arguments for that function. When I specified auto.merge=TRUE to skip the recursive code, it didn't hit the error, but it has been running for several days. Hope this info is helpful. Thanks again for the help!
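One untested idea, offered as an assumption rather than a confirmed fix: merge.order= can be given as a nested list describing a merge tree, so a balanced tree over the 336 batches would be only about nine levels deep instead of one long chain. Whether this avoids the C stack error here is not established in this thread. The helpers balanced_merge_tree() and tree_depth() below are hypothetical names and are not part of batchelor.

```r
# Hypothetical helper: build a balanced binary merge tree as a nested list.
# Leaves are batch indices; each internal node is a list of two subtrees.
balanced_merge_tree <- function(ids) {
    if (length(ids) == 1L) {
        return(ids[[1]])
    }
    mid <- length(ids) %/% 2L
    list(
        balanced_merge_tree(ids[seq_len(mid)]),
        balanced_merge_tree(ids[-seq_len(mid)])
    )
}

# Hypothetical helper: nesting depth of the tree (0 for a single leaf).
tree_depth <- function(x) {
    if (!is.list(x)) {
        return(0L)
    }
    1L + max(vapply(x, tree_depth, integer(1)))
}

tree <- balanced_merge_tree(seq_len(336))
cat("tree depth:", tree_depth(tree), "\n")

# Assumed usage, not run here:
# mnn_sample <- batchelor::reducedMNN(pca_embedding_sample, merge.order = tree)
```

A balanced tree also changes which batches are merged first, so the corrected values would not be expected to match those from the default order.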

LTLA commented 9 months ago

I have no idea. You're going to have to provide an MRE, even if it is just with simulated data.
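A simulated example along these lines might look like the sketch below. Only the count of 336 batches comes from this thread. The number of cells, the number of dimensions, and the way the batch effect is simulated are arbitrary assumptions, and sim_pcs is a made-up name. The call is guarded so that the simulation still runs where batchelor is not installed.

```r
set.seed(100)

n_batches <- 336  # number of PCA matrices reported in this thread
n_cells <- 50     # cells per batch (arbitrary assumption)
n_dims <- 10      # dimensions per embedding (arbitrary assumption)

# Each batch is Gaussian noise plus a single random shift applied to the
# whole matrix, standing in for a batch effect.
sim_pcs <- lapply(seq_len(n_batches), function(i) {
    matrix(rnorm(n_cells * n_dims), ncol = n_dims) + rnorm(1)
})
names(sim_pcs) <- sprintf("batch%03d", seq_len(n_batches))

# Same call as in the report, with no other arguments.
if (requireNamespace("batchelor", quietly = TRUE)) {
    mnn_sim <- batchelor::reducedMNN(sim_pcs)
    print(mnn_sim)
}
```

With this many batches the call may take a while, so reducing n_batches first is a reasonable way to check that the setup works.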

wanghao98 commented 8 months ago

Hi, sorry for the late reply. Here is the link to the data we used, saved as an Rda file: https://drive.google.com/file/d/1m6W4lPsMc5ehkwGIsm93F2SGws7PPm9L/view?usp=sharing. After you load the data, you can run mnn_sample <- reducedMNN(pca_embedding_sample). This is where we hit the error: Error: C stack usage 7973380 is too close to the limit. Thank you for looking into this.

wanghao98 commented 7 months ago

Hello, just want to follow up on this. Thanks!

LTLA commented 7 months ago

Will try to get to it tomorrow, but I'm afraid I can't promise anything...

LTLA commented 7 months ago

Huh, works fine for me:

X <- reducedMNN(pca_embedding_sample)
X
## DataFrame with 153216 rows and 2 columns
##                                corrected                  batch
##                                 <matrix>            <character>
## 1229878 1.333930: 3.877292: 1.725218:... 0a1148dc-356f-4dc4-9..
## 645643  0.543186: 4.240856: 1.139626:... 0a1148dc-356f-4dc4-9..
## 558085  2.683715: 1.127521: 0.915093:... 0a1148dc-356f-4dc4-9..
## 828841  3.399150:-7.506363:-0.339844:... 0a1148dc-356f-4dc4-9..
## 1136033 2.125245: 0.630467:-1.438122:... 0a1148dc-356f-4dc4-9..
## ...                                  ...                    ...
## 1022646  -0.272424:2.74660:-4.610657:... f09bc3b1-4818-4240-b..
## 935845    2.619467:1.25032:-0.804194:... f09bc3b1-4818-4240-b..
## 885645   -0.134671:3.30904:-2.640632:... f09bc3b1-4818-4240-b..
## 544186    2.254283:3.16733: 1.023467:... f09bc3b1-4818-4240-b..
## 1136882  -0.145355:3.31120:-3.357116:... f09bc3b1-4818-4240-b..

A bit slower than I'd like; the underlying algorithm is an old version that is quadratic with respect to the number of batches (cubic, for auto-merging!) and I haven't had the chance/funding to update it to the latest version here.