Winnie09 / Lamian


Expected runtimes for XDE/TDE test? #28

Open DanielYuhangLi opened 4 months ago

DanielYuhangLi commented 4 months ago

Hi Dr. Hou, it's great to have an alternative to tradeSeq to try. I thought the paper was clear and an interesting read. I was wondering what runtimes to expect from the lamian_test function for both the XDE and TDE tests. I have tried several iterations on a dataset of 42k cells with a single pseudotime lineage extracted from slingshot. So far, even with just 5 permutations and 5 cores for the TDE test, I have not been able to finish a run after 12+ hours. (The 5 cores don't seem to be actively used, but I don't see any clear issue with the mclapply call in the code.) I have also tried creating an h5 file with the built-in h5 function but ran into similar issues: after running overnight, the h5 file is still not written, which is very odd. I'm looking for advice on what to check when troubleshooting; please let me know what information would help most.

For reference

# perform TDE test
Res <- lamian_test(
  expr = seurat_object3@assays$RNA@data,
  cellanno = seurat_meta2,
  pseudotime = seurat_object3$pseudotime,
  design = design,
  test.type = 'time',
  testvar = 2,
  permuiter = 5,
  test.method = 'chisq',
  ## This is for permutation test only.
  ## We suggest that users use default permuiter = 100.
  ## Alternatively, we can use test.method = 'chisq' to switch to the chi-square test.
  ncores = 5
)

seurat_object3@assays$RNA@data is a 23815 x 42416 matrix.

seurat_object3$pseudotime is a vector of length 42416.

seurat_meta2 is a 42416 x 2 metadata table; its columns are intercept (could you clarify what this refers to? I just set it to 1) and group (0, 1).

Thanks!

Dan

ahnchi commented 4 months ago

I am having a similar issue when I run it on my Seurat object. I subsetted the object, so it has ~6000 cells and ~50000 genes. I used ncores = 10, but the run has not completed after many hours.

Winnie09 commented 3 months ago

Hi,

Thanks for your interest in this package! The time complexity depends heavily on the number of samples, which is fixed for a given dataset. To reduce the running time, you can filter out lowly expressed and lowly variable genes before running the test.
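
As a rough illustration of that kind of gene filtering, here is a minimal base R sketch. It is not part of Lamian: the toy matrix, the 10% detection threshold, and the cutoff of 100 genes are all assumptions chosen for the example, and a sparse matrix from a Seurat object would likely need sparse-aware equivalents of these calls.

```r
# Toy stand-in for a log-normalized gene-by-cell matrix (hypothetical data).
set.seed(1)
expr <- matrix(rpois(200 * 50, lambda = 0.5), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("cell", 1:50)))
expr <- log2(expr + 1)

# Step 1: keep genes detected in at least 10% of cells (threshold is an assumption).
min_frac <- 0.10
expressed <- rowMeans(expr > 0) >= min_frac

# Step 2: among those, keep the most variable genes (cutoff of 100 is an assumption).
n_keep <- 100
gene_var <- apply(expr[expressed, , drop = FALSE], 1, var)
ranked <- names(sort(gene_var, decreasing = TRUE))
top_genes <- ranked[seq_len(min(n_keep, length(ranked)))]

# Filtered matrix that would be passed on as the expression input.
expr_filtered <- expr[top_genes, , drop = FALSE]
dim(expr_filtered)
```

The filtered matrix keeps the original cell columns, so the pseudotime vector and cell annotation do not need to change; only the number of genes tested goes down.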

Also, if you are using slingshot for the pseudotime inference, please refer to the manual section "Apply other pseudotime inference methods in Lamian: take slingshot as an example" on page https://winnie09.github.io/Wenpin_Hou/pages/Lamian.html#apply-other-pseudotime-inference-methods-in-lamian-take-slingshot-as-an-example. The following lines are important:

pt <- pt[!is.na(pt)] ## important
selectedCell <- names(pt)
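
To show what those two lines do in context, here is a small self-contained sketch with toy data. The object names expr and cellanno, and the cell/sample column layout of the annotation table, are assumptions made for illustration; the idea is only that every input is restricted to the same cells, in the same order, as the non-missing pseudotime.

```r
# Hypothetical toy inputs: pseudotime named by cell, NA for cells off the lineage.
pt <- c(cell1 = 0.2, cell2 = NA, cell3 = 1.5, cell4 = 0.9, cell5 = NA)
expr <- matrix(seq_len(15), nrow = 3,
               dimnames = list(paste0("gene", 1:3), names(pt)))
cellanno <- data.frame(cell = names(pt),
                       sample = c("s1", "s1", "s2", "s2", "s1"),
                       stringsAsFactors = FALSE)

# The two lines from the manual: drop cells with no pseudotime on this lineage.
pt <- pt[!is.na(pt)] ## important
selectedCell <- names(pt)

# Restrict the other inputs to the same cells, in the same order (assumed layout).
expr_sub <- expr[, selectedCell, drop = FALSE]
cellanno_sub <- cellanno[match(selectedCell, cellanno$cell), , drop = FALSE]

# Sanity checks: all inputs now describe the same cells in the same order.
stopifnot(identical(colnames(expr_sub), names(pt)))
stopifnot(identical(cellanno_sub$cell, names(pt)))
```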