big data set and K

yuGithuuub commented 4 years ago

Hey ALRA team, I would like to ask about ALRA's performance on very large data sets (~600k cells). I am using the scanpy pipeline and I have 2 questions:

1. I noticed in your article that an excessively large value of k seems to have little effect on the results. Is it appropriate to use the default parameter of k = 50?
2. After subsetting the data, ALRA seemed to perform better. Is this related to the k value?

By the way, ALRA provides the best experience in certain aspects! ^_^ Looking forward to your reply.

JunZhao1990 commented 3 years ago

Thanks for your interest in ALRA! And sorry for the very late response. To better understand your question, could you provide the estimated k values by ALRA for the whole data and the subset data? You could run the choose_k() function in the ALRA code to find the estimated k.
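For readers comparing k between the full data and a subset, here is a rough Python sketch of one way a rank can be estimated from the gaps between consecutive singular values. It is not ALRA's `choose_k()` code: the function name `estimate_k`, the parameters `noise_start` and `thresh`, and the toy singular values are all assumptions made for illustration.

```python
import numpy as np


def estimate_k(singular_values, noise_start=80, thresh=6.0):
    """Hypothetical helper: pick the last 'significant' singular-value gap.

    Gaps in the tail (from noise_start onward) are treated as noise, and
    a gap counts as significant if it lies more than `thresh` standard
    deviations above the mean noise gap.
    """
    d = np.asarray(singular_values, dtype=float)
    gaps = d[:-1] - d[1:]              # gap i separates d[i] and d[i + 1]
    noise = gaps[noise_start - 1:]     # tail gaps assumed to be pure noise
    mu = noise.mean()
    sigma = noise.std(ddof=1)
    z = (gaps - mu) / sigma            # gap size in noise standard deviations
    significant = np.flatnonzero(z > thresh)
    if significant.size == 0:
        raise ValueError("no gap exceeds the noise threshold")
    # The last significant gap sits right after the k-th singular value.
    return int(significant[-1]) + 1


# Toy spectrum (made up): 5 well-separated values, then a slowly decaying tail.
signal = [100.0, 80.0, 60.0, 40.0, 20.0]
noise = [10.0 - 0.05 * i + 0.01 * (-1) ** i for i in range(95)]
k_demo = estimate_k(signal + noise)
print(k_demo)
```

On a real matrix the singular values would have to come from a truncated or randomized SVD, since a full SVD of ~600k cells is unlikely to be practical. Running the same estimate on the whole data and on the subset would show whether the two settle on different k.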

ghost commented 8 months ago

@yuGithuuub Hello, ANA111. I, too, work with large datasets in my analyses. I've encountered an issue related to sparseMatrix. Have you faced a similar challenge by any chance?

[Error occurred] Error in .m2sparse(from, paste0(kind, "g", repr), NULL, NULL): attempt to construct sparseMatrix with more than 2^31-1 nonzero entries
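The limit in this message is 2^31-1 nonzero entries, the largest signed 32-bit integer. A quick back-of-the-envelope check in Python shows why a matrix at this scale can hit it once most entries become nonzero. The gene count of 20,000 and the helper names below are assumptions for illustration, not values from this thread, and the sketch only covers how many cells fit in one block under the limit. It does not claim that running ALRA per block gives the same result as running it on the whole matrix.

```python
INT32_MAX = 2**31 - 1  # the limit quoted in the error message


def max_rows_per_block(n_cols, limit=INT32_MAX):
    """Largest number of rows whose fully dense block stays within the limit."""
    return limit // n_cols


def block_ranges(n_rows, rows_per_block):
    """Half-open (start, stop) row ranges covering all rows."""
    return [
        (start, min(start + rows_per_block, n_rows))
        for start in range(0, n_rows, rows_per_block)
    ]


n_cells = 600_000   # from the original question (~600k cells)
n_genes = 20_000    # assumed gene count, for illustration only

total_entries = n_cells * n_genes
print(total_entries > INT32_MAX)   # worst case: every entry nonzero

rows = max_rows_per_block(n_genes)
blocks = block_ranges(n_cells, rows)
print(rows, len(blocks), blocks[-1])
```

Under these assumed dimensions, a matrix with every entry nonzero would exceed the limit several times over, so any conversion of the full result to a single sparse matrix would have to be done in blocks of cells or avoided.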

KlugerLab / ALRA

big data set and K #8