Vivianstats / scINSIGHT

Matrix factorization model for interpreting single cell gene expression in biologically heterogeneous data
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02649-3

Error in run_scINSIGHT #8

Open jeprob opened 1 year ago

jeprob commented 1 year ago

Hi everyone, I get an error when I call run_scINSIGHT() as follows: run_scINSIGHT(scINSIGHT_object, K = 5, K_j = 2, out.dir = intermediate_path_scINSIGHT, num.cores = 16).

I followed the vignette to create the scINSIGHT object, and everything up to run_scINSIGHT ran without errors. My dataset has 20 samples, with condition set to either treatment or control.

When I select 16 cores (which should be available), I get the following error: [screenshot not preserved]

When I select 1 core as a check, I get the following error: [screenshot not preserved]

Have you seen this before?

This is my session info: [screenshot not preserved]

Thanks a lot in advance, Jenni

sldyns commented 1 year ago

Hello Jenni,

Sorry for the oversight; I had not considered the case where K is a single integer.

K is expected to be a vector of candidate values, so please pass a set of candidates, such as K = seq(5, 11, 2).
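As a minimal sketch of the suggested fix, the call from the original post can be repeated with only K changed to a vector of candidates. The object scINSIGHT_object and the path intermediate_path_scINSIGHT are the names used in the question; the guard around the call is an addition so that the snippet does nothing when the package or those objects are not available.

```r
# Candidate values for K, built with base R's seq(from, to, by).
candidate_K <- seq(5, 11, 2)
print(candidate_K)  # candidates: 5, 7, 9, 11

# Same call as in the original post, with K given as a vector instead of
# a single integer. The guard is an assumption added for this sketch: it
# skips the call when scINSIGHT or the user's objects are not present.
if (requireNamespace("scINSIGHT", quietly = TRUE) &&
    exists("scINSIGHT_object") &&
    exists("intermediate_path_scINSIGHT")) {
  scINSIGHT::run_scINSIGHT(
    scINSIGHT_object,
    K = candidate_K,
    K_j = 2,
    out.dir = intermediate_path_scINSIGHT,
    num.cores = 16
  )
}
```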

Hope this solves your problem.

jeprob commented 1 year ago

Hi, this, along with fixing some environment issues, solved my problem! (:

Is the sequence needed because the optimal K is chosen as the middle one among the values with the highest stability scores? Could you explain a bit more why the middle value is chosen rather than the one with the highest stability score?

Thanks in advance! Jenni

aboutfanfan commented 6 months ago

I have the same problem. Do you know why now, @jeprob? Thanks!

sldyns commented 6 months ago

Hello jeprob and aboutfanfan!

In our previous experiments, we observed that the K with the highest stability does not always give the best downstream analysis results. On further analysis, we found that taking the median of the several K values with the highest stability gives more robust results.

The reasoning is that avoiding excessively large or small K values helps prevent overfitting or underfitting.
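The rule described above can be illustrated with a small base R sketch. This is not the package's internal code: the function name select_k, the n_top parameter (how many of the most stable candidates to keep), the choice of the lower middle value when n_top is even, and the stability scores in the example are all hypothetical and only meant to show the idea.

```r
# Hypothetical helper: pick the middle K among the n_top candidates
# with the highest stability scores.
select_k <- function(k_values, stability, n_top = 3) {
  stopifnot(length(k_values) == length(stability), n_top >= 1)
  n_top <- min(n_top, length(k_values))
  # Indices of the n_top most stable candidates.
  top_idx <- order(stability, decreasing = TRUE)[seq_len(n_top)]
  # Sort those candidates by K and take the middle one
  # (the lower middle when n_top is even; an assumption of this sketch).
  top_k <- sort(k_values[top_idx])
  top_k[ceiling(n_top / 2)]
}

# Made-up stability scores, for illustration only.
k_values  <- c(5, 7, 9, 11)
stability <- c(0.80, 0.95, 0.90, 0.93)

best_by_max    <- k_values[which.max(stability)]           # 7
best_by_median <- select_k(k_values, stability, n_top = 3) # 9
print(c(max_rule = best_by_max, median_rule = best_by_median))
```

In this toy example the single most stable candidate is K = 7, while the three most stable candidates are 7, 9, and 11, so the median rule returns K = 9.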

Hope this can solve your problem!