JSB-UCLA / scDesign2

An interpretable simulator that generates realistic single-cell gene expression count data with gene correlations recapitulated
MIT License

Question about dropout setting #5

Closed inoue0426 closed 1 year ago

inoue0426 commented 1 year ago

Hi,

I have a question about simulating dropout. I'm working on single-cell RNA imputation, and for that I want to create simulated data without dropout as the ground truth and with dropout as the training data.

I tried setting set.seed(1), sim_method = 'copula', zp_cutoff = 0, and marginal = 'zinb' to get reproducible data in both situations, but I got completely different results. Is it possible to run the simulation both without dropout and with dropout?

Best, Yoshi

sunty17 commented 1 year ago

@inoue0426

Hi Yoshi,

If I understand your task correctly, you could set marginal = 'zinb' in the model-fitting step. To simulate data with dropout, use the result returned by fit_model_scDesign2() directly. To simulate data without dropout, set the first column of each fitted marginal parameter matrix, marginal_param1 and marginal_param2, to all zeros; that column holds the zero-inflation proportion of the zinb model.
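A minimal sketch of the column-zeroing step, written against a mock object rather than a real fit. The shape assumed here (a list with one entry per cell type, each holding the matrices marginal_param1 and marginal_param2 with the zero-inflation proportion in column 1) follows the description above; the helper name remove_zero_inflation, the cell type name, and every numeric value are hypothetical.

```r
# Mock stand-in for a fitted model (assumption: one list entry per cell type).
# Column 1 is the zero-inflation proportion; the other columns are placeholders.
mock_fit <- list(
  cell_type_A = list(
    marginal_param1 = matrix(c(0.30, 0.10, 0.00,   # column 1: zero-inflation
                               2.0,  1.5,  Inf,    # column 2: placeholder
                               5.0,  3.0,  0.8),   # column 3: placeholder
                             nrow = 3),
    marginal_param2 = matrix(c(0.60, 0.45,
                               1.2,  0.9,
                               0.4,  0.2),
                             nrow = 2)
  )
)

# Hypothetical helper: return a copy of the fit with the zero-inflation
# column of both marginal parameter matrices set to zero.
remove_zero_inflation <- function(fit) {
  lapply(fit, function(ct) {
    for (nm in c("marginal_param1", "marginal_param2")) {
      m <- ct[[nm]]
      # Skip entries that are missing or have no rows.
      if (is.matrix(m) && nrow(m) > 0) {
        m[, 1] <- 0
        ct[[nm]] <- m
      }
    }
    ct
  })
}

fit_no_dropout <- remove_zero_inflation(mock_fit)
```

The helper returns a modified copy, so the original fit stays available for the with-dropout simulation and the modified one can be used for the ground truth.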

For zp_cutoff, I suggest keeping the default value. If you set it to 0, all genes will be included in the copula fitting, which (1) is probably unnecessary, (2) may cause errors, and (3) makes model fitting take longer.

For reproducibility, if you set the same seed value but still get different results, the cause may be the parallel function mclapply() in R. You could try running RNGkind("L'Ecuyer-CMRG") before setting the seed and running the rest of the code.
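The seeding order described above can be sketched with a toy mclapply() call in place of the actual model fitting. The function draw_parallel and the choice of rnorm() as the workload are illustrative assumptions; the core count falls back to 1 on Windows, where mclapply() cannot fork.

```r
library(parallel)

# mclapply() only forks on Unix-alikes; use a single core on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical toy workload: draw random numbers inside mclapply().
draw_parallel <- function(seed) {
  RNGkind("L'Ecuyer-CMRG")  # select the generator first ...
  set.seed(seed)            # ... then set the seed
  mclapply(1:4, function(i) rnorm(2), mc.cores = n_cores)
}

run1 <- draw_parallel(1)
run2 <- draw_parallel(1)

# TRUE when both runs produced the same draws
print(identical(run1, run2))
```

Note that RNGkind() changes the generator for the whole R session, so it should come before the set.seed() call that precedes the fitting and simulation steps.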

Hope this is helpful to you!

Best, Tianyi

inoue0426 commented 1 year ago

@sunty17 Thank you, I'll try it!