ZhangLabGT / scMultiSim

A simulator for single cell multi-omics and spatial omics data that provides ground truth to benchmark a wide range of methods.
https://zhanglabgt.github.io/scMultiSim/
Can we use real data as a base to simulate datasets similar to the real dataset? #6

Closed (HelloWorldLTY closed this 9 months ago)

HelloWorldLTY commented 10 months ago

Hi, I noticed that scMultiSim is based on a tree or GRN. Other methods, like scDesign3, are based on modeling a real dataset, which is good for evaluation. Therefore, I wonder whether your tool can also simulate datasets based on real datasets (or on parameters from a real dataset with the same gene settings). Thanks.

lhc70000 commented 10 months ago

Hi, thanks for the question. I want to emphasize that scMultiSim is a de novo simulator, while scDesign3 is a reference-based simulator, and these two types of tools have fundamental differences. scMultiSim focuses on providing ground truth that is not obtainable from real data; therefore, it generates new datasets by design rather than replicating the distribution of an existing dataset.

That said, you can still manually adjust the parameters (mainly technical noise) to make the output data's distribution match a real dataset, as we have done in our manuscript: https://github.com/ZhangLabGT/scMultiSim_manuscript/blob/main/datasets/Auxiliary/fit_real/fit_10x.R (This will require some effort, as you may need to repeatedly run the simulation, check whether the two distributions are close, and adjust the parameters accordingly.)
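
To illustrate the "run, compare, adjust" loop described above, here is a minimal Python sketch. It is not scMultiSim code and does not use the scMultiSim API (scMultiSim is an R package; the linked `fit_10x.R` script is the actual reference). Everything in it is an assumption made for illustration: `toy_simulate` is a hypothetical stand-in for one simulator run, `capture_rate` is a hypothetical technical-noise parameter, and the "real" data is itself generated so that the example is self-contained. The comparison uses two simple summary statistics (mean count and fraction of zeros); a real fit would likely compare more of the distribution.

```python
import random
import statistics


def toy_simulate(capture_rate, n_values=2000, seed=0):
    """Hypothetical stand-in for one simulator run (NOT scMultiSim).

    Draws a 'true' count for each entry and keeps each molecule with
    probability capture_rate, mimicking technical noise.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n_values):
        true_count = rng.randint(0, 20)
        kept = sum(1 for _ in range(true_count) if rng.random() < capture_rate)
        observed.append(kept)
    return observed


def summary(values):
    """Summary statistics used to compare two count distributions."""
    mean = statistics.fmean(values)
    zero_frac = sum(1 for v in values if v == 0) / len(values)
    return mean, zero_frac


def distance(a, b):
    """Squared difference of the summary statistics (smaller is closer)."""
    (mean_a, zero_a), (mean_b, zero_b) = summary(a), summary(b)
    return (mean_a - mean_b) ** 2 + (zero_a - zero_b) ** 2


def fit_capture_rate(real_values, candidates):
    """Run the simulation once per candidate and keep the closest match."""
    best_rate, best_dist = None, float("inf")
    for rate in candidates:
        d = distance(toy_simulate(rate, seed=1), real_values)
        if d < best_dist:
            best_rate, best_dist = rate, d
    return best_rate, best_dist


# Placeholder for real data: generated with a known rate for this sketch only.
real_values = toy_simulate(0.3, seed=42)
candidates = [0.1, 0.2, 0.3, 0.4, 0.5]
best_rate, best_dist = fit_capture_rate(real_values, candidates)
print(best_rate, best_dist)
```

In practice the candidate grid would be replaced by whichever technical-noise parameters are being tuned, and each call to the stand-in would be a full simulation run, which is why the process takes some effort.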

HelloWorldLTY commented 10 months ago

Got it, thanks a lot.