YosefLab / SymSim


How to specify different cell sizes for each population #26

Closed javier-marchena-hurtado closed 1 year ago

javier-marchena-hurtado commented 1 year ago

Hello,

I would like to simulate scRNA-seq data with 5 populations, and I would like to specify the cell size of each of these populations. In other words, I would like to specify that certain populations should have more total RNA counts than other populations. How can I do that?

The only way that I found for now is by using the scale_s parameter. But I'm not sure whether I can pass a vector to the scale_s parameter indicating the cell size of each population.

The code I am using now is:

ngenes = 10000
ncells = 500
phyla = Phyla5()
true_counts_rna = SimulateTrueCounts(ncells_total=ncells, ngenes=ngenes, evf_type="discrete",
    Sigma=0.5, randseed=0, phyla=phyla, min_popsize=50, nevf=10, n_de_evf=9, vary="s",
    scale_s = c(0.1, 0.3, 0.5, 0.7, 0.9))

This is what the tSNE looks like: [image]

And this is the violin plot of total RNA counts grouped by population:

[image]

At least the populations do have different total counts. But the total counts are not gradually increasing from population 1 to population 5, the way I intended when I passed scale_s = c(0.1, 0.3, 0.5, 0.7, 0.9).
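For reference, here is a minimal sketch of how the per-population totals behind a violin plot like this could be checked numerically. It uses a toy genes x cells matrix and a made-up `pop` label vector rather than actual SymSim output, so the variable names and the data layout are assumptions, not something taken from the package.

```r
# Toy stand-in for simulated output: a genes x cells count matrix and one
# population label per cell (both hypothetical, not produced by SymSim here).
set.seed(0)
ngenes <- 200
ncells <- 50
pop <- rep(1:5, each = 10)
counts <- matrix(rpois(ngenes * ncells, lambda = 5), nrow = ngenes, ncol = ncells)

# Total RNA count per cell (library size).
total_counts <- colSums(counts)

# Summarise library size per population; tapply orders groups by label,
# which matches sort(unique(pop)).
pop_summary <- data.frame(
  pop    = sort(unique(pop)),
  median = as.numeric(tapply(total_counts, pop, median)),
  mean   = as.numeric(tapply(total_counts, pop, mean))
)
print(pop_summary)
```

If the medians do not increase from population 1 to population 5, the vector passed to scale_s is probably not being applied per population.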

Is there any way to better specify the cell size of each population? Thanks in advance.

xiuweiz commented 1 year ago

Hi @javiermarchena, thank you for using SymSim. You are right that scale_s is the parameter to set in your case. However, scale_s is currently a scalar, and if you pass it a vector, most likely only the first value is used. We could change the code so that a vector scale_s is applied to the different populations, and we are adding this to our new simulator, scMultiSim, which simulates single-cell multi-omics data (you can certainly simulate only scRNA-seq data). You can find scMultiSim here: https://github.com/ZhangLabGT/scMultiSim.
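While scale_s is a scalar, one possible workaround is to adjust the counts after simulation. The sketch below is not SymSim functionality and was not proposed in this thread: `scale_by_population` is a hypothetical helper that thins each cell's counts with a binomial draw, using one retention probability per population. It runs on toy data, and it is a post-hoc adjustment, so it is not equivalent to changing scale_s inside the simulation.

```r
# Hypothetical helper (not part of SymSim): thin each cell's counts by a
# per-population retention probability so populations differ in total RNA.
scale_by_population <- function(counts, pop, pop_scale) {
  stopifnot(ncol(counts) == length(pop),
            all(as.character(unique(pop)) %in% names(pop_scale)),
            all(pop_scale >= 0 & pop_scale <= 1))
  # One retention probability per cell, looked up by population label.
  p_cell <- unname(pop_scale[as.character(pop)])
  # Repeat each cell's probability down its column (column-major order).
  p_vec <- rep(p_cell, each = nrow(counts))
  thinned <- rbinom(length(counts), size = as.vector(counts), prob = p_vec)
  matrix(thinned, nrow = nrow(counts), dimnames = dimnames(counts))
}

# Toy data: 200 genes x 50 cells, 10 cells in each of 5 populations.
set.seed(0)
pop <- rep(1:5, each = 10)
counts <- matrix(rpois(200 * 50, lambda = 20), nrow = 200)
pop_scale <- c("1" = 0.1, "2" = 0.3, "3" = 0.5, "4" = 0.7, "5" = 0.9)

scaled <- scale_by_population(counts, pop, pop_scale)

# Mean total count per population after thinning.
mean_total <- tapply(colSums(scaled), pop, mean)
print(mean_total)
```

Binomial thinning keeps the counts as integers and can only reduce them, so the largest population factor acts as the reference cell size.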

javier-marchena-hurtado commented 1 year ago

Great, thanks a lot!