SONGDONGYUAN1994 / scDesign3

scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics
https://songdongyuan1994.github.io/scDesign3/docs/index.html
MIT License
86 stars, 24 forks

Simulate spatial dataset with several batches #59

Open VivLon opened 2 months ago

VivLon commented 2 months ago

Hi,

Thank you for this package. What is a good way to simulate spatial data with several batches? I have a spatial dataset with 12 batches to use for the simulation, and I am currently considering three approaches:

(1) Simulate the batch effect and the spatial data simultaneously by setting mu_formula="s(row, col, bs = 'gp', k=400) + batch + cell_type".

(2) Simulate the spatial data and then add the batch effect using the step-by-step functions you recommended in #8.

(3) Independently simulate spatial data for each batch of my data, and then concatenate the results.

Any recommendations?

Thanks, Wenxin

SONGDONGYUAN1994 commented 2 months ago

Hi Wenxin,

(2) sounds like a good idea to me.

(1) may not work: the spatial locations in your different datasets may not be in the same coordinate system. If they are, then this is a good idea, but I am afraid that is almost never the case in real-world data.

(3) actually works. The tricky part is that each batch will then be completely independent of the others. Therefore, you need to think about the purpose of your simulation, since you probably want some shared patterns between those batches.

Best regards, Dongyuan
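
The caveat about option (1) can be read off the formula itself: the smooth term s(row, col, ...) is fitted over a single pair of coordinate columns, so the cells of every batch would have to share them. Below is a minimal base R sketch that only parses the formula string quoted in the question and lists the covariates it references. It does not call scDesign3, and the variable names `f` and `covariates` are chosen here for illustration.

```r
# mu_formula string quoted from option (1) in the question
mu_formula <- "s(row, col, bs = 'gp', k=400) + batch + cell_type"

# Parse it as a one-sided formula; base R only parses s() here,
# it does not evaluate the smooth
f <- as.formula(paste("~", mu_formula))

# Names of the covariates the formula refers to
covariates <- all.vars(f)
print(covariates)
```

Each listed name would need to be a column available for every cell across all 12 batches, which is where differing coordinate systems become a problem.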

VivLon commented 2 months ago

Hi Dongyuan,

Thank you for your recommendation! I tried the step-by-step method, but I received an error:

    Error in names(answer) <- dots[[1L]] :
      'names' attribute [2000] must be the same length as the vector [1333]
    Calls: fit_marginal ... suppressMessages -> withCallingHandlers -> paraFunc

Do you know what's going on?

Another problem is that I also got:

    In addition: Warning message:
    In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule, :
      scheduled core 2 did not deliver a result, all values of the job will be affected

I guess this is because I set n_cores=3 in fit_marginal() and fit_copula() but n_cores=1 in extract_para(), as in the tutorial. Do I need to set n_cores to the same value in all functions?

Finally, I got an 'OOM Killed' message, even though I submitted a 300G job to run the steps from construct_data to extract_para. The dataset I use for simulation has ~2600 cells and 2000 genes. Any suggestions?

Thanks, Wenxin

SONGDONGYUAN1994 commented 2 months ago

Hi Wenxin,

Sorry for the late reply. The error is new to me. It is likely due to a RAM issue, but I cannot be sure with the limited information.

Could you please subset your dataset (e.g., subsetting only 10 genes)? You can also make the marginal model simpler: e.g., change k = 400 to k = 50.

Best regards, Dongyuan
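
The two debugging suggestions above can be sketched in base R as follows. This is only an illustration under stated assumptions: `counts` is a hypothetical stand-in count matrix with the dimensions mentioned in the thread (2000 genes, about 2600 cells), and the spatial term is taken from the formula quoted in the original question, since the formula actually used in the step-by-step run is not shown.

```r
set.seed(1)

# Hypothetical stand-in for the real data: 2000 genes x 2600 cells
counts <- matrix(
  rpois(2000 * 2600, lambda = 1),
  nrow = 2000,
  dimnames = list(paste0("gene", seq_len(2000)),
                  paste0("cell", seq_len(2600)))
)

# Suggestion 1: keep only 10 genes (rows) while debugging
counts_small <- counts[seq_len(10), , drop = FALSE]

# Suggestion 2: a simpler marginal model, k = 400 reduced to k = 50
# (assumed spatial term, copied from the question)
mu_formula_full  <- "s(row, col, bs = 'gp', k=400)"
mu_formula_small <- sub("k=400", "k = 50", mu_formula_full, fixed = TRUE)

print(dim(counts_small))
print(mu_formula_small)
```

A SingleCellExperiment object is subset by rows in the same way, for example sce[1:10, ]. The reduced object and the simpler formula would then go through the same construct_data and fit_marginal steps, to check whether the error and the memory problem persist.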