HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools

Doubt in terminology #123

Closed cakeinspace closed 1 year ago

cakeinspace commented 1 year ago

Hey, thanks for the wonderful code and paper.

In the muscat paper and code, does "sample" mean different batches of the same underlying starting material? For example, when we simulate 2 samples and 4 clusters, do we get two different batches of PBMCs that contain 4 clusters each (i.e., data with batch effects)? Or do we get, say, PBMCs as one sample and BM as another sample, with 4 clusters each?

cakeinspace commented 1 year ago

library(muscat)
data(example_sce)
sce_preped <- prepSim(example_sce, verbose = TRUE)

I am using this code example to generate the simulated data. What is the underlying true dataset used? When I run this snippet, I see that the SCE object contains two samples labeled as ctrl. Does that mean the data was simulated using the control group, and hence that these are two technical replicates of the same starting material?

When I plot a PCA of the ground-truth log transcription quotients (ltqs) of each cluster, I see that some of them overlap. Is there an explanation for this, or is it expected behavior of the simulation method? For a simulation with two samples and 4 clusters, there are just 8 unique ltqs.

HelenaLC commented 1 year ago

I'm not sure I understand the "issue" completely, but will try to provide an answer as follows: