mugpeng closed this issue 2 years ago.
By the way, are there any parameters I can set to tell splatPopEstimate where to read the corresponding data? For example, condition information is stored as a column in colData, but how do I point splatPopEstimate to it? Or should I rename the columns?
Hello @mugpeng, thanks for your interest in using splatter.
Regarding your question about speeding up the parameter estimation step: There is not a multithreading function at this time. However, because the purpose of this step is to estimate distribution parameters from empirical data, you can randomly downsample the number of empirical cells you are providing to make this step faster.
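The downsampling step is simple to do before estimation. The Python sketch below is not splatter code (splatter is an R package); it only illustrates reproducible random selection of a subset of cells. The helper name `downsample_cells` and the cell-ID format are hypothetical. You would use the selected IDs to subset the columns of your own data before passing it to splatPopEstimate.

```python
import random

def downsample_cells(cell_ids, n_keep, seed=0):
    """Randomly keep n_keep cells (without replacement), preserving original order.

    Hypothetical helper: the returned IDs are meant to subset the columns
    (cells) of a counts matrix before parameter estimation.
    """
    cell_ids = list(cell_ids)
    if n_keep >= len(cell_ids):
        return cell_ids
    rng = random.Random(seed)  # seeded so the same subset is drawn every run
    keep_idx = sorted(rng.sample(range(len(cell_ids)), n_keep))
    return [cell_ids[i] for i in keep_idx]

# Example: keep 2,000 of 50,000 cells for faster estimation.
all_cells = [f"cell_{i}" for i in range(50_000)]
subset = downsample_cells(all_cells, 2_000)
print(len(subset))  # 2000
```

Sorting the sampled indices keeps the subset in the same column order as the original data, which makes it easier to match cells back to their metadata.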
Regarding the question about the degree of variation between groups and the number of cells assigned to each group: You can adjust the relative impact of condition and group to your liking using cde.facLoc/cde.facScale and de.facLoc/de.facScale, respectively. You can also change the proportion of cells assigned to each group using the group.prob parameter.
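To build intuition for these parameters, here is a rough Python sketch (not splatter's implementation). It assumes that each cell is assigned to a group independently with weights given by group.prob, and that DE factors are log-normal with location de.facLoc and scale de.facScale, with the cde.* parameters playing the same role for condition effects. The arguments `de_prob` and `down_prob` are assumed stand-ins for splatter's de.prob and de.downProb. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def assign_groups(n_cells, group_prob, rng):
    """Assign each cell a group label (1..k) independently, weighted by group_prob."""
    return rng.choices(range(1, len(group_prob) + 1), weights=group_prob, k=n_cells)

def de_factors(n_genes, de_prob, down_prob, fac_loc, fac_scale, rng):
    """Draw one multiplicative DE factor per gene (assumed model).

    A gene is DE with probability de_prob; its factor is log-normal with
    meanlog fac_loc and sdlog fac_scale, inverted (down-regulated) with
    probability down_prob. Non-DE genes get a factor of 1.
    """
    factors = []
    for _ in range(n_genes):
        if rng.random() < de_prob:
            f = rng.lognormvariate(fac_loc, fac_scale)
            if rng.random() < down_prob:
                f = 1.0 / f  # down-regulated gene
            factors.append(f)
        else:
            factors.append(1.0)
    return factors

def mean_abs_log_fc(factors):
    """Average |log fold change| over DE genes only (factor != 1)."""
    de = [abs(math.log(f)) for f in factors if f != 1.0]
    return sum(de) / len(de)

rng = random.Random(42)
cells = assign_groups(1000, [0.7, 0.2, 0.1], rng)
print(Counter(cells))  # roughly 700 / 200 / 100 cells per group

weak = de_factors(2000, 0.1, 0.5, fac_loc=0.1, fac_scale=0.4, rng=rng)
strong = de_factors(2000, 0.1, 0.5, fac_loc=1.0, fac_scale=0.4, rng=rng)
print(mean_abs_log_fc(weak) < mean_abs_log_fc(strong))  # larger location -> larger effects
```

Under this model, lowering the location parameter shrinks typical fold changes between groups, which is the lever for making groups of the same cell type look less separated.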
Regarding the question about telling splatPopEstimate where to look for things like group and condition in the provided single-cell data: We actually recommend running splatPopEstimate on a subset of your empirical data that includes only cells from one individual, from one group (i.e. one cell type), and from one condition. This is because the single-cell parameters you are estimating in this step are used to define a homogeneous population of cells from one individual, one group, and one condition, so you don't want additional sources of variation being modeled at that stage!
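In other words, instead of telling splatPopEstimate which colData column holds the condition, you select the cells yourself beforehand. A minimal Python sketch of that filtering step, assuming a per-cell metadata table similar to colData; the column names `individual`, `cell_type`, and `condition` and the helper `select_homogeneous` are hypothetical. The selected IDs would then be used to subset the counts before estimation.

```python
# Hypothetical per-cell metadata, analogous to colData columns.
metadata = [
    {"cell": "c1", "individual": "donor1", "cell_type": "T", "condition": "ctrl"},
    {"cell": "c2", "individual": "donor1", "cell_type": "T", "condition": "stim"},
    {"cell": "c3", "individual": "donor2", "cell_type": "T", "condition": "ctrl"},
    {"cell": "c4", "individual": "donor1", "cell_type": "T", "condition": "ctrl"},
    {"cell": "c5", "individual": "donor1", "cell_type": "B", "condition": "ctrl"},
]

def select_homogeneous(rows, **criteria):
    """Return IDs of cells whose metadata matches every criterion exactly."""
    return [
        row["cell"]
        for row in rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]

# One individual, one cell type, one condition.
keep = select_homogeneous(metadata, individual="donor1", cell_type="T", condition="ctrl")
print(keep)  # ['c1', 'c4']
```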
Let us know if you have other questions!
thanks!
I am going to close this now but please comment if you have further questions.
Hi,
Splatter has been really helpful for generating simulated data, but I still have a few problems.
Functions like splatPopEstimate don't seem to support multithreading, so parameter estimation is quite slow on a large real dataset. The reason I am trying to estimate parameters from real data is that the simulation results look strange and unexpected when I model both group (cell type) and sample (different conditions).
In my view, it's not realistic for group.prob to be equal in every sample. Besides, cells from the same group (cell type) should not differ so much that they split into multiple clusters.
These results really puzzle me. Thanks! :)