Hello Xingliang, thanks for the message.
I would say that these parameters depend more on the reference than on the query dataset. The `ndim` parameter specifies the number of PCA components used to calculate neighbors. As a rule of thumb, it should reflect the complexity of the reference atlas, given that the subtypes in the reference are calculated using a limited number of PCA components. The `k` parameter is the number of neighbors used to assign a cell type. Within a reasonable range (5 to 50), it does not appear to affect the prediction much.
Best, -m
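To make the roles of `ndim` and `k` concrete, here is a minimal Python sketch of nearest-neighbor label transfer in PCA space. It is an illustration under stated assumptions, not the actual implementation of `cellstate.predict`. It assumes the query cells have already been projected into the reference PCA space, and it uses brute-force Euclidean distance with a simple majority vote. The function name `knn_label_transfer` and the synthetic data are hypothetical.

```python
import numpy as np
from collections import Counter


def knn_label_transfer(ref_pcs, ref_labels, query_pcs, ndim=10, k=20):
    """Hypothetical sketch: label each query cell by majority vote among its
    k nearest reference cells, using only the first `ndim` PCA components."""
    ref = np.asarray(ref_pcs, dtype=float)[:, :ndim]
    qry = np.asarray(query_pcs, dtype=float)[:, :ndim]
    labels = np.asarray(ref_labels)
    k = min(k, ref.shape[0])  # cannot use more neighbors than reference cells
    # Brute-force squared Euclidean distances, shape (n_query, n_ref)
    d2 = ((qry[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest reference cells for each query cell
    nn = np.argsort(d2, axis=1)[:, :k]
    # Majority vote; ties go to the label that appears first in the
    # distance-sorted neighbor list
    return np.array([Counter(labels[row]).most_common(1)[0][0] for row in nn])


# Synthetic reference: two "cell states" in a 30-dimensional PC space,
# separated along PC1 only
rng = np.random.default_rng(0)
state_a = rng.normal(0.0, 1.0, size=(100, 30))
state_b = rng.normal(0.0, 1.0, size=(100, 30))
state_b[:, 0] += 10.0
ref = np.vstack([state_a, state_b])
labels = np.array(["A"] * 100 + ["B"] * 100)

# Two query cells: one at the centre of each state
query = np.zeros((2, 30))
query[1, 0] = 10.0

# Sweep k across the 5 to 50 range discussed in the reply
preds_by_k = {k: knn_label_transfer(ref, labels, query, ndim=10, k=k) for k in (5, 20, 50)}
for k, preds in preds_by_k.items():
    print(k, preds)
```

Truncating to the first `ndim` columns sets the "resolution" of the neighbor search. Components beyond those that structure the reference may mostly add noise to the distances. On well-separated synthetic states like these, the predicted labels stay the same for every `k` in the sweep. For tens of thousands of cells, the brute-force distance matrix would be replaced by a tree-based or approximate neighbor search.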
Thank you, @mass-a, I see your point. I agree that `ndim` should depend on the complexity of the reference dataset, so that it provides enough "resolution" to project onto.
Hi team,
I've noticed that the defaults for `ndim` and `k` in the `cellstate.predict` function are 10 and 20, respectively. My concern is that these values may not suit a large query dataset. What is the recommended way to adjust them? For example, would it be reasonable to set `ndim` to the number of PCs at the knee point of `Seurat::ElbowPlot` (as is commonly done in Seurat workflows), computed on the "projected object" that integrates the reference and query datasets?
Interestingly, a large projected object with more than 90k cells over 699 integrating features showed a knee point at around 10 PCs in `Seurat::ElbowPlot`, which seems to confirm the default parameter :).
bless~ Xingliang
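`Seurat::ElbowPlot` only plots the standard deviation of each PC; the knee is usually picked by eye. To automate the check described above, one common heuristic picks the point farthest from the straight line (chord) joining the first and last points of the curve. The sketch below is written in Python rather than R. It assumes the per-PC standard deviations have already been exported from the PCA reduction. The function name `knee_point` and the synthetic curves are hypothetical. This is not what Seurat does internally.

```python
import numpy as np


def knee_point(stdevs):
    """Hypothetical helper: return the 1-based PC index at the elbow of a
    decreasing per-PC standard-deviation curve, chosen as the point farthest
    from the chord joining the first and last points."""
    y = np.asarray(stdevs, dtype=float)
    if y.size < 3:
        raise ValueError("need at least 3 PCs to locate an elbow")
    span = y.max() - y.min()
    if span == 0:
        return 1  # flat curve: no elbow, fall back to the first PC
    # Rescale both axes to [0, 1] so the result does not depend on units
    x = np.linspace(0.0, 1.0, y.size)
    yn = (y - y.min()) / span
    # Endpoints of the chord
    x0, y0, x1, y1 = x[0], yn[0], x[-1], yn[-1]
    # Perpendicular distance of every point to the chord
    dist = np.abs((y1 - y0) * x - (x1 - x0) * yn + x1 * y0 - y1 * x0)
    dist /= np.hypot(x1 - x0, y1 - y0)
    return int(np.argmax(dist)) + 1


# Synthetic curve: steep linear drop over PCs 1-10, then flat up to PC 50
stdevs = np.concatenate([np.linspace(10.0, 1.0, 10), np.full(40, 1.0)])
print(knee_point(stdevs))  # → 10
```

Real elbow plots are rarely this clean. The heuristic's answer is best treated as a value to compare against the default `ndim` of 10, not as a substitute for looking at the plot.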