Normalization: Log or SCT or both?
Log for FindMarkers?
SCT creates a new assay (SCT) with the transformed data
SCT for clusters
Log for looking at expression
SCT, new version: a single SCTransform() call replaces NormalizeData(), ScaleData(), and FindVariableFeatures().
The glmGamPoi package substantially improves SCT speed, but memory use is high; turn it off if that is an issue
Check whether the new version of SCT actually runs NormalizeData(): compare counts and data in the RNA assay and confirm the numbers differ
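To make the counts-versus-data check concrete, here is a minimal Python sketch of what log-normalization does to one cell, assuming the usual LogNormalize recipe (divide by the cell total, multiply by a scale factor of 10,000, take log1p). The function name and the toy counts are hypothetical; this is an illustration, not Seurat code.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Log-normalize one cell: log1p(count / cell_total * scale_factor)."""
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]

# Toy raw counts for one cell across four genes (made-up numbers).
cell_counts = [0, 5, 20, 75]
cell_data = log_normalize(cell_counts)

# If normalization ran, the "data" values should no longer equal the raw counts.
changed = any(abs(d - c) > 1e-9 for d, c in zip(cell_data, cell_counts))
print(cell_data)
print("data differs from counts:", changed)
```

If the data values are identical to the counts, normalization has most likely not been applied to that assay.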
Regress covariates? Only when they are not relevant to the question and are affecting clustering
Look at PCA for variables to possibly regress, if needed
Look at UMAP for variables to regress
Try not to regress covariates by default
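One possible way to do the PCA check is to correlate each candidate covariate with the per-cell PC scores before deciding to regress anything. The sketch below does this in plain Python. The variable names (pc1, percent_mito, n_genes) and all values are hypothetical toy data, and no cutoff for "too correlated" is implied.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # A constant vector has no defined correlation; report 0.
        return 0.0
    return cov / (sd_x * sd_y)

# Toy per-cell values (made-up): PC1 scores and two candidate covariates.
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
percent_mito = [1.0, 2.0, 3.0, 4.0, 5.0]  # tracks PC1 exactly
n_genes = [3.0, 1.0, 4.0, 1.0, 5.0]       # only loosely related to PC1

r_mito = pearson(pc1, percent_mito)
r_genes = pearson(pc1, n_genes)
print("PC1 vs percent_mito:", round(r_mito, 3))
print("PC1 vs n_genes:", round(r_genes, 3))
```

A covariate that tracks a leading PC closely is a candidate for regression, but only if it is also irrelevant to the biological question.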
Integration
When to do it: multiple samples or batches; you need to make sure clusters are driven by cell type/stage, not by another variable
What to integrate on: sample, batch, or any other variable that is confounding the clustering step
Always use Harmony?
Harmony could also be used with only one covariate
Harmony for two or more covariates, CCA for one
Start with log-normalization and look at metadata on the UMAP; if there is clear separation, try SCT. If mito/ribo genes are driving differences, remove those genes from the variable features (VariableFeatures)
If the biological variable separates the clusters too much, it may still be useful to force cells to align across conditions
Look at samples individually and annotate clusters first, then combine the samples to see how the prior clusters align, and decide based on that
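The mito/ribo removal step mentioned above can be sketched as a simple filter on the variable gene list. This assumes human gene-symbol conventions (an MT- prefix for mitochondrial genes, RPS/RPL followed by a digit for ribosomal proteins); the pattern, function name, and gene list are hypothetical and would need adjusting for other species or annotations.

```python
import re

# Assumed human symbol conventions. This is a rough heuristic: it can also
# catch non-ribosomal genes such as RPS6KA1, and it misses genes like RPLP0.
MITO_RIBO = re.compile(r"^(MT-|RP[SL]\d)")

def drop_mito_ribo(genes):
    """Return the gene list without mitochondrial/ribosomal-looking symbols."""
    return [g for g in genes if not MITO_RIBO.match(g)]

# Toy variable-gene list (made-up selection of real gene symbols).
variable_genes = ["MT-CO1", "RPS6", "RPL13A", "CD3E", "MS4A1", "RPE65", "MT-ND1"]
kept = drop_mito_ribo(variable_genes)
print(kept)
```

The filtered list would then be used as the feature set for PCA and clustering, leaving the genes themselves in the expression matrix.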
Clustering
How to display data and choose resolution?
Leiden clustering
Clustering tree
UMAP for each resolution:
Broader: 0.1
Granular: 0.8
Follow up clustering over resolutions
Plotting markers on UMAP
Barplots showing proportions of metadata variable per cluster
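The numbers behind such a barplot can be computed as below: for each cluster, the fraction of cells carrying each value of a metadata variable. This is a minimal sketch with hypothetical cluster labels and sample names; the function name is illustrative.

```python
from collections import Counter, defaultdict

def proportions_per_cluster(clusters, labels):
    """For each cluster, return the fraction of its cells with each label."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        counts[cluster][label] += 1
    return {
        cluster: {label: n / sum(c.values()) for label, n in c.items()}
        for cluster, c in counts.items()
    }

# Toy per-cell metadata (made-up): cluster assignment and sample of origin.
clusters = ["0", "0", "0", "0", "1", "1", "1", "1"]
samples = ["A", "A", "B", "B", "A", "A", "A", "B"]

props = proportions_per_cluster(clusters, samples)
for cluster in sorted(props):
    print(cluster, props[cluster])
```

A cluster dominated by one sample or batch is a hint that the clustering may reflect a technical variable rather than cell type.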
Who IDs clusters? (The client!)
Guide with known methods: CellTypist, ?
Sub-cluster within a specific cell type to identify its sub-clusters
Annotate with High/Low genes
FindMarkers vs. FindConservedMarkers vs. FindAllMarkers