sc normalization and clustering

Normalization

- Log or SCT or both?
  - Log for FindMarkers?
  - SCT creates a new slot with its own data: SCT for clustering, Log for looking at expression.
  - The new SCT version does the work of NormalizeData(), ScaleData(), and FindVariableFeatures() in one step.
  - The glmGamPoi package substantially improves speed, but memory use is high; turn it off if that is an issue.
  - To check: does the new SCT version actually run NormalizeData()? Compare the counts and data in the RNA assay to confirm the numbers differ.
- Regress covariates?
  - Only when they are not relevant to the question and are affecting clustering.
  - Look at the PCA for variables that may need to be regressed.
  - Look at the UMAP for variables to regress.
  - Try not to regress covariates by default.

Integration

- When to do it: multiple samples or batches, where you need to make sure clusters reflect cell type/stage and not some other variable.
- What to integrate on: sample, batch, or any other variable that is confounding the clustering step.
- Always use Harmony?
  - Could be Harmony only, or: Harmony for two or more covariates, CCA for one.
  - Log-normalize and look at the metadata in the UMAP; if there is clear separation, try SCT. If mito/ribo genes are driving the differences, remove them from the variable features (VariableFeatures()).
  - If the biological variable separates the clusters too much, it may still be useful to force cells to align across conditions.
  - Look at samples individually and annotate clusters first, then combine the samples to see how the prior clusters align, and decide based on that.

Clustering

- How to display data and choose resolution?
  - Leiden clustering
  - Clustering tree
  - UMAP for each resolution: broader: 0.1; granular: 0.8
  - Follow clusters across resolutions
  - Plot markers on the UMAP
  - Barplots showing the proportions of a metadata variable per cluster
- Who IDs clusters? (the client!)
  - Guide with known methods: celltypist, ?
  - Sub-cluster a specific cell type to identify its sub-clusters
  - Annotate with High/Low genes
  - FindMarkers vs. FindConservedMarkers vs. FindAllMarkers
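For the "Log" option and the "check numbers are different" to-do under Normalization, here is a minimal sketch of what log-normalization does to one cell. Seurat's LogNormalize (the NormalizeData() default) divides each count by the cell's total, multiplies by a scale factor (10,000 by default), and applies log1p. `normalization_changed` is a hypothetical helper for the counts-vs-data comparison; in practice both vectors would come out of the RNA assay (for example via GetAssayData()).

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style transform for one cell: log1p(count / cell total * scale_factor)."""
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros instead of dividing by zero.
        return [0.0] * len(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]

def normalization_changed(raw, normalized, tol=1e-9):
    """Hypothetical check: True if any normalized value differs from the matching raw count."""
    return any(abs(r - n) > tol for r, n in zip(raw, normalized))

cell = [0, 3, 1, 6]  # hypothetical UMI counts for one cell across four genes
data = log_normalize(cell)
print(normalization_changed(cell, data))  # True: the values were transformed
print(normalization_changed(cell, cell))  # False: "data" is still the raw counts
```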
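To make the "SCT for clusters" note more concrete: SCTransform returns Pearson residuals from a regularized negative binomial model fitted per gene. glmGamPoi is an optional backend that speeds up that fitting. The sketch below deliberately swaps in a simpler technique, analytic Pearson residuals with a fixed overdispersion (theta = 100) and clipping at sqrt(number of cells), as described by Lause et al. (2021). It has no per-gene regression or regularization, so it only shows what a residual is. It is not SCTransform's model.

```python
import math

def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals (simplified stand-in for SCT, not SCTransform itself).

    counts: list of cells, each a list of gene counts.
    Expected count mu = cell total * gene total / grand total; variance = mu + mu^2 / theta.
    """
    n_cells, n_genes = len(counts), len(counts[0])
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(row[g] for row in counts) for g in range(n_genes)]
    grand_total = sum(cell_totals)
    clip = math.sqrt(n_cells)  # residuals are clipped to +/- sqrt(number of cells)
    residuals = []
    for c, row in enumerate(counts):
        res_row = []
        for g, x in enumerate(row):
            mu = cell_totals[c] * gene_totals[g] / grand_total
            if mu == 0:
                # Empty gene or empty cell: the observed count is 0, so there is no deviation.
                res_row.append(0.0)
                continue
            r = (x - mu) / math.sqrt(mu + mu * mu / theta)
            res_row.append(max(-clip, min(clip, r)))
        residuals.append(res_row)
    return residuals

toy = [[10, 0], [0, 10]]  # hypothetical: two cells, each expressing a different gene
print(pearson_residuals(toy))  # large deviations, clipped to +/- sqrt(2)
```

Cells whose counts are exactly proportional to the overall gene profile get residuals of 0. Only deviations from that expectation carry signal into PCA and clustering.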
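For "Regress covariates?", the first step in the notes is to check whether a covariate tracks a PC. Regressing it out then leaves the residual expression. ScaleData(vars.to.regress = ...) fits a per-gene linear model by default. Below is a one-gene, one-covariate OLS sketch of that idea. `pearson_r`, `regress_out`, and the example values are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def regress_out(values, covariate):
    """Residuals of one gene's expression after an OLS fit on one covariate (with intercept)."""
    n = len(values)
    mx, my = sum(covariate) / n, sum(values) / n
    sxx = sum((x - mx) ** 2 for x in covariate)
    if sxx == 0:
        # Constant covariate: only the mean can be removed.
        return [y - my for y in values]
    slope = sum((x - mx) * (y - my) for x, y in zip(covariate, values)) / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]

# Hypothetical per-cell values: PC1 scores and percent mitochondrial reads.
pc1 = [-2.0, -1.0, 0.5, 1.5, 2.0]
percent_mt = [1.0, 2.0, 4.0, 6.0, 7.0]
print(round(pearson_r(pc1, percent_mt), 2))  # near 1: a candidate to regress, if not of interest
```

After regression, the residuals are uncorrelated with the covariate by construction. Any biology that co-varies with it is removed as well, which is a good reason not to regress by default.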
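For "remove the genes from the variable features" under Integration, the filter itself is a prefix match on gene symbols. In Seurat the filtered list would be assigned back with `VariableFeatures(obj) <- ...`. The patterns below assume the usual human symbols (MT-, RPS/RPL), matched case-insensitively so mouse symbols (mt-, Rps/Rpl) are caught too. They are a convention, not a curated gene list.

```python
import re

# Conventional prefixes; case-insensitive so mouse "mt-"/"Rps"/"Rpl" also match.
# Caveat: "^RP[SL]" also catches non-ribosomal genes such as RPS6KA1.
MITO = re.compile(r"^MT-", re.IGNORECASE)
RIBO = re.compile(r"^RP[SL]", re.IGNORECASE)

def drop_mito_ribo(variable_features):
    """Return the variable features without mitochondrial or ribosomal-protein genes."""
    return [g for g in variable_features if not (MITO.match(g) or RIBO.match(g))]

print(drop_mito_ribo(["CD3E", "MT-CO1", "RPL13", "mt-Nd1", "Rps3", "MS4A1"]))
# ['CD3E', 'MS4A1']
```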
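For "look at metadata in UMAP, and if there is clear separation", a numeric companion to the plot is to ask how mixed each cluster is across samples or batches. The helper below is hypothetical and is a crude stand-in for mixing metrics such as LISI. It is not part of Harmony or Seurat. It reports the normalized Shannon entropy of batch labels per cluster: values near 0 flag clusters made almost entirely of one batch, which is the pattern that suggests integrating on that variable.

```python
import math
from collections import Counter, defaultdict

def batch_entropy_per_cluster(clusters, batches):
    """Normalized entropy of batch labels per cluster: 1 = evenly spread, 0 = a single batch.

    Assumes roughly balanced batches; with very unequal batch sizes, even a
    well-mixed cluster will score below 1.
    """
    n_batches = len(set(batches))
    by_cluster = defaultdict(Counter)
    for cluster, batch in zip(clusters, batches):
        by_cluster[cluster][batch] += 1
    scores = {}
    for cluster, counts in by_cluster.items():
        total = sum(counts.values())
        h = sum((k / total) * math.log(total / k) for k in counts.values())
        scores[cluster] = h / math.log(n_batches) if n_batches > 1 else 0.0
    return scores

# Hypothetical labels: cluster "b" contains cells from sample s1 only.
print(batch_entropy_per_cluster(["a", "a", "b", "b"], ["s1", "s2", "s1", "s1"]))
```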
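On choosing resolution (broader 0.1 vs granular 0.8): FindClusters() optimizes a modularity-type objective on the neighbor graph (algorithm = 4 selects Leiden). Its resolution argument scales the null-model term, so larger values favor more, smaller clusters. The sketch scores two partitions of a toy graph with modularity at a given resolution gamma. The toy graph is two triangles joined by one edge. The graph is unweighted and hypothetical, unlike Seurat's weighted SNN graph.

```python
from collections import defaultdict

def modularity(edges, labels, resolution=1.0):
    """Modularity with resolution gamma for an undirected, unweighted graph.

    Q = sum over clusters c of [L_c / m - gamma * (d_c / (2m))^2], where
    L_c = edges inside c, d_c = summed degree of c's nodes, m = number of edges.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one_cluster = [0, 0, 0, 0, 0, 0]
two_clusters = [0, 0, 0, 1, 1, 1]
for gamma in (0.1, 0.8):
    print(gamma, modularity(toy_edges, one_cluster, gamma), modularity(toy_edges, two_clusters, gamma))
# At 0.1 the single cluster scores higher; at 0.8 the two-triangle split does.
```

For this graph, Q is 1 - gamma for one cluster and 6/7 - gamma/2 for the split. The preferred partition therefore flips at gamma = 2/7. That is the same qualitative effect as moving from resolution 0.1 to 0.8.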
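For the clustering tree and "follow up clustering over resolutions": the edges of a clustering tree count how many cells move from each cluster at one resolution into each cluster at the next. The sketch computes those counts plus the fraction of each higher-resolution cluster that came from a given parent. This is similar in spirit to the in_prop value in the clustree R package. The function name is hypothetical. A high-resolution cluster drawing from several parents is a sign that the split is unstable.

```python
from collections import Counter

def cluster_tree_edges(low_res, high_res):
    """Map (low cluster, high cluster) -> (n_cells, fraction of the high cluster from that parent)."""
    pairs = Counter(zip(low_res, high_res))
    high_sizes = Counter(high_res)
    return {(lo, hi): (n, n / high_sizes[hi]) for (lo, hi), n in pairs.items()}

# Hypothetical assignments of five cells at resolution 0.1 and 0.8.
res_01 = [0, 0, 0, 1, 1]
res_08 = [0, 0, 1, 1, 2]
for edge, (n, prop) in sorted(cluster_tree_edges(res_01, res_08).items()):
    print(edge, n, prop)
```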
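The "barplots showing proportions of metadata variable per cluster" item needs a cluster-by-level table in which each cluster's row sums to 1. This hypothetical helper builds that table; any plotting tool can then draw it as stacked bars.

```python
from collections import Counter, defaultdict

def proportions_per_cluster(clusters, metadata):
    """For each cluster, the fraction of its cells in each metadata level (rows sum to 1)."""
    counts = defaultdict(Counter)
    for cluster, level in zip(clusters, metadata):
        counts[cluster][level] += 1
    return {
        cluster: {level: n / sum(cnt.values()) for level, n in sorted(cnt.items())}
        for cluster, cnt in sorted(counts.items())
    }

# Hypothetical labels: condition per cell.
print(proportions_per_cluster(["0", "0", "0", "1"], ["ctrl", "ctrl", "treat", "treat"]))
```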
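For FindMarkers vs. FindAllMarkers: FindMarkers compares one cluster against others using a Wilcoxon rank-sum test by default, and FindAllMarkers repeats that comparison for every cluster against the rest. The sketch runs the rank-sum test for one gene with a normal approximation and no tie correction, so its p-values will not match Seurat's exactly. It also reports a simple log2 ratio of means with a pseudocount, which is not necessarily how Seurat computes avg_log2FC.

```python
import math

def rank_with_ties(values):
    """1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_marker(in_cluster, rest, pseudocount=1.0):
    """Two-sided rank-sum test (normal approximation) and a log2 ratio of mean expression."""
    n1, n2 = len(in_cluster), len(rest)
    ranks = rank_with_ties(list(in_cluster) + list(rest))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    log2fc = math.log2((sum(in_cluster) / n1 + pseudocount) / (sum(rest) / n2 + pseudocount))
    return {"u": u, "z": z, "p_value": p_value, "log2fc": log2fc}

# Hypothetical log-normalized expression of one gene.
print(wilcoxon_marker([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]))
```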
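For FindConservedMarkers: it runs the marker test separately within each group (e.g. each condition) and reports the per-group results together with combined p-values. The rule below is a hypothetical, conservative way to call a gene "conserved" from such per-group results. It requires significance in every group (maximum p-value below a threshold) and the same fold-change direction everywhere. It does not apply multiple-testing correction, and it is not Seurat's combination method.

```python
def is_conserved(per_group, alpha=0.05):
    """per_group maps group -> (p_value, log2fc) for one gene in one cluster.

    Hypothetical rule: conserved if significant in every group with a consistent
    fold-change sign. Returns (conserved, max_p).
    """
    p_values = [p for p, _ in per_group.values()]
    signs = {fc > 0 for _, fc in per_group.values()}
    max_p = max(p_values)
    return (max_p < alpha and len(signs) == 1), max_p

print(is_conserved({"ctrl": (0.001, 1.2), "treat": (0.01, 0.8)}))  # (True, 0.01)
```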
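For "Annotate with High/Low genes", one possible convention labels each cluster by whether its mean expression of a marker sits above the median of the per-cluster means. The helper, the cutoff choice, and the "GENE_high"/"GENE_low" naming are all hypothetical. A fixed threshold or a marker-test result would work just as well.

```python
from statistics import median

def high_low_labels(cluster_means, gene):
    """Label clusters GENE_high / GENE_low relative to the median of per-cluster means."""
    cutoff = median(cluster_means.values())
    return {c: f"{gene}_{'high' if m > cutoff else 'low'}" for c, m in cluster_means.items()}

# Hypothetical per-cluster mean log-normalized expression of CD4.
print(high_low_labels({"0": 2.5, "1": 0.1, "2": 1.0}, "CD4"))
```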

bcbio / bcbioR #48