snRNA-seq re-processing: clustering

After #3, we'll continue with tSNE / UMAP and clustering. This will involve creating code/09_snRNA-seq_re-processed/03_clustering.R.

Briefly, Erik computed tSNE / UMAP, used the PCs he got from the Poisson Pearson residuals, and then ran four graph-based clustering options, with k = 5, 10, 20 and 50 nearest neighbors. He then plotted a few marker genes and chose k = 20 to continue.

Lines 286 to 337 are the ones that plot the marker genes.
Later, Erik defines other sets of genes in lines 379 to 383 and visualizes them in lines 385 to 403.

Matt and Louise:

Determined the optimal number of PCs to use: https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/R-batchJob_DLPFC-n3_optimalPCselxn_LAH2021.R (there's a companion shell script).
Ran tSNE / UMAP on that set of PCs: https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_DLPFC-n3_step02_clust-annot_LAH.R#L124-L136
Used k = 20 for building their shared nearest-neighbor graph (https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_DLPFC-n3_step02_clust-annot_LAH.R#L139-L149) to generate their prelimClusters, then checked them (https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_DLPFC-n3_step02_clust-annot_LAH.R#L151-L179).
Used hierarchical clustering to group those prelimClusters into collapsedClusters: https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_DLPFC-n3_step02_clust-annot_LAH.R#L186-L284
Once they had the collapsedClusters, they visualized a few marker genes in order to label them: https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_DLPFC-n3_step02_clust-annot_LAH.R#L288-L300
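
To make the prelimCluster to collapsedCluster step concrete, here is a minimal base R sketch on simulated data. Everything in it is an assumption for illustration: the toy expression matrix, the per-cluster means, the ward.D2 linkage and the fixed `cutree(k = 3)` are stand-ins, not the procedure in the linked script, so check those lines for the actual normalization and tree-cutting choices.

```r
set.seed(20210101)

# Toy stand-in for a log-normalized expression matrix (genes x nuclei).
# In the real script this would come from the SingleCellExperiment object.
n_genes <- 50
n_nuclei <- 300
expr <- matrix(rnorm(n_genes * n_nuclei), nrow = n_genes)

# Simulate 3 underlying groups of nuclei by adding a group-specific shift.
true_group <- rep(1:3, each = 100)
shift <- matrix(rnorm(n_genes * 3, sd = 3), nrow = n_genes)
expr <- expr + shift[, true_group]

# Hypothetical preliminary clusters: 12 clusters of 25 nuclei each,
# so prelim clusters 1-4, 5-8 and 9-12 fall inside the 3 simulated groups.
prelimCluster <- factor(rep(seq_len(12), each = 25))

# Average expression per prelim cluster (rows = prelim clusters).
cluster_sums <- rowsum(t(expr), prelimCluster)
cluster_means <- cluster_sums / as.vector(table(prelimCluster))

# Hierarchical clustering of the prelim cluster profiles.
tree <- hclust(dist(cluster_means), method = "ward.D2")

# Cut the tree; k = 3 is an assumption that matches the simulation.
collapsed_of_prelim <- cutree(tree, k = 3)

# Map every nucleus to the collapsed cluster of its prelim cluster.
collapsedCluster <- factor(collapsed_of_prelim[as.character(prelimCluster)])

# Each prelim cluster should land in exactly one collapsed cluster.
print(table(prelimCluster, collapsedCluster))
```

With real data, `plot(tree)` is a quick way to look at the dendrogram before deciding where to cut it.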

I think that we should use the strategy Matt and Louise used and go with k = 20 from the beginning. Let's see what the marker gene plots look like for the following sets:

Erik's markers for Habenula
Erik's sets of nicotine, opioid, alcohol and cocaine genes
We could also use the DLPFC marker genes Matt & Louise used, just out of curiosity

We could also compare the resulting prelimCluster and collapsedCluster labels with the clusters Erik had created. This can be done with addmargins(table()) for example or with a heatmap.
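
As a rough sketch of that comparison, the base R example below cross-tabulates two sets of labels and draws a simple heatmap of the row proportions. The labels are simulated, and the names `erik_cluster` and `cluster_df` are hypothetical placeholders; in the real script the labels would come from the columns of the clustered object.

```r
set.seed(1)

# Simulated labels for 200 nuclei (placeholders for the real columns).
n <- 200
cluster_df <- data.frame(
    erik_cluster = factor(sample(paste0("Erik_", 1:4), n, replace = TRUE)),
    collapsedCluster = factor(sample(paste0("collapsed_", 1:3), n, replace = TRUE))
)

# Cross-tabulate the two label sets and add row / column totals.
tab <- table(
    Erik = cluster_df$erik_cluster,
    collapsed = cluster_df$collapsedCluster
)
tab_margins <- addmargins(tab)
print(tab_margins)

# Proportion of each of Erik's clusters that falls in each collapsed cluster.
prop <- prop.table(tab, margin = 1)

# Base R heatmap without reordering or rescaling, so cells show raw proportions.
heatmap(unclass(prop), Rowv = NA, Colv = NA, scale = "none")
```

Using row proportions rather than raw counts keeps small clusters visible next to large ones; the same table works for prelimCluster by swapping the column.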

We'll examine these plots with everyone involved.

Our resulting object from this script should have the prelimCluster and collapsedCluster labels. We'll then make a new object that adds the cellType and cellType.broad columns, based on what we decide with everyone about how to label each collapsedCluster. The column names may be spelled slightly differently: use the ones Matt & Louise have in the final published objects, not the intermediate column names (Louise can tell you which ones they are).
