snRNA-seq re-processing: normalization

LieberInstitute / Habenula_Pilot

habenulaPilot project code repository

snRNA-seq re-processing: normalization #3

Closed: lcolladotor closed this issue 2 years ago

lcolladotor commented 2 years ago

We'll continue now with code/09_snRNA-seq_re-processed/02_normalization.R (and potentially a companion shell script generated with sgejobs::job_single() that can use the hold_jid to wait for the output of #2).

Erik's 20210323_human_hb_neun.R lines 228 to 263 do the normalization using Poisson Pearson residuals to compute the PCs. This is different from what Matt and Louise used at https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_DLPFC-n3_step02_clust-annot_LAH.R#L73-L104 where they use multiBatchNorm() across sample_id and then run fastMNN(). My understanding is that Erik's code doesn't batch-correct the data. With that in mind, in this R script let's adapt the code from Matt and Louise. This would replace the PC results Erik computes in lines 228 to 253 with the fastMNN-corrected PCs.
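For reference, here is a minimal sketch of the multiBatchNorm() then fastMNN() approach described above. It is not Matt and Louise's actual code: the mock data from scuttle::mockSCE(), the two made-up sample_id values, d = 20, and the "PCA_corrected" name are all assumptions for illustration.

```r
library(SingleCellExperiment)
library(scuttle)
library(batchelor)

set.seed(20210323)

# Mock data standing in for the real snRNA-seq object (assumption)
sce <- mockSCE(ncells = 200, ngenes = 1000)
sce$sample_id <- rep(c("sampleA", "sampleB"), each = 100)

# Rescale size factors across batches and compute comparable logcounts
sce <- multiBatchNorm(sce, batch = sce$sample_id)

# fastMNN() runs a multi-batch PCA and then the MNN correction;
# d = 20 is an arbitrary choice for this small example
mnn <- fastMNN(sce, batch = sce$sample_id, d = 20)

# Store the batch-corrected PCs back on the original object
# ("PCA_corrected" is a hypothetical name)
reducedDim(sce, "PCA_corrected") <- reducedDim(mnn, "corrected")
```

The corrected PCs are what the downstream tSNE / UMAP and clustering steps would read, instead of the uncorrected PCs.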

You might want to save the R object at this stage since computing tSNE / UMAP and doing the graph-based clustering can take a few hours to run.
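A small sketch of that checkpoint, assuming saveRDS() is an acceptable format. The object and file name below are hypothetical placeholders, and the example writes to a temporary directory rather than to the project tree.

```r
# Hypothetical stand-in for the SingleCellExperiment with the corrected PCs
sce_stage <- list(note = "placeholder for the normalized, batch-corrected object")

# Hypothetical output path; a real script would point inside the project
out_file <- file.path(tempdir(), "sce_normalized.rds")

# Save the checkpoint before the slow tSNE / UMAP / clustering steps
saveRDS(sce_stage, file = out_file)

# A later script can reload it instead of recomputing
sce_reloaded <- readRDS(out_file)
```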

lcolladotor commented 2 years ago

We decided yesterday to change a lot of this and use code from https://github.com/lmweber/locus-c/tree/main/code/analyses_sn that @mattntran wrote. We still need to update this issue.

lcolladotor commented 2 years ago

The short version is that we want to adapt the code from https://github.com/lmweber/locus-c/blob/main/code/analyses_sn/03_reduceDims_clustering.R instead of what we had discussed initially on this issue.

[x] Use nullResiduals() like at https://github.com/lmweber/locus-c/blob/main/code/analyses_sn/03_reduceDims_clustering.R#L69-L70
[x] Compute the GLMPCA_approx reduced dimensions like at https://github.com/lmweber/locus-c/blob/main/code/analyses_sn/03_reduceDims_clustering.R#L91-L94
[x] Then adjust for batch using reducedMNN() like at https://github.com/lmweber/locus-c/blob/main/code/analyses_sn/03_reduceDims_clustering.R#L106-L114
[x] Finally use multiBatchNorm() to get the logcounts like at https://github.com/lmweber/locus-c/blob/main/code/analyses_sn/03_reduceDims_clustering.R#L156
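The four steps in this checklist could be sketched roughly as below. This is an illustration on mock data, not a copy of Matt's script: the mockSCE() input, the made-up sample_id values, ncomponents = 20, the default feature selection in calculatePCA(), and the "GLMPCA_MNN" name are assumptions.

```r
library(SingleCellExperiment)
library(scuttle)
library(scry)
library(scater)
library(batchelor)

set.seed(20210323)

# Mock data standing in for the real snRNA-seq object (assumption)
sce <- mockSCE(ncells = 200, ngenes = 1000)
sce$sample_id <- rep(c("sampleA", "sampleB"), each = 100)

# Drop all-zero genes so the residuals are well defined
sce <- sce[rowSums(counts(sce)) > 0, ]

# 1. Poisson Pearson residuals, stored as a new assay
sce <- nullResiduals(sce, assay = "counts", fam = "poisson", type = "pearson")

# 2. PCA on the residuals as an approximation to GLM-PCA
pcs <- calculatePCA(
    assay(sce, "poisson_pearson_residuals"),
    ncomponents = 20
)
reducedDim(sce, "GLMPCA_approx") <- pcs

# 3. Batch-correct the reduced dimensions with reducedMNN()
mnn <- reducedMNN(
    reducedDim(sce, "GLMPCA_approx"),
    batch = as.factor(sce$sample_id)
)
reducedDim(sce, "GLMPCA_MNN") <- mnn$corrected

# 4. Compute logcounts that are comparable across samples
sce <- multiBatchNorm(sce, batch = sce$sample_id)
```

Keeping the uncorrected "GLMPCA_approx" alongside the corrected dimensions makes it possible to compare the two later.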

Overall, we are taking Matt's code and modularizing it. The next person will likely then adapt your code instead of going back to Matt's. But you could leave comments linking back to the specific sections of Matt's code that you edited.

lcolladotor commented 2 years ago

Let's also make plots of the regular tSNE/UMAPs first without batch correction, and then make them again using the batch-corrected PCs. This would be equivalent to https://github.com/LieberInstitute/Visium_IF_AD/blob/master/plots/08_harmony_BayesSpace/wholegenome/UMAP_sample_id.pdf vs https://github.com/LieberInstitute/Visium_IF_AD/blob/master/plots/08_harmony_BayesSpace/wholegenome/UMAP_harmony_sample_id.pdf.
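One possible way to produce the two versions of the plot, sketched on mock data. The reduced dimensions below are random stand-ins with an artificial batch shift, and the names "GLMPCA_approx", "GLMPCA_MNN", "UMAP_uncorrected", "UMAP_corrected" and the output file names are assumptions; runUMAP() also needs the uwot package installed.

```r
library(SingleCellExperiment)
library(scuttle)
library(scater)

set.seed(20210323)

# Mock object standing in for the real data (assumption)
sce <- mockSCE(ncells = 200, ngenes = 100)
sce$sample_id <- rep(c("sampleA", "sampleB"), each = 100)

# Random stand-ins for the PCs: the "uncorrected" version carries an
# artificial per-sample shift, the "corrected" version does not
pcs <- matrix(rnorm(200 * 10), ncol = 10, dimnames = list(colnames(sce), NULL))
batch_shift <- ifelse(sce$sample_id == "sampleA", 0, 3)
reducedDim(sce, "GLMPCA_approx") <- pcs + batch_shift
reducedDim(sce, "GLMPCA_MNN") <- pcs

# UMAP on each set of reduced dimensions, stored under separate names
sce <- runUMAP(sce, dimred = "GLMPCA_approx", name = "UMAP_uncorrected")
sce <- runUMAP(sce, dimred = "GLMPCA_MNN", name = "UMAP_corrected")

# Same plot twice, colored by sample, before and after batch correction
p_before <- plotReducedDim(sce, dimred = "UMAP_uncorrected", colour_by = "sample_id")
p_after <- plotReducedDim(sce, dimred = "UMAP_corrected", colour_by = "sample_id")

# Write each plot to its own PDF (temporary directory for this example)
pdf(file.path(tempdir(), "UMAP_sample_id.pdf"))
print(p_before)
dev.off()

pdf(file.path(tempdir(), "UMAP_corrected_sample_id.pdf"))
print(p_after)
dev.off()
```

The same pattern applies to tSNE by swapping runUMAP() for runTSNE().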