Closed lcolladotor closed 2 years ago
We decided yesterday to change a lot of this and use code from https://github.com/lmweber/locus-c/tree/main/code/analyses_sn that @mattntran wrote. We still need to update this issue.
The short version is that we want to adapt the code from https://github.com/lmweber/locus-c/blob/main/code/analyses_sn/03_reduceDims_clustering.R instead of what we had discussed initially on this issue.
- nullResiduals() like at https://github.com/lmweber/locus-c/blob/main/code/analyses_sn/03_reduceDims_clustering.R#L69-L70
- GLMPCA_approx reduced dimensions like at https://github.com/lmweber/locus-c/blob/main/code/analyses_sn/03_reduceDims_clustering.R#L91-L94
- reducedMNN() like at https://github.com/lmweber/locus-c/blob/main/code/analyses_sn/03_reduceDims_clustering.R#L106-L114
- multiBatchNorm() to get the logcounts like at https://github.com/lmweber/locus-c/blob/main/code/analyses_sn/03_reduceDims_clustering.R#L156

Overall, we are taking Matt's code and modularizing it. The next person will likely then adapt your code instead of going back to Matt's. But you could leave comments linking back to the specific sections of Matt's code that you edited.
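As a rough illustration of how the four steps listed above could be wrapped into one modular function, here is a minimal sketch. It is not Matt's code: the function name `reduce_dims_mnn`, the `GLMPCA_MNN` slot name, the default `fam`/`type` values, the number of PCs, and the use of `BiocSingular::runPCA()` on all genes are assumptions made for illustration, so check them against the linked lines before reusing anything.

```r
# Hypothetical helper; argument defaults are placeholders, not values taken
# from the linked script.
reduce_dims_mnn <- function(sce, batch_col = "sample_id",
                            fam = "poisson", type = "pearson", n_pcs = 50) {
    batch <- as.factor(SummarizedExperiment::colData(sce)[[batch_col]])
    stopifnot(length(batch) == ncol(sce))

    # 1. Null-model residuals, stored by scry as a new assay named
    #    "<fam>_<type>_residuals". Assumes all-zero genes were already dropped.
    sce <- scry::nullResiduals(sce, assay = "counts", fam = fam, type = type)
    resid_name <- paste(fam, type, "residuals", sep = "_")

    # 2. PCA on the residuals as an approximation to GLM-PCA.
    #    as.matrix() densifies the assay, which can be memory hungry.
    pca <- BiocSingular::runPCA(
        t(as.matrix(SummarizedExperiment::assay(sce, resid_name))),
        rank = n_pcs,
        BSPARAM = BiocSingular::IrlbaParam()
    )
    SingleCellExperiment::reducedDim(sce, "GLMPCA_approx") <- pca$x

    # 3. Batch-correct the PCs directly with MNN.
    mnn <- batchelor::reducedMNN(pca$x, batch = batch)
    SingleCellExperiment::reducedDim(sce, "GLMPCA_MNN") <- mnn$corrected

    # 4. Batch-aware scaling normalization to obtain the logcounts assay.
    sce <- batchelor::multiBatchNorm(sce, batch = batch)
    sce
}
```

Calls are namespace-qualified so the function can be sourced from another script without attaching packages first; a call would look like `sce <- reduce_dims_mnn(sce)`.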
Let's also make plots of the regular tSNE/UMAPs without batch correction and then make them again using the batch-corrected PCs. This would be equivalent to https://github.com/LieberInstitute/Visium_IF_AD/blob/master/plots/08_harmony_BayesSpace/wholegenome/UMAP_sample_id.pdf vs https://github.com/LieberInstitute/Visium_IF_AD/blob/master/plots/08_harmony_BayesSpace/wholegenome/UMAP_harmony_sample_id.pdf.
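One possible way to produce the uncorrected and corrected UMAPs side by side is sketched below. The function name `plot_umap_before_after`, the `GLMPCA_MNN` slot name, and the `UMAP_uncorrected` / `UMAP_corrected` names are hypothetical; the sketch assumes both sets of PCs are already stored in `reducedDims(sce)` and that cells should be coloured by `sample_id`. It writes both plots as pages of a single PDF rather than two separate files.

```r
# Hypothetical helper: one UMAP from the uncorrected PCs, one from the
# batch-corrected PCs, both coloured by the same colData column.
plot_umap_before_after <- function(sce, pdf_file,
                                   raw_dimred = "GLMPCA_approx",
                                   mnn_dimred = "GLMPCA_MNN",
                                   colour_by = "sample_id") {
    stopifnot(all(
        c(raw_dimred, mnn_dimred) %in% SingleCellExperiment::reducedDimNames(sce)
    ))

    sce <- scater::runUMAP(sce, dimred = raw_dimred, name = "UMAP_uncorrected")
    sce <- scater::runUMAP(sce, dimred = mnn_dimred, name = "UMAP_corrected")

    grDevices::pdf(pdf_file)
    on.exit(grDevices::dev.off(), add = TRUE)
    print(scater::plotReducedDim(sce, dimred = "UMAP_uncorrected", colour_by = colour_by))
    print(scater::plotReducedDim(sce, dimred = "UMAP_corrected", colour_by = colour_by))

    # Return the object so the UMAP coordinates can be reused downstream.
    invisible(sce)
}
```

The same pattern should work for tSNE by swapping in `scater::runTSNE()`.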
We'll continue now with code/09_snRNA-seq_re-processed/02_normalization.R (and potentially a companion shell script generated with sgejobs::job_single() that can use the hold_jid to wait for the output of #2).

Erik's 20210323_human_hb_neun.R lines 228 to 263 do the normalization using Poisson Pearson residuals to compute the PCs. This is different from what Matt and Louise used at https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_DLPFC-n3_step02_clust-annot_LAH.R#L73-L104 where they use multiBatchNorm() across sample_id and then run fastMNN(). My understanding is that Erik's code doesn't batch-correct the data. With that in mind, in this R script let's adapt code from Matt and Louise. This would replace the PC results Erik computes in 228 to 253 with the fastMNN-corrected PCs.

You might want to save the R object at this stage since computing tSNE / UMAP and doing the graph-based clustering can take a few hours to run.
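A minimal sketch of the multiBatchNorm() then fastMNN() route described above, with a checkpoint saved before the slow tSNE / UMAP and clustering steps, might look like this. The function name `normalize_and_correct`, the `PCA_corrected` slot name, and the use of `saveRDS()` are assumptions; Matt and Louise's script may restrict `fastMNN()` to a subset of genes or set other arguments that are not reproduced here.

```r
# Hypothetical helper: batch-aware normalization, MNN correction, checkpoint.
normalize_and_correct <- function(sce, out_file, batch_col = "sample_id") {
    batch <- as.factor(SummarizedExperiment::colData(sce)[[batch_col]])
    stopifnot(length(batch) == ncol(sce))

    # Rescale size factors across batches and compute the logcounts assay.
    sce <- batchelor::multiBatchNorm(sce, batch = batch)

    # fastMNN() runs a PCA across batches on the logcounts and corrects it;
    # the corrected PCs come back in the "corrected" reducedDim.
    mnn <- batchelor::fastMNN(sce, batch = batch)
    SingleCellExperiment::reducedDim(sce, "PCA_corrected") <-
        SingleCellExperiment::reducedDim(mnn, "corrected")

    # Checkpoint so downstream steps can restart from here.
    saveRDS(sce, file = out_file)
    sce
}
```

A later script could then start from `readRDS(out_file)` instead of repeating the normalization.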