snRNA-seq re-processing: QC

lcolladotor commented 2 years ago

Based on our late-2021 meetings with Erik, we decided to re-process the snRNA-seq data. See https://jhu-genomics.slack.com/archives/G019V8X9CVC/p1639688649081100 for the Slack messages from back then.

Add new QC code to GitHub

[x] Create code/09_snRNA-seq_re-processed
[x] Adapt the R script from https://jhu-genomics.slack.com/archives/G019V8X9CVC/p1639688649081100 and call it code/09_snRNA-seq_re-processed/01_qc.R. Save the PDF in the corresponding plots/09_snRNA-seq_re-processed directory.

This uses 20210525_human_hb_processing.rda, which was created by lines 35 to 125 of 20210323_human_hb_neun.R.

At the end of this script, our SCE object has the following columns:

discard_auto: TRUE for the cells we'll discard moving forward.
Columns marking the cells Erik had discarded, along with his initial cluster/cell type assignments. We want to keep this info in our SCE object so we can compare the results later.
Other intermediate columns used to create discard_auto.
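A column like discard_auto can be built by combining several per-cell QC flags. Below is a minimal base-R sketch, assuming a median ± 3 MAD rule similar in spirit to scuttle::isOutlier(); the actual thresholds live in 01_qc.R and the Slack script, not here. flag_outliers() and the toy qc data frame are hypothetical.

```r
## Hypothetical sketch: flag cells outside median +/- nmads * MAD.
## Similar in spirit to scuttle::isOutlier(), but NOT the rule used in 01_qc.R.
flag_outliers <- function(x, nmads = 3, type = c("lower", "higher"), log = FALSE) {
    type <- match.arg(type)
    if (log) x <- log2(x) # library sizes and gene counts are skewed, so use a log scale
    center <- median(x)
    spread <- mad(x) # stats::mad() includes the 1.4826 consistency constant
    if (type == "lower") {
        x < center - nmads * spread
    } else {
        x > center + nmads * spread
    }
}

## Toy per-cell QC metrics (made-up values, one row per cell)
qc <- data.frame(
    sum = c(5000, 5200, 4800, 5100, 150, 4900),
    detected = c(2000, 2100, 1900, 2050, 80, 1950),
    subsets_Mito_percent = c(1, 1.2, 0.8, 1.1, 0.9, 25)
)

## The "other pieces": one logical flag per criterion
qc$low_lib_size <- flag_outliers(qc$sum, type = "lower", log = TRUE)
qc$low_n_features <- flag_outliers(qc$detected, type = "lower", log = TRUE)
qc$high_mito <- flag_outliers(qc$subsets_Mito_percent, type = "higher")

## A cell is discarded if it fails any criterion
qc$discard_auto <- qc$low_lib_size | qc$low_n_features | qc$high_mito
qc$discard_auto
```

With these toy values, cell 5 is flagged for its low library size and gene count, and cell 6 for its high mitochondrial percentage. Keeping the individual flags as separate columns makes it easy to see later why each cell was dropped.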

Continue processing

This involves adapting the code Erik wrote in 20210323_human_hb_neun.R.

[x] We are already using the output from lines 35 to 125 (aka 20210525_human_hb_processing.rda).
[x] We can skip lines 127 to 186, since that is where we'll differ from Erik: those lines set his QC thresholds.
[x] We want to add the gene annotation information, so we'll need to adapt lines 188 to 225. It might also be useful to compare his approach with the one at https://github.com/LieberInstitute/spatialLIBD/blob/master/R/read10xVisiumWrapper.R#L86-L113. We might want to adapt the spatialLIBD code instead, since it would give us the same gene information that we have in our Visium SPE objects.
[x] Save the resulting SCE object at this point; we could call it sce_post_qc.Rdata or something similar, analogous to his file at line 225.
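Whichever code we adapt, the core of adding gene annotation is matching the gene IDs in our object to the gene records of a reference annotation, keeping our object's row order. Below is a minimal base-R sketch of just that matching step, assuming plain data frames stand in for rowData(sce) and for the gene-level records of the GTF; the IDs and values are made up, and the real code would work on SCE/GRanges objects.

```r
## Hypothetical stand-ins: gene IDs in our object and gene records from a GTF
row_data <- data.frame(gene_id = c("g1", "g2", "g3"))
gtf_genes <- data.frame(
    gene_id = c("g3", "g1", "g4", "g2"),
    gene_name = c("GENE3", "GENE1", "GENE4", "GENE2"),
    gene_type = c("protein_coding", "lncRNA", "miRNA", "protein_coding")
)

## For each gene in our object, find its row in the annotation
idx <- match(row_data$gene_id, gtf_genes$gene_id)

## Stop if any gene has no annotation (e.g., a different reference version)
stopifnot(!anyNA(idx))

## Copy the annotation over in our object's row order
row_data$gene_name <- gtf_genes$gene_name[idx]
row_data$gene_type <- gtf_genes$gene_type[idx]
row_data
```

Genes present only in the annotation (g4 here) are simply ignored, while a gene missing from the annotation stops the script instead of silently getting NA.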

You could run this interactively, or use sgejobs::job_single() to create a companion shell script to run the QC code.

This should mark the end of the QC steps.

lcolladotor commented 2 years ago

Note that lines 97 to 121 of 20210323_human_hb_neun.R run emptyDrops() on all the samples together. Matt has found that this might not work as well in some cases, which is why at https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_all-FACS-n10_2021rev_step01_processing-QC_MNT.R#L99 he runs it one sample at a time. Matt has convinced us to do this for the deconvolution snRNA-seq data with Louise. So we could decide to change this here too, which would mean reading the data from processed-data/07_cellranger and building the R objects from scratch instead of using 20210525_human_hb_processing.rda.

lcolladotor commented 2 years ago

We should add the doubletScore calculation with scDblFinder to the QC script. That's from Matt's code at https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_all-FACS-n10_2021rev_step01_processing-QC_MNT.R#L354-L444. Matt computes this one sample at a time, which is what we should do here too. Since we'll have a single large SCE object, we can loop over the samples and subset the object to each one. Something like:

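## Initialize the new column so that every cell has a value
sce$new_variable <- NA
for (i in unique(sce$sample_id)) {
    ## Subset to the cells from sample i
    sce_sub <- sce[, sce$sample_id == i]
    ## compute_something() is a placeholder for the per-sample step;
    ## it should return one value per cell in sce_sub, in the same order
    results <- compute_something(sce_sub)
    ## Write the results back to the matching cells of the full object
    sce$new_variable[sce$sample_id == i] <- results
}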

The same pattern would also be useful for running emptyDrops() one sample at a time. Louise @lahuuki might need to do this for the deconvolution snRNA-seq data too (I haven't created those issues yet!).
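To check that the loop writes each sample's results back to the right cells, here is a runnable toy version. It assumes a plain data frame stands in for colData(sce), and compute_something() is a hypothetical placeholder for a per-sample step such as scDblFinder() or emptyDrops(); here it just divides each cell's library size by its sample's median.

```r
## Toy stand-in for colData(sce): 6 cells from 2 samples (made-up values)
cells <- data.frame(
    sample_id = c("s1", "s2", "s1", "s2", "s1", "s2"),
    sum = c(100, 10, 300, 30, 200, 20)
)

## Hypothetical per-sample computation: library size relative to the sample median
compute_something <- function(x) x$sum / median(x$sum)

cells$new_variable <- NA_real_
for (i in unique(cells$sample_id)) {
    in_sample <- cells$sample_id == i
    results <- compute_something(cells[in_sample, ])
    ## Guard: exactly one result per cell of sample i
    stopifnot(length(results) == sum(in_sample))
    cells$new_variable[in_sample] <- results
}
cells
```

For a simple per-cell vector like this one, base R's ave(cells$sum, cells$sample_id, FUN = function(s) s / median(s)) gives the same result; the explicit loop is more useful when the per-sample step needs the whole subset object.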

lcolladotor commented 2 years ago

Make sure this initial SCE object includes the sample IDs, sex, region (well, it's all Habenula), diagnosis, and age. This is similar to https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_all-FACS-n10_2021rev_step01_processing-QC_MNT.R#L463-L472, or to https://github.com/LieberInstitute/Visium_IF_AD/blob/master/code/04_build_spe/build_basic_spe.R#L32-L45, which gets added at https://github.com/LieberInstitute/Visium_IF_AD/blob/master/code/04_build_spe/build_basic_spe.R#L129-L140.
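One way to add these sample-level columns is to match each cell's sample ID against a phenotype table. The sketch below assumes a hypothetical pheno data frame (IDs and values are made up) and uses a plain data frame in place of colData(sce). It uses match() rather than merge(), because merge() sorts the rows by the key by default, so its output would no longer line up with the cells in the SCE object.

```r
## Hypothetical sample-level phenotype table (made-up values)
pheno <- data.frame(
    sample_id = c("s1", "s2"),
    sex = c("M", "F"),
    region = "Habenula",
    diagnosis = c("Control", "Case"),
    age = c(45, 52)
)

## Toy per-cell table standing in for colData(sce)
cells <- data.frame(sample_id = c("s2", "s1", "s1", "s2"))

## Row of pheno for each cell; stop if a sample is missing from pheno
m <- match(cells$sample_id, pheno$sample_id)
stopifnot(!anyNA(m))

## Copy each sample-level column to the cells, keeping the cell order
for (col in c("sex", "region", "diagnosis", "age")) {
    cells[[col]] <- pheno[[col]][m]
}
cells
```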

LieberInstitute / Habenula_Pilot

snRNA-seq re-processing: QC #2
