LieberInstitute / Habenula_Pilot

habenulaPilot project code repository

snRNA-seq re-processing: QC #2

Closed lcolladotor closed 2 years ago

lcolladotor commented 2 years ago

Based on our late 2021 meetings with Erik, we decided to re-process the snRNA-seq data. See https://jhu-genomics.slack.com/archives/G019V8X9CVC/p1639688649081100 for the Slack messages from back then.

Add new QC code to GitHub

This uses 20210525_human_hb_processing.rda which was created with lines 35 to 125 from 20210323_human_hb_neun.R.

At the end of this, we have in our SCE object the columns:

Continue processing

This involves adapting code Erik wrote at 20210323_human_hb_neun.R.

You could run this interactively, or use sgejobs::job_single() to create a companion shell script to run the QC code.
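A minimal sketch of the sgejobs route, assuming the QC code lives in a script called 01_qc.R and that 20G of memory is enough. Both the script name and the memory request are hypothetical placeholders, not values from this repository. With create_shell = FALSE the call should only preview the shell script; switching it to TRUE is what writes the companion .sh file.

```r
## Hypothetical script name and memory request: adjust both to the real QC script.
if (requireNamespace("sgejobs", quietly = TRUE)) {
    script <- sgejobs::job_single(
        name = "01_qc",
        create_shell = FALSE, ## set to TRUE to write 01_qc.sh to disk
        memory = "20G",
        command = "Rscript 01_qc.R"
    )
} else {
    message("sgejobs is not installed, so no shell script was previewed")
}
```

The resulting shell script would then be submitted with qsub from the directory that contains the R script.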

This should mark the end of the QC steps.

lcolladotor commented 2 years ago

Note that lines 97 to 121 of 20210323_human_hb_neun.R run emptyDrops() across all the samples together. Matt has found that this might not work as well in some cases, which is why he runs it one sample at a time at https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_all-FACS-n10_2021rev_step01_processing-QC_MNT.R#L99. Matt has convinced us to do this for the deconvolution snRNA-seq data with Louise. So we could decide to change this here too, which would mean reading the data from processed-data/07_cellranger and building the R objects from scratch instead of using 20210525_human_hb_processing.rda.

lcolladotor commented 2 years ago

We should add to the QC script the calculation of the doubletScore using scDblFinder. That's from Matt's code at https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_all-FACS-n10_2021rev_step01_processing-QC_MNT.R#L354-L444. Matt computes this one sample at a time, which is what we should do here. Since we'll have a single large SCE object, we can use a loop and subset it to each sample. Something like:

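sce$new_variable <- rep(NA_real_, ncol(sce)) ## initialize with one NA per column
for (i in unique(sce$sample_id)) {
    is_sample <- sce$sample_id == i
    sce_sub <- sce[, is_sample]
    ## compute something on sce_sub that returns one value per column
    results <- rep(NA_real_, ncol(sce_sub)) ## placeholder for the vector of results
    sce$new_variable[is_sample] <- results
}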

Something like the above would be useful for running emptyDrops() one sample at a time. Louise @lahuuki might need to do this also on the deconvolution snRNA-seq data (I haven't created those issues yet!).
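To make the per-sample pattern reusable for both the doublet scores and emptyDrops(), the loop could be wrapped in a small helper. This is only a sketch: per_sample_values() is a hypothetical name, and the toy matrix and sample names below are made up so the code runs in base R without any Bioconductor packages.

```r
## Hypothetical helper (not from the original scripts): run `fun` on the columns
## of each sample and return one value per column, in the original column order.
per_sample_values <- function(x, sample_id, fun) {
    stopifnot(length(sample_id) == ncol(x))
    out <- rep(NA_real_, ncol(x))
    for (i in unique(sample_id)) {
        is_sample <- sample_id == i
        res <- fun(x[, is_sample, drop = FALSE])
        ## Guard against functions that do not return one value per column
        stopifnot(length(res) == sum(is_sample))
        out[is_sample] <- res
    }
    out
}

## Toy stand-in for a counts matrix: 6 genes by 10 nuclei from 3 made-up samples
toy_counts <- matrix(1:60, nrow = 6)
toy_sample_id <- rep(c("sampleA", "sampleB", "sampleC"), times = c(4, 3, 3))

## Example per-sample computation: each nucleus' share of its sample's total counts
frac_of_sample <- per_sample_values(
    toy_counts, toy_sample_id,
    function(m) colSums(m) / sum(m)
)
```

With the real object, a call along the lines of `sce$doubletScore <- per_sample_values(sce, sce$sample_id, scDblFinder::computeDoubletDensity)` should slot in, and `function(s) DropletUtils::emptyDrops(counts(s))$FDR` would be the analogous function for emptyDrops(). Neither has been run on the habenula data here, and Matt's script normalizes and selects highly variable genes before computing the doublet scores, so treat both as starting points.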

lcolladotor commented 2 years ago

Make sure this initial SCE object includes the sample IDs, sex, region (well, it's all Habenula), diagnosis, and age information. Kind of like https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/10x_all-FACS-n10_2021rev_step01_processing-QC_MNT.R#L463-L472 or like https://github.com/LieberInstitute/Visium_IF_AD/blob/master/code/04_build_spe/build_basic_spe.R#L32-L45 that gets added at https://github.com/LieberInstitute/Visium_IF_AD/blob/master/code/04_build_spe/build_basic_spe.R#L129-L140.
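A rough sketch of how the per-sample information could be expanded to one row per nucleus with match(). Everything in sample_info below is made up for illustration (sample names, sex, diagnosis labels and ages are placeholders, not the real demographics), and a plain data.frame stands in for colData(sce) so that the code runs in base R.

```r
## Made-up per-sample table: replace with the real phenotype information
sample_info <- data.frame(
    sample_id = c("sampleA", "sampleB", "sampleC"),
    sex = c("M", "F", "M"),
    region = "Habenula",
    diagnosis = c("Control", "Case", "Control"),
    age = c(40, 50, 60)
)

## Stand-in for colData(sce): one row per nucleus
col_data <- data.frame(
    sample_id = rep(c("sampleA", "sampleB", "sampleC"), times = c(4, 3, 3))
)

## Find the sample_info row for each nucleus and fail early on unknown samples
m <- match(col_data$sample_id, sample_info$sample_id)
stopifnot(!anyNA(m))

## Append the per-sample columns, skipping the duplicated sample_id
new_cols <- setdiff(colnames(sample_info), "sample_id")
col_data <- cbind(col_data, sample_info[m, new_cols, drop = FALSE])
rownames(col_data) <- NULL
```

On the actual SCE object the same idea would presumably look like `colData(sce) <- cbind(colData(sce), sample_info[m, new_cols])`, with `m` computed from `sce$sample_id`.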