Mayrlab / scUTRquant

Bioinformatics pipeline for single-cell 3' UTR isoform quantification
https://Mayrlab.github.io/scUTRquant
GNU General Public License v3.0

Batch integration / Sample normalisation #97

Open MathieuBo opened 6 days ago

MathieuBo commented 6 days ago

Hi!

Have you explored strategies for batch integration for larger datasets, or sample normalisation in addition to scaling/log normalisation?

Any advice?

Thanks!

mfansler commented 4 days ago

This is a great question, but unfortunately one we only partly explored; it didn't make it into our final publication simply for reasons of space.

Accounting for Batches

In the scheme of testing 3'UTR changes, comparisons will ultimately be made within genes (e.g., a Weighted Usage Index) rather than across genes (e.g., a TPM). Log-scaling is not needed for that. The scUTRboot testing framework was really developed for the small datasets (e.g., 1-3 samples per tissue) that we were working with at the time. For larger datasets with several samples per condition, I would recommend using something like DRIMSeq to perform the statistical testing, where one would pseudobulk to cell types and only use library sizes (of the pseudobulk). That would be analogous to the common recommendation to use DESeq2 or limma on pseudobulk for gene expression. I believe one can similarly include batches there as covariates if needed, but otherwise the presence of multiple samples should serve as a statistical source of variance. That is, if you don't attempt to correct for batch, the batches will contribute variance within the conditions and the test statistic will account for that.
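The pseudobulking step described above can be sketched in plain Python. This is only a minimal illustration; the dict-based data layout and the function name are hypothetical, not part of scUTRquant or DRIMSeq:

```python
from collections import defaultdict

def pseudobulk(cell_counts, cell_meta):
    """Aggregate per-cell isoform counts into (sample, cell_type) pseudobulk units.

    cell_counts: dict of cell_id -> {isoform_id: count}   (hypothetical layout)
    cell_meta:   dict of cell_id -> (sample_id, cell_type)
    Returns a dict of (sample_id, cell_type) -> {isoform_id: summed count},
    which is the unit one would feed into a tool like DRIMSeq.
    """
    agg = defaultdict(lambda: defaultdict(int))
    for cell, counts in cell_counts.items():
        sample, cell_type = cell_meta[cell]
        for isoform, n in counts.items():
            agg[(sample, cell_type)][isoform] += n
    return {key: dict(isoforms) for key, isoforms in agg.items()}
```

The pseudobulk library size for each (sample, cell type) unit is then just the sum of its counts, which is the only normalization factor needed for proportion-based testing.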

Since everything in APA testing is about proportions of reads, batch effects that impact gene expression levels would not be expected to be especially problematic. Also, if one first uses gene expression (and/or chromatin accessibility) in a batch-integrated space to derive cell-type annotations, then applying those annotations to the uncorrected 3'UTR counts implicitly feeds that cross-batch alignment back into the model.
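To make the "proportions within genes" point concrete, here is a minimal sketch of a weighted usage index for a single gene. The specific weighting scheme shown (isoform weights in [0, 1], e.g. scaled 3'UTR length) is an assumption for illustration, not the exact scUTRquant definition:

```python
def weighted_usage_index(counts, weights):
    """Within-gene usage index: a weight-averaged summary of isoform usage.

    counts:  dict of isoform_id -> read count for ONE gene
    weights: dict of isoform_id -> weight in [0, 1]
             (assumed here to be, e.g., scaled 3'UTR length)
    Returns None when the gene has no reads. Because only proportions
    within the gene matter, gene-level expression shifts cancel out.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(weights[iso] * n for iso, n in counts.items()) / total
```

For example, with a short isoform (weight 0.0) at 3 reads and a long isoform (weight 1.0) at 1 read, the index is 0.25; doubling all counts (a pure expression-level change) leaves it unchanged.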

What is the Batch Effect in 3'UTR Counting?

Anecdotally, what I've seen as the primary effect of "batch" in 3'UTR counting comes in the form of varying rates of internal priming, which can occasionally leak into shorter 3'UTR isoforms when there are A-rich regions slightly downstream of true cleavage sites. For this reason, I think a proper solution to accounting for batch in this space would be to compute an internal priming rate for each batch and use that number as a covariate in all dWUI or similar APA tests. However, this would require an additional layer of possibly curated counting of reads specifically in what we classify as internal priming peaks, something we just don't have at this point. I'd speculate that the fraction of intronic reads might be a first-order approximation to this, but the technical work on this simply hasn't been done.
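The proposed per-batch covariate could, in principle, be computed as below. This is a hypothetical sketch only: the peak-classification input it assumes is exactly the curated counting of internal-priming peaks described above, which doesn't yet exist:

```python
def internal_priming_rate(peak_counts, internal_priming_peaks):
    """Fraction of a batch's reads falling in internal-priming peaks.

    peak_counts:            dict of peak_id -> read count for one batch
    internal_priming_peaks: set of peak_ids classified as internal-priming
                            artifacts (hypothetical curated input)
    The returned per-batch rate would be used as a covariate in dWUI or
    similar APA tests.
    """
    total = sum(peak_counts.values())
    if total == 0:
        return 0.0
    artifact = sum(n for peak, n in peak_counts.items()
                   if peak in internal_priming_peaks)
    return artifact / total
```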

Testing for Batch Effects

For now, I can at least point you to some of the data from the original preprint, where we ran pairwise tests within each cell type across batches and found minimal significant batch effects. While the plots here filter for only a few genes, these were indeed the only ones that showed as near-significant in the inter-batch testing.

Text from Preprint

"Among the biological replicates shown in Supplementary Fig. 4a and 4b, we only detected a significant difference in LUI for Lmo4 expressed in HSC (5% FDR). In contrast, scUTRboot identified significant differences in LUI between several cell types, especially in later stages of erythroblast differentiation (Fig. 4a, 4b)"

Select Statistical Tests https://htmlpreview.github.io/?https://github.com/Mayrlab/scUTRquant-figures/blob/knitted/figures/figure4/fig4ab_ery_batches_tests.html

Plots of Bootstrapped LUI per Batch-Celltype https://htmlpreview.github.io/?https://github.com/Mayrlab/scUTRquant-figures/blob/knitted/figures/figure4/fig4ab_ery_bs_batches.html

At some point I had run statistical testing across batches for all of Tabula Muris. We found that batch effects were minimal, especially when controlling for multiple hypothesis testing. However, I would have to dig through my archived data to find this.
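For reference, the multiple-testing control here refers to FDR adjustment (5% FDR is quoted from the preprint above); the standard Benjamini-Hochberg step-up procedure can be sketched as follows. That BH specifically was the procedure used is my assumption, not stated in this thread:

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values (standard step-up procedure).

    pvals: list of raw p-values. Returns adjusted p-values in the same order;
    a test is significant at 5% FDR when its adjusted p-value is <= 0.05.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min         # enforce monotonicity of q-values
    return adjusted
```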

If you are interested in that, I can perhaps track that down.