Open MathieuBo opened 6 days ago
This is a great question, but unfortunately something we only partly explored; it didn't make it into our final publication, simply for reasons of space.
In the scheme of testing 3'UTR changes, comparisons will ultimately be made within genes (e.g., a Weighted Usage Index) rather than across genes (e.g., a TPM). Log-scaling is not needed for that. The scUTRboot testing framework was really developed for the small datasets (e.g., 1-3 samples per tissue) that we were working with at the time. For larger datasets with several samples per condition, I would recommend using something like DRIMSeq for the statistical testing: pseudobulk to cell types and use only the library sizes of the pseudobulks. That would be analogous to the common recommendation to use DESeq2 or limma on pseudobulk for gene expression. I believe one can similarly include batch there as a covariate if needed; otherwise, the presence of multiple samples should serve as a statistical source of variance. That is, if you don't attempt to correct for batch, the batches will contribute variance to the conditions and the statistic will account for it.
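To make the within-gene idea concrete, here is a minimal Python sketch of the two steps above: computing a weighted usage index from one gene's isoform counts, and summing per-cell counts into per-(sample, cell type) pseudobulks that a tool like DRIMSeq could then test. The weighting scheme here (weights in [0, 1], e.g., scaled rank of 3'UTR length) is an illustrative assumption, not the exact scUTRquant definition.

```python
def wui(isoform_counts, weights):
    """Weighted Usage Index: weighted sum of within-gene isoform proportions.

    isoform_counts and weights are parallel lists for one gene's isoforms.
    Weights in [0, 1] are assumed here (e.g., scaled rank of 3'UTR length);
    with weights 0 and 1 for a two-isoform gene this reduces to the LUI.
    """
    total = sum(isoform_counts)
    if total == 0:
        return float("nan")
    return sum(w * c / total for w, c in zip(weights, isoform_counts))


def pseudobulk(cell_counts, cell_groups):
    """Sum per-cell isoform counts into per-(sample, cell type) pseudobulks.

    cell_counts: {cell_id: [count per isoform]}
    cell_groups: {cell_id: (sample, cell_type)}
    """
    agg = {}
    for cell, counts in cell_counts.items():
        key = cell_groups[cell]
        acc = agg.setdefault(key, [0] * len(counts))
        for i, c in enumerate(counts):
            acc[i] += c
    return agg


# Toy example: one gene with a short and a long isoform, two cells
# from the same sample and cell type.
cells = {"cell1": [2, 1], "cell2": [1, 3]}
groups = {"cell1": ("sampleA", "HSC"), "cell2": ("sampleA", "HSC")}
pb = pseudobulk(cells, groups)          # {("sampleA", "HSC"): [3, 4]}
index = wui(pb[("sampleA", "HSC")], [0.0, 1.0])  # long-isoform fraction, 4/7
```

Note that the WUI is computed from proportions, so only within-gene library composition matters, not overall sequencing depth.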
Since everything in APA testing is about proportions of reads, batch effects that impact gene expression levels would not be expected to be so problematic. Also, if one first uses gene expression (and/or chromatin accessibility) in a batch-integrated space to derive cell-type annotations, then applying those annotations to the uncorrected 3'UTR counts implicitly feeds that cross-batch alignment back into the model.
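The depth-invariance is easy to see in a toy example: a batch effect that uniformly scales a gene's counts cancels out of any within-gene proportion, so an index like the LUI is unchanged. (The `lui` helper below is just an illustration, not part of any package.)

```python
def lui(counts):
    """Long-UTR usage index: long-isoform fraction within one gene.

    counts = [short_isoform_count, long_isoform_count].
    """
    return counts[1] / sum(counts)


batch1 = [30, 10]                 # gene counts in batch 1
batch2 = [c * 5 for c in batch1]  # batch 2 sequenced this gene 5x deeper

assert lui(batch1) == lui(batch2)  # 0.25 in both: the scale factor cancels
```

A batch effect that shifts isoform *proportions* (rather than totals) would of course not cancel, which is where the internal-priming issue below comes in.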
Anecdotally, what I've seen as the primary effect of "batch" in 3'UTR counting comes in the form of varying rates of internal priming, which can occasionally leak into shorter 3'UTR isoforms when there are A-rich regions slightly downstream of true cleavage sites. For this reason, I think a proper solution to accounting for batch in this space would be to compute an internal priming rate for each batch and use that number as a covariate in all dWUI or similar APA tests. However, this would require an additional layer of (possibly curated) counting of reads specifically in what we classify as internal priming peaks, something we just don't have at this point. I'd speculate that the fraction of intronic reads might be a first-order approximation to this, but the technical work on this simply hasn't been done.
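If one did have a peak-level classification, the per-batch covariate itself would be trivial to compute; something like the sketch below, where the boolean labels marking internal-priming peaks are entirely hypothetical (no such curated classification exists yet, as noted above). The resulting per-batch number would then go into the design matrix of the downstream test as a continuous covariate.

```python
def internal_priming_rate(peak_counts, peak_is_ip):
    """Fraction of a batch's reads falling in peaks classified as
    internal priming. `peak_is_ip` is a hypothetical boolean label per
    peak; curating such labels is the hard, undone part of the problem.
    """
    ip_reads = sum(c for c, is_ip in zip(peak_counts, peak_is_ip) if is_ip)
    total = sum(peak_counts)
    return ip_reads / total if total else float("nan")


# Toy example: 100 reads in a batch, 10 of them in an IP-classified peak.
rate = internal_priming_rate([90, 10], [False, True])  # 0.1
```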
For now, I can at least point you to some of the data we had in the original preprint, where we ran pairwise tests within each cell type across batches, which showed minimal significant batch effects. While the plots here filter for only a few genes, these were indeed the only ones that showed as near-significant in the inter-batch testing.
Text from Preprint
"Among the biological replicates shown in Supplementary Fig. 4a and 4b, we only detected a significant difference in LUI for Lmo4 expressed in HSC (5% FDR). In contrast, scUTRboot identified significant differences in LUI between several cell types, especially in later stages of erythroblast differentiation (Fig. 4a, 4b)"
Select Statistical Tests https://htmlpreview.github.io/?https://github.com/Mayrlab/scUTRquant-figures/blob/knitted/figures/figure4/fig4ab_ery_batches_tests.html
Plots of Bootstrapped LUI per Batch-Celltype https://htmlpreview.github.io/?https://github.com/Mayrlab/scUTRquant-figures/blob/knitted/figures/figure4/fig4ab_ery_bs_batches.html
At some point I had run statistical testing across batches for all of Tabula Muris. We found that batch effects were minimal, especially when controlling for multiple hypothesis testing. However, I would have to dig through my archived data to find this.
If you are interested in that, I can perhaps track that down.
Hi!
Have you explored strategies for batch integration for larger datasets, or sample normalisation in addition to the scaling/log normalisation?
Any advice?
Thanks!