berenslab / umi-normalization

Companion repository to Lause, Berens & Kobak (2021): "Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data", Genome Biology
https://doi.org/10.1186/s13059-021-02451-7
GNU Affero General Public License v3.0

Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data

Jan Lause, Philipp Berens & Dmitry Kobak

How to use this repository

Version 3.0 of this repository contains the code to reproduce the analysis presented in our Genome Biology paper on UMI data normalization (Lause, Berens & Kobak, 2021) and the corresponding preprint (v3). The code used for versions v1 and v2 of the paper is available under the tags 1.0 and 2.0 in this repository.

To start, follow these steps:

Then, you can step through our full analysis by simply following the sequence of the notebooks. If you want to reproduce only parts of our analysis, there are six independent analysis pipelines that you can run individually:

Note that 041 and 101 are R notebooks; all others are Python notebooks.

Each analysis first preprocesses and filters the datasets. Next, the computationally expensive tasks are run (NB regression fits, GLM-PCA, t-SNE, simulations of negative control data, etc.) and the results are saved to files. For some analyses, this is done in separate notebooks. Finally, the result files are loaded for plotting (again in separate notebooks for some analyses).
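The compute-then-save-then-plot structure described above can be illustrated with a minimal Python sketch. It is not taken from the notebooks: the helper `load_or_compute`, the stand-in `expensive_step`, the file name, and the choice of `pickle` as storage format are all assumptions made for illustration, and the actual notebooks may store results differently.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(path, compute):
    """Return the result cached at `path`; otherwise run `compute()` and cache it.

    Hypothetical helper, not part of this repository.
    """
    path = Path(path)
    if path.exists():
        # Result file already exists: load it instead of recomputing.
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []  # records how often the expensive step actually runs


def expensive_step():
    # Stand-in for an expensive task such as an NB regression fit or t-SNE.
    calls.append(1)
    return {"embedding": [[0.0, 1.0], [2.0, 3.0]]}


# Hypothetical results location inside a temporary directory.
cache_file = Path(tempfile.mkdtemp()) / "results" / "expensive_step.pickle"

first = load_or_compute(cache_file, expensive_step)   # computes and saves
second = load_or_compute(cache_file, expensive_step)  # loads from disk
```

With this pattern, a plotting notebook only needs the saved file and never has to repeat the expensive computation.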

We recommend running the code on a powerful machine with at least 250 GB of RAM.

For questions or feedback, feel free to use the issue system or email us.

Pre-requisites

We used the following software environments:

Python
R

The full R environment used was:

attached base packages:
parallel  stats4    stats     graphics  grDevices utils     datasets     methods   base     

other attached packages:
MASS_7.3-53.1               sctransform_0.3.2          SingleCellExperiment_1.8.0  SummarizedExperiment_1.16.1   
DelayedArray_0.12.3         BiocParallel_1.20.1        matrixStats_0.58.0          Biobase_2.46.0             
GenomicRanges_1.38.0        GenomeInfoDb_1.22.1        IRanges_2.20.2              S4Vectors_0.24.4           
BiocGenerics_0.32.0         glmpca_0.2.0               

loaded via a namespace (and not attached):
tidyselect_1.1.0       listenv_0.8.0          purrr_0.3.4           reshape2_1.4.4         lattice_0.20-41        colorspace_2.0-0      
vctrs_0.3.7            generics_0.1.0         utf8_1.2.1            rlang_0.4.10           pillar_1.6.0           glue_1.4.2            
DBI_1.1.1              GenomeInfoDbData_1.2.2 lifecycle_1.0.0       plyr_1.8.6             stringr_1.4.0          zlibbioc_1.32.0       
munsell_0.5.0          gtable_0.3.0           future_1.21.0         codetools_0.2-18       fansi_0.4.2            Rcpp_1.0.6            
scales_1.1.1           XVector_0.26.0         parallelly_1.24.0     gridExtra_2.3          ggplot2_3.3.3          digest_0.6.27         
stringi_1.5.3          dplyr_1.0.5            grid_3.6.3            tools_3.6.3            bitops_1.0-6           magrittr_2.0.1        
RCurl_1.98-1.3         tibble_3.1.1           crayon_1.4.1          future.apply_1.7.0     pkgconfig_2.0.3        ellipsis_0.3.1        
Matrix_1.3-2           assertthat_0.2.1       R6_2.5.0              globals_0.14.0         compiler_3.6.3  

Download instructions for presented datasets

All accession numbers can also be found in Table S2 of our paper.

33k PBMC dataset
Counts & Annotations
10x control / Svensson et al. 2017
inDrop control / Klein et al. 2015
MicrowellSeq control / Han et al. 2018
Retina: All cell classes / Macosko et al. 2015
Counts
Retina: Bipolar cells / Shekhar et al. 2016
Counts
Retina: Ganglion cells / Tran et al. 2019
Raw counts
Annotations and original gene selection
2-million cells: Mouse Organogenesis / Cao et al. 2019
Raw counts and annotations
FACS-sorted PBMC cells / Zheng et al. 2017 and Duò et al. 2018