This repository contains the code for the paper "Correcting batch effects in single-cell RNA sequencing data by matching mutual nearest neighbours" by Haghverdi et al. (2018).
Note: Further updates and development of the analysis and simulation code will take place at https://github.com/MarioniLab/FurtherMNN2018. If you have general questions regarding the code (i.e., not specifically involving the manuscript), please post your issues at the above repository instead.
In the Simulations directory, first run the source file simulateBatches.R, then run the source file plotCorrections.R.
In the Haematopoiesis directory, first run the source file prepareData.R, then the plotCorrections.R script.
Run DownloadData.sh.
In the pancreas directory, execute the script normalizePancreas.R.
Run findHighlyVariableGenes.R and assignCellTypeLabels.R.
Run PancreasProcessingCorrection.R in the Pancreas folder first. You will need to create a directory called 'results', into which all figures and batch-corrected data will be saved.
Run PancreasCorrectionComparison.R in the Pancreas folder.
Run local_global_batchvect.R in the Pancreas folder.
Use PancreasDE_analysis.Rmd, contained in the PancreasDE directory.
The normalisation scripts are in Droplet/, specifically pbmc_normalisation.R for the 68,000 PBMCs and tcell_4K_normalisation.R for the 4,000 T cells.
Please note that normalising 68,000 cells on your local machine will require a lot of resources (memory and CPU); it is recommended that the scripts in the Droplet/ directory are executed on an appropriate high-performance computing cluster.
To perform tSNE and cluster assignment using community detection on the uncorrected data, run uncorrected_68k_tSNE.R and assign_cell_types_68kPBMC.R.
To perform the equivalent tasks to generate the panels of Figure 5, run combine_10X.R, pbmc68k_tSNE.R, PBMC_68k_plotting.R, assign_cell_types_68kPBMC_corrected.R and Corrected_PBMC_68K_assignCellLabels.R.
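Several of the steps above involve running a handful of R scripts one after another. The following is a minimal shell sketch of one way to do that, stopping at the first script that fails. It is not part of the repository: the function name run_r_scripts and the RSCRIPT_BIN override are hypothetical, and it assumes that Rscript is on your PATH, that the scripts sit in the current directory, and that the order in which they are listed above is the order in which they should be run.

```shell
#!/bin/sh
# Run the given R scripts in order, stopping at the first failure.
# RSCRIPT_BIN is a hypothetical override for the R front end (default: Rscript).
run_r_scripts() {
    for script in "$@"; do
        echo "Running $script"
        "${RSCRIPT_BIN:-Rscript}" "$script" || {
            echo "Failed: $script" >&2
            return 1
        }
    done
}

# Example calls (commented out; assumes the current directory holds the scripts):
# run_r_scripts uncorrected_68k_tSNE.R assign_cell_types_68kPBMC.R
# run_r_scripts combine_10X.R pbmc68k_tSNE.R PBMC_68k_plotting.R \
#     assign_cell_types_68kPBMC_corrected.R Corrected_PBMC_68K_assignCellLabels.R
```

Stopping on the first failure avoids running later scripts against missing or partial output from an earlier one.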