dbrg77 / plate_scATAC-seq

A rapid and robust plate-based single cell ATAC-seq (scATAC-seq) method

Note: this repository exists only to reproduce the analysis in our original Nature Communications publication. We now have an updated version of the snakemake pipeline intended purely for data processing. Please have a look at scATAC_snakemake. It has more flexible settings, and its output mimics the 10x Genomics output, which can be fed directly into various downstream scATAC-seq data analysis packages.

Why bother? What's the point of this method?

Here, I quote from the Dutch computer scientist Edsger W. Dijkstra:

Simplicity is prerequisite for reliability.

Usage (two stages: data processing + data analysis)

1. Data processing (from fastq to data quality information and count matrix)

All steps are executed using the Snakefile in the corresponding directories. To start the workflow, download the fastq files from ArrayExpress (E-MTAB-6714) and put them in mSp_scATAC-seq/rep{1..11}/fastq/ and other_cells_methods/*/fastq. Before running, change the paths of certain files/programs in the Snakefile, such as picard.jar, to match your own environment. Then run the pipeline using snakemake with the Snakefile provided.
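Before starting snakemake, it can help to confirm that the fastq files landed in the layout described above. The following is only a minimal sketch, not part of the repository: the function name `find_empty_fastq_dirs` is hypothetical, and it assumes the script is pointed at the top-level directory of a local clone.

```python
from pathlib import Path


def find_empty_fastq_dirs(repo_root="."):
    """Return the expected fastq directories that are missing or empty.

    Hypothetical helper: the directory names follow the README text,
    everything else is an assumption.
    """
    root = Path(repo_root)
    # mSp_scATAC-seq/rep1/fastq ... mSp_scATAC-seq/rep11/fastq
    expected = [root / "mSp_scATAC-seq" / f"rep{i}" / "fastq" for i in range(1, 12)]
    # one fastq directory per sub-directory of other_cells_methods
    other = root / "other_cells_methods"
    if other.is_dir():
        expected += [d / "fastq" for d in sorted(other.iterdir()) if d.is_dir()]
    # keep only the directories that do not exist or contain no entries
    return [d for d in expected if not d.is_dir() or not any(d.iterdir())]


if __name__ == "__main__":
    for d in find_empty_fastq_dirs("."):
        print(f"missing or empty: {d}")
```

If the script prints nothing, every expected fastq directory exists and contains at least one file. It does not check that the files are complete or correctly named.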

For the processing of the ImmGen bulk ATAC-seq and the public Fluidigm C1 scATAC-seq experiments, download the raw data (URL and study accession number provided in the corresponding directories) into the fastq directory and run snakemake in the same way. Alternatively, you can use stream_ena to avoid downloading the fastq files and save disk space.

The notebooks hek293t_nih3t3_mix_analysis.ipynb and technical_qc_and_methods_comprison.ipynb contain basic information from the species mixing experiment, the comparison of plate vs C1 using K562 and E14 mESC, and the experiments tested on other tissues and cells.

2. Data analysis (customised analysis depending on project aims)

Follow the notebooks for the data analysis:

1.mSp_exploratory_analysis_all.ipynb contains information about the quality control and other basic information about all experiments.

2.mSp_cell_type_identification.ipynb is the analysis for the identification of different cell types in the mouse spleen based on the scATAC-seq profiles.

3.mSp_motif_enrichment_analysis.ipynb is used to generate the heatmap representation of known motif enrichment by HOMER.

This repository only contains the files necessary to reproduce the analyses and figures. Raw data and intermediate files are not included; they can be generated from the Data processing stage. The count matrix mSp_scATAC_count_matrix_over_all.mtx is not in this repository because it is too large. For now, download the count file from here: ftp://ngs.sanger.ac.uk/production/teichmann/xi/plate_scATAC-seq, and put it under cmp_to_immgen/. The later analysis should then run without any problem.
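As a quick sanity check after downloading, the count matrix can be read with scipy, which is already in the package list below. This is a minimal sketch under stated assumptions: the helper name `load_count_matrix` is hypothetical, the default path simply combines the directory and file name mentioned above, and the orientation of the matrix (which axis holds peaks and which holds cells) is not stated here, so check it against the notebooks.

```python
from scipy.io import mmread
from scipy.sparse import csr_matrix


def load_count_matrix(path="cmp_to_immgen/mSp_scATAC_count_matrix_over_all.mtx"):
    """Read a Matrix Market (.mtx) file and return it as a CSR sparse matrix.

    Hypothetical helper: the default path is assumed from the README text.
    """
    # mmread parses the Matrix Market file; converting to CSR makes
    # row slicing and arithmetic efficient
    return csr_matrix(mmread(path))
```

Calling `load_count_matrix()` from the top-level directory of the repository and inspecting `.shape` and `.nnz` of the result shows the matrix dimensions and the number of non-zero entries.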

Software/Packages

macs2 (v2.1.1.20160309) (this needs python2)
picard (v2.17.10)

# packages installed via conda:
#
# Name                    Version                   Build  Channel
bedtools                  2.27.1                        1    bioconda
cutadapt                  1.16                     py36_1    bioconda
hisat2                    2.1.0            py36pl5.22.0_0    bioconda
homer                     4.9.1                pl5.22.0_5    bioconda
matplotlib                3.0.0           py36h45c993b_1     conda-forge
numpy                     1.15.2          py36_blas_openblashd3ea46f_0  [blas_openblas]  conda-forge
pandas                    0.23.4           py36hf8a1672_0    conda-forge
salmon                    0.9.1                         1    bioconda
samtools                  1.7                           2    bioconda
scikit-learn              0.19.2          py36_blas_openblasha84fab4_201  [blas_openblas]  conda-forge
scipy                     1.1.0           py36_blas_openblash7943236_201  [blas_openblas]  conda-forge
seaborn                   0.9.0                      py_0    conda-forge
seqtk                     1.3                  ha92aebf_0    bioconda
snakemake                 5.3.0                    py36_1    bioconda

You also need calc, addCols, bedClip and bedGraphToBigWig from UCSC utilities.

Finally, you need bdg2bw to convert the macs2-generated bedGraph to bigWig for visualisation.
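To see which of the command-line tools listed above are visible on your PATH before running the pipeline, something like the following sketch may be useful. It is not part of the repository. The list `REQUIRED_TOOLS` and the function `missing_tools` are hypothetical, and the sketch assumes each executable has the same name as the tool listed above. picard (a jar file whose path is set in the Snakefile) and homer (which ships as several scripts) are left out for that reason.

```python
import shutil

# Assumed executable names, taken from the tool names in this README
REQUIRED_TOOLS = [
    "macs2", "bedtools", "cutadapt", "hisat2", "salmon", "samtools",
    "seqtk", "snakemake",
    # UCSC utilities
    "calc", "addCols", "bedClip", "bedGraphToBigWig",
    # bedGraph to bigWig wrapper script
    "bdg2bw",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    for name in missing_tools():
        print(f"not found on PATH: {name}")
```

The check only reports whether an executable with that name exists. It does not verify versions, so compare those against the list above yourself.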

Experimental tips (not exactly science)

Some descriptive plots

Contact

Xi Chen
chenx9@sustech.edu.cn