- CEMBA_wmb_snATAC
This repository is used for the whole mouse brain (wmb) snATAC-seq data analysis
of Center for Epigenomics of the Mouse Brain Atlas (CEMBA), which is now accepted by Nature 2023.
[[./repo_figures/GraphAbstract.jpg]]
** Data
- Our dissection 4D annotated in the meta data, should be 4E; and 4E should be 4D.
- Demultiplexed data can be accessed via the NEMO archive (NEMO,
RRID:SCR_016152) at https://assets.nemoarchive.org/dat-bej4ymm (the
raw directory under Source Data URL in this archive).
- We also uploaded our demultiplexed fastq files and processed files
under the GEO accession number GSE246791:
- Processed data is also available on our web portal and can be explored here: http://www.catlas.org.
- All the cellmeta information:
** Pipeline
- We now have 234 samples and 2.3 million cells in total. So most
of the analysis are depend on Snakefile to organize the pipeline
and submit them to high-performance cluster (HPC) in order to
use hundreds of CPUs at the same time.
- R, Shell and Python (>= 3.10) are mainly used, especially R (>= 4.2).
- Under the directory [[./package][package]], we put lots of common functions there.
- We mainly use [[https://github.com/kaizhang/SnapATAC2][SnapATAC2]] to analyze the single-nucleus ATAC-seq data
- Comparation between Scrublet and AMULET: https://github.com/yuelaiwang/CEMBA_AMULET_Scrublet
- The deep learning related codes now in the repo: https://github.com/yal054/mba_dl_model
- sa2 is short for SnapATAC2 in this repo.
[[./repo_figures/snATAC-seq_analysis_pipeline.jpg]]
Codes
** Clustering
In total, we have implemented four-round iterative clustering.
See details in [[file:01.clustering][01.clustering]]
Integration and annotation
We use Allen's scRNAseq data and their annotations for our data annotation.
See details in [[file:02.integration][02.integration]]
Peak calling
We use macs2 with multiple stage filtering, especially use SPM >= 5
for filtering peaks.
See details in [[file:03.peakcalling][03.peakcalling]]
Comments on some scripts:
- [[file:package/R/cembav2env.R][cembav2env.R]]: R env to store the metadata during analysis.
|----------------+-------------------------------------------------------|
| Enviorment | Description |
|----------------+-------------------------------------------------------|
| cembav2env | meta data of SnapATAC and SnapATAC2 |
|----------------+-------------------------------------------------------|
| cluSumBySa2 | clustering meta data, such as resolution, |
| | barcode to L4 Ids, L4 major regions and so on |
|----------------+-------------------------------------------------------|
| Sa2Integration | Integration meta data, like Allen's data descriptions |
|----------------+-------------------------------------------------------|
| Sa2PeakCalling | Peak calling meta data |
|----------------+-------------------------------------------------------|