databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

quantitative ATACseq #201

Closed wangjiawen2013 closed 2 years ago

wangjiawen2013 commented 3 years ago

Hi, traditional ChIP-seq methods are not inherently quantitative, so they do not allow direct comparisons between samples derived from different cell types or between cells that have experienced a perturbation, such as a genomic alteration or chemical treatment. For example, with the traditional reads per million (RPM) ChIP-seq normalization, a cell population with chromatin state “A” (a high level of a histone post-translational modification) will appear similar to a cell population with chromatin state “B”, in which 50% of the signal has been removed (Figure 1A), because the signal is quantified as a simple percentage of all mapped reads. Moreover, additional variables, such as variation in genome fragmentation, immunoprecipitation efficiency, or other experimental steps, frequently confound the analysis. For this reason, quantitative ChIP-seq with a spike-in has been recommended for quantifying peak signal and comparing samples.

Now, with ATAC-seq we face the same problem as with ChIP-seq. When chromatin is less accessible, fewer PCR products are obtained, but the library concentration is adjusted so that the same amount of DNA is loaded onto the sequencer, and the data are then normalized with traditional methods (such as sequencing depth). How can ATAC-seq data be quantified and normalized with a spike-in when using PEPATAC?
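To make the RPM problem concrete, here is a toy calculation with made-up numbers (not taken from the paper or from PEPATAC). It assumes that a fixed amount of spike-in chromatin is added to each sample before tagmentation, that spike-in and primary fragments end up sequenced in proportion to their amounts, and that every library is sequenced to the same depth. State "B" has half the accessible signal of state "A".

```python
# Toy model (hypothetical numbers): RPM hides a global 50% loss of signal,
# while scaling by spike-in reads preserves it.

DEPTH = 10_000_000       # reads sequenced per library (assumed identical)
SPIKE_IN_UNITS = 100.0   # fixed amount of spike-in chromatin added to each sample
PEAK_FRACTION = 0.01     # fraction of primary-genome fragments falling in one peak


def expected_reads(primary_units):
    """Return expected (peak, primary, spike-in) read counts for one library.

    Because each library is sequenced to a fixed depth, reads are split
    between the primary and spike-in genomes in proportion to their amounts.
    """
    total_units = primary_units + SPIKE_IN_UNITS
    primary_reads = DEPTH * primary_units / total_units
    spike_reads = DEPTH * SPIKE_IN_UNITS / total_units
    peak_reads = primary_reads * PEAK_FRACTION
    return peak_reads, primary_reads, spike_reads


results = {}
# State "A": full signal; state "B": 50% of the accessible signal removed.
for label, units in [("A", 1000.0), ("B", 500.0)]:
    peak, primary, spike = expected_reads(units)
    rpm = peak * 1e6 / primary        # per million primary-genome reads
    spike_rpm = peak * 1e6 / spike    # per million spike-in reads
    results[label] = (round(rpm), round(spike_rpm))
    print(label, results[label])
```

Per-peak RPM comes out identical for A and B (10000), while the spike-in-scaled value halves (100000 vs. 50000), which is exactly the difference RPM hides.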

jpsmith5 commented 3 years ago

Hey @wangjiawen2013,

I'm not aware, off the top of my head, of a field-standard ATAC-seq spike-in normalization method (if you know of one, please pass it along). So at the moment there is no built-in method in the pipeline for this, but I think you could adapt the pipeline's standard outputs to do it manually.

If you have a spike-in from another genome (e.g. Drosophila), you could include the Drosophila genome as a prealignment step. This would tell you both the number of reads that aligned to your spike-in AND the number of reads that aligned to your primary genome. You could then apply some version of spike-in normalization, for example expressing signal as reads per million spike-in-aligned reads rather than reads per million total mapped reads. I'll have to keep thinking about this too. I'll see if I can set up a synthetic experiment to test some approaches.
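A minimal sketch of that kind of normalization in Python. This is not a PEPATAC feature: it assumes you have already extracted, yourself, each sample's spike-in-aligned read count (from the prealignment) and a raw per-peak read-count table from the pipeline outputs. The sample names, peak names, and numbers are hypothetical placeholders. The scale factor is 1e6 divided by the spike-in read count, i.e. reads per million spike-in reads.

```python
# Sketch: "reads per million spike-in reads" normalization of a peak count table.
# Hypothetical inputs: spike-in read counts and raw per-peak counts that you have
# pulled out of the pipeline outputs yourself (not a PEPATAC function or file).


def spike_in_scale_factors(spike_reads: dict[str, int]) -> dict[str, float]:
    """Scale factor per sample = 1e6 / (reads aligned to the spike-in genome)."""
    factors = {}
    for sample, n in spike_reads.items():
        if n <= 0:
            raise ValueError(f"{sample}: no spike-in reads; cannot normalize")
        factors[sample] = 1e6 / n
    return factors


def normalize_counts(
    counts: dict[str, dict[str, int]], factors: dict[str, float]
) -> dict[str, dict[str, float]]:
    """Multiply each sample's raw peak counts by its spike-in scale factor."""
    return {
        sample: {peak: c * factors[sample] for peak, c in per_peak.items()}
        for sample, per_peak in counts.items()
    }


# Placeholder numbers for illustration only.
spike_reads = {"control": 400_000, "treated": 800_000}
raw_counts = {
    "control": {"peak1": 1200, "peak2": 300},
    "treated": {"peak1": 1200, "peak2": 300},
}

factors = spike_in_scale_factors(spike_reads)
norm = normalize_counts(raw_counts, factors)
for sample, per_peak in norm.items():
    print(sample, per_peak)
```

With identical raw counts, the treated sample has twice as many spike-in reads, so its normalized signal is half that of the control. This relies on the assumption that spike-in and sample nuclei are tagmented with comparable efficiency, which would need to be checked experimentally.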