Open Kyung-TaeLee opened 4 years ago
Sorry for the late response.
Q1. How did you run the pipeline for controls? Unlike our ChIP-seq pipeline, the ATAC-seq pipeline does not support controls.
Q2. The pipeline calls peaks (with MACS2) on each replicate, and then IDR analysis is done on every pair of MACS2 peak files (e.g. rep1.narrowPeak.gz vs rep2.narrowPeak.gz). This is also done for pooled replicates. Among these IDR peak sets, the best one is chosen based on different criteria (optimal/conservative).
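To make "every pair" concrete, here is a minimal sketch that lists the replicate pairs an IDR step would compare. The function name `idr_pairs` is hypothetical and is not part of the pipeline; only the file naming pattern is taken from the example above.

```python
from itertools import combinations


def idr_pairs(num_replicates):
    """Return every unordered pair of per-replicate peak file names.

    Hypothetical helper for illustration only; file names follow the
    rep<N>.narrowPeak.gz pattern used in the example above.
    """
    names = [f"rep{i}.narrowPeak.gz" for i in range(1, num_replicates + 1)]
    # combinations() yields each unordered pair exactly once
    return list(combinations(names, 2))


if __name__ == "__main__":
    for a, b in idr_pairs(3):
        print(f"{a} vs {b}")
```

With two replicates this gives a single comparison, and with three replicates it gives three.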
For an unreplicated experiment, peaks are called on each pseudo-replicate (the original reads are randomly shuffled and split into 2 pseudo-replicates) and then IDR analysis is done on the two peak files (rep1-pr1.narrowPeak.gz vs rep1-pr2.narrowPeak.gz). In that case Nt and Np are always zero and N1 is the final IDR peak set, since there is only one IDR peak set for the unreplicated case.
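The selection described above can be sketched roughly as follows. This is an illustration, not the pipeline's actual code. The rule used for the replicated case (conservative = the true-replicate set with count Nt, optimal = whichever of Nt and Np is larger) is my understanding of the ENCODE convention and is an assumption here. The function name and the set labels are hypothetical.

```python
def choose_idr_sets(nt, np_, n1):
    """Pick the conservative and optimal IDR peak sets from peak counts.

    nt  : peaks from IDR on true replicates (0 if unreplicated)
    np_ : peaks from IDR on pooled pseudo-replicates (0 if unreplicated)
    n1  : peaks from IDR on rep1's two pseudo-replicates

    Returns a dict mapping "conservative" and "optimal" to (label, count).
    Labels are hypothetical names used only in this sketch.
    """
    if nt == 0 and np_ == 0:
        # Unreplicated case: rep1-pr1 vs rep1-pr2 is the only IDR set,
        # so it is used for both (assumption: both slots point to N1).
        only = ("rep1-pr1_vs_rep1-pr2", n1)
        return {"conservative": only, "optimal": only}

    # Assumed convention: conservative set comes from true replicates.
    conservative = ("true_replicates", nt)
    # Assumed convention: optimal set is the larger of Nt and Np.
    if np_ > nt:
        optimal = ("pooled_pseudo_replicates", np_)
    else:
        optimal = conservative
    return {"conservative": conservative, "optimal": optimal}


if __name__ == "__main__":
    print(choose_idr_sets(nt=60000, np_=75000, n1=50000))
    print(choose_idr_sets(nt=0, np_=0, n1=42000))
```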
Hi, first of all, thank you for providing a wonderful tool. I ran the ATAC-seq analysis using the pipeline on data as shown below
The analysis finished successfully, and I have some questions regarding the output files it generated.
Q1. Which output file can be used to analyze differential promoter usage between control and sample? The control was run without replicates and the sample was run with 2 biological replicates.
Q2. In the section "ATAC-seq Data Standards and Processing Pipeline" on the ENCODE webpage, the Current Standards section specifies that "The number of peaks within an IDR peak file should be >70,000, though values >50,000 may be acceptable". Can you explain what an "IDR peak file" is? Is this number related to the numbers given for "N optimal" or "N conservative" in the "Reproducibility QC and peak detection statistics" table? If not, can you please explain what the numbers given for "N optimal" or "N conservative" in that table mean? (table below)
Thank you, and I look forward to your reply.
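For the peak-count threshold quoted in Q2, a quick check on a peak file can be sketched as below. It assumes one peak per non-empty line in a gzipped narrowPeak file. The function names and the category strings are hypothetical; only the two thresholds come from the ENCODE text quoted above.

```python
import gzip


def count_peaks(path):
    """Count non-empty lines (one peak per line) in a gzipped peak file."""
    with gzip.open(path, "rt") as fh:
        return sum(1 for line in fh if line.strip())


def grade_peak_count(n):
    """Compare a peak count with the thresholds quoted from ENCODE."""
    if n > 70_000:
        return "meets standard"
    if n > 50_000:
        return "may be acceptable"
    return "below threshold"


if __name__ == "__main__":
    import sys

    # Usage: python count_peaks.py <peaks.narrowPeak.gz>
    n_peaks = count_peaks(sys.argv[1])
    print(n_peaks, grade_peak_count(n_peaks))
```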