Questions about output files from ATAC-seq pipeline

ENCODE-DCC / atac-seq-pipeline

ENCODE ATAC-seq pipeline

MIT License


Hi, first of all, thank you for providing a wonderful tool. I ran ATAC-seq analysis with the pipeline on the data described below:

control -> no replicates
sample -> 2 biological replicates

The analysis finished successfully, and I have some questions about the generated output files.

Q1. Which output file can be used to analyze differential promoter usage between the control and the sample? The control was run without replicates, and the sample was run with 2 biological replicates.

Q2. The "Current Standards" section of the "ATAC-seq Data Standards and Processing Pipeline" page on the ENCODE website states: "The number of peaks within an IDR peak file should be >70,000, though values >50,000 may be acceptable." Can you explain what an "IDR peak file" is? Is this number related to the values listed as "N optimal" or "N conservative" in the "Reproducibility QC and peak detection statistics" table? If not, can you please explain what the "N optimal" and "N conservative" values in that table mean? (table below)

Thank you, and I look forward to your reply.
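The peak-count standard quoted in Q2 can be checked directly against an IDR peak file. The sketch below assumes the file is a (possibly gzipped) narrowPeak file with one peak per line and no header lines; the thresholds are the ones quoted from the ENCODE standard, and the helper names `count_peaks` and `classify_peak_count` are hypothetical, not part of the pipeline.

```python
import gzip
import sys

# Thresholds quoted from the ENCODE ATAC-seq standard:
# >70,000 peaks recommended, >50,000 may be acceptable.
RECOMMENDED_MIN = 70_000
ACCEPTABLE_MIN = 50_000


def count_peaks(path: str) -> int:
    """Count non-empty lines in a narrowPeak file (assumes one peak per line, no header)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip())


def classify_peak_count(n: int) -> str:
    """Map a peak count onto the two thresholds from the standard."""
    if n > RECOMMENDED_MIN:
        return "meets standard"
    if n > ACCEPTABLE_MIN:
        return "may be acceptable"
    return "below standard"


if __name__ == "__main__":
    # Usage: python check_peaks.py <peak_file.narrowPeak.gz> [...]
    for p in sys.argv[1:]:
        n = count_peaks(p)
        print(f"{p}\t{n}\t{classify_peak_count(n)}")
```

Counting lines is only meaningful if the file really has no header or track lines; if yours does, filter them out first.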

Sorry about the late response.

Q1. How did you run the pipeline for the control? Unlike our ChIP-seq pipeline, the ATAC-seq pipeline does not support controls.

Q2. The pipeline calls peaks (with MACS2) on each replicate, and IDR analysis is then run on every pair of MACS2 peak files (e.g. rep1.narrowPeak.gz vs rep2.narrowPeak.gz). The same is done for pooled replicates. Among the resulting IDR peak sets, one is chosen as the final set according to each criterion (optimal or conservative).
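The selection step above can be sketched as follows. This thread does not spell out the criteria, so the rules here are an assumption based on ENCODE's general IDR documentation: the conservative set is the true-replicate set (Nt), the optimal set is whichever of Nt and Np (pooled pseudo-replicates) is larger, and reproducibility is graded with a rescue ratio max(Nt, Np)/min(Nt, Np) and a self-consistency ratio max(N1, N2)/min(N1, N2), each expected to stay at or below 2. Please verify against the pipeline's own QC output; the class and function names are hypothetical, and only the two-replicate case is covered.

```python
from dataclasses import dataclass


@dataclass
class IdrCounts:
    """Peak counts from the reproducibility QC table (meanings are assumptions)."""
    n_t: int  # IDR peaks, true replicates (rep1 vs rep2)
    n_p: int  # IDR peaks, pooled pseudo-replicates
    n_1: int  # IDR peaks, rep1 pseudo-replicates (self-consistency)
    n_2: int  # IDR peaks, rep2 pseudo-replicates (self-consistency)


def max_min_ratio(a: int, b: int) -> float:
    """Return max(a, b) / min(a, b); infinity if the smaller count is zero."""
    low, high = min(a, b), max(a, b)
    return float("inf") if low == 0 else high / low


def summarize(c: IdrCounts) -> dict:
    """Pick conservative/optimal sets and grade reproducibility (replicated case only)."""
    rescue = max_min_ratio(c.n_t, c.n_p)
    self_consistency = max_min_ratio(c.n_1, c.n_2)
    # Number of ratios exceeding 2: 0 -> pass, 1 -> borderline, 2 -> fail.
    n_over = sum(r > 2.0 for r in (rescue, self_consistency))
    return {
        "N_conservative": c.n_t,
        "N_optimal": max(c.n_t, c.n_p),
        "optimal_source": "Nt" if c.n_t >= c.n_p else "Np",
        "rescue_ratio": rescue,
        "self_consistency_ratio": self_consistency,
        "reproducibility": ("pass", "borderline", "fail")[n_over],
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(summarize(IdrCounts(n_t=60_000, n_p=75_000, n_1=70_000, n_2=65_000)))
```

Under these assumptions, the ">70,000 peaks" standard would be compared against the size of the chosen IDR peak set rather than against a single MACS2 call.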

For an unreplicated experiment, peaks are called on each pseudo-replicate (the original reads are randomly shuffled and split into two pseudo-replicates), and IDR analysis is then run on the two resulting peak files (rep1-pr1.narrowPeak.gz vs rep1-pr2.narrowPeak.gz). In that case Nt and Np are always zero, and N1 is the final IDR peak set, since an unreplicated experiment produces only one IDR peak set.
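The pseudo-replicate step described above (shuffle the reads, then split them into two halves) can be illustrated with a small sketch. It treats each read (or, for paired-end data, each read pair) as one list item and uses a seeded shuffle for reproducibility; this is not the pipeline's actual implementation, which operates on its own read files, and `make_pseudo_replicates` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple


def make_pseudo_replicates(
    reads: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly shuffle reads and split them into two near-equal halves.

    Each item should be one read, or one read pair for paired-end data,
    so that mates always end up in the same pseudo-replicate.
    """
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    half = (len(shuffled) + 1) // 2        # first half gets the extra read if odd
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    pr1, pr2 = make_pseudo_replicates([f"read{i}" for i in range(11)], seed=42)
    print(len(pr1), len(pr2))
```

Peaks would then be called separately on each half, and IDR run on the two resulting peak files.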

Questions about output files from ATAC-seq pipeline #299