LiuLabUB / HMMRATAC

HMMRATAC peak caller for ATAC-seq data
GNU General Public License v3.0
98 stars 23 forks source link

Q: Which output file is suggested to use for downstream Differential Accessibility of nucleosome free regions? #40

Closed LewisKristy closed 4 years ago

LewisKristy commented 4 years ago

Should the gappedpeak be converted to a narrow peak to read in to Diffbind or use the summits.bed file instead. I noticed the ranges are slightly different between the two (nucleosome free in gappedpeak and range in summits). Does the summits file only contain nucleosome free regions?

EvanTarbell commented 4 years ago

So this is a great question that we have thought long and hard on. I want to say up front that I am not convinced that any differential peak caller is ideal for ATAC-seq because they don't account for the structure, which we showed with HMMRATAC is critical. I am also developing an add on for HMMRATAC to do differential peak calling, while incorporating the structure, but it won't be done and released for a while. I am hoping early next year. Generally, i like to use the summit file with DiffBind to call differentially accessible peaks. I will typically take the summit file and extend each 1bp summit by +- 50bp, creating 100bp regions. These are what i then feed into DiffBind. Because DiffBind was developed for TF differential binding, it seems to do better when given more narrow regions. And because the ATAC-seq summits are the likely TFBS, the changes in accessibility there seem more meaningful.
And to your last question, the summit file contains the summit, or position of maximum read coverage (smoothed by a Gaussian kernel) of the nucleosome free region only.

liangcmu commented 4 years ago

I am confused by "I will typically take the summit file and extend each 1bp summit by +- 50bp, creating 100bp regions. These are what i then feed into DiffBind". What do you mean exactly?

We have summit_start and summit_end. How should we do in terms of your suggestion? Something like summit_start-50 (new start) and summit_end+50 (new end) or what?

EvanTarbell commented 4 years ago

Exactly. The summit is 1bp, so subtract 50 from the start and add 50 to the stop for newStart and newStop and use that resulting bed file as the input for DiffBind

igordot commented 4 years ago

extend each 1bp summit by +- 50bp, creating 100bp regions

I understand the reasoning behind those thresholds, but is there a good definition of those regions?