WeiqiangZhou / BIRD

Big data Regression for predicting DNase I hypersensitivity

Aggregating 200bp bin DHS level data into region data #7

Closed liangdp1984 closed 4 years ago

liangdp1984 commented 4 years ago

BIRD outputs a matrix containing all predicted DH values on a log scale (log2(x+1) transformed) in 200 bp bins/windows. However, few people interpret ATAC-seq/DNase-seq results in terms of windows. Clustering adjacent windows into genomic regions simplifies interpretation of the results, and for differential DHS signal analysis this merging step is necessary. We did differential binding analysis on ChIP-seq data using the R csaw package; csaw is a window-based approach and provides simple algorithms to cluster windows into regions. Do you have any suggestions for this? Thanks!

WeiqiangZhou commented 4 years ago

I am not familiar with the csaw package. I usually perform downstream analysis directly on the 200 bp windows, each of which represents an open or closed region, especially for TF binding regions. After obtaining differential regions, I would do a motif enrichment analysis to identify which TFs are enriched. I would also perform functional annotation analysis using either GREAT or GO, based on the annotated neighboring genes. For integrating windows, I would suggest integrating over promoter regions or enhancer regions (e.g., FANTOM enhancers).

If you want to merge adjacent windows as you suggested, I would first set a threshold to identify open windows and then merge the adjacent open windows. You can use the reduce function in the GenomicRanges package, setting the "min.gapwidth" parameter to control which windows get merged into one region.
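The threshold-then-merge idea can be sketched in plain Python, without GenomicRanges. This is only an illustration under stated assumptions, not BIRD's or csaw's method: the function name `merge_open_windows`, the tuple layout, the 0-based half-open coordinates, and the threshold of 1.0 are all hypothetical. The `max_gap` argument plays a role similar to "min.gapwidth" in `reduce`, but the exact semantics of that parameter should be checked in the GenomicRanges documentation.

```python
from typing import List, Tuple

# Hypothetical layout: (chrom, start, end, signal), with 0-based half-open
# coordinates and signal on BIRD's log2(x+1) scale.
Window = Tuple[str, int, int, float]
Region = Tuple[str, int, int]


def merge_open_windows(
    windows: List[Window], threshold: float, max_gap: int = 0
) -> List[Region]:
    """Keep windows with signal >= threshold, then merge neighbours.

    Two open windows on the same chromosome are merged when the gap
    between them is at most max_gap base pairs (0 = directly adjacent).
    """
    # Step 1: threshold to obtain open windows, sorted by position.
    open_windows = sorted(
        (w for w in windows if w[3] >= threshold),
        key=lambda w: (w[0], w[1]),
    )

    # Step 2: sweep left to right, extending the last region when close enough.
    regions: List[Region] = []
    for chrom, start, end, _signal in open_windows:
        if regions and regions[-1][0] == chrom and start - regions[-1][2] <= max_gap:
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions


# Made-up example values, for illustration only.
example = [
    ("chr1", 0, 200, 2.5),
    ("chr1", 200, 400, 3.1),
    ("chr1", 400, 600, 0.2),  # below threshold, treated as closed
    ("chr1", 600, 800, 1.9),
    ("chr2", 0, 200, 2.2),
]
print(merge_open_windows(example, threshold=1.0))
print(merge_open_windows(example, threshold=1.0, max_gap=200))
```

With `max_gap=0` only directly adjacent open windows are joined, so the closed window at chr1:400-600 splits chr1 into two regions. Allowing a gap of one window (`max_gap=200`) bridges it. The choice of threshold is left open here and would need to be tuned for the data at hand.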

liangdp1984 commented 4 years ago

Thanks! I will try the methods you mentioned and see which one is more suitable!