deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

Acquiring interactions from the Hi-C Matrix #379

Closed abhisheksinghnl closed 5 years ago

abhisheksinghnl commented 5 years ago

Hi,

I have built my interaction matrix and now I want to extract the interactions from it.

How can I do this?

I have run loop detection and got some loops. Is that all I can do, or is there something more I can do to get the interactions between regions that might have gone undetected during the loop detection step?

Thank you

gtrichard commented 5 years ago

You can use hicConvertFormat to export an h5 or cool file to GInteractions format. This will give you ALL interactions as a bed-like file.

Loop detection gives you the meaningful interactions in a validated manner. If you want to see long-range contacts between specific regions, it is better to use hicAggregateContacts than to play around with a GInteractions file.

abhisheksinghnl commented 5 years ago

Hi,

Thank you for your reply. I got a file in this format:

chr1B   487650000       487675000       chr1B   502625000       502650000       2
chr1B   487650000       487675000       chr1B   502775000       502800000       1
chr1B   487650000       487675000       chr1B   502875000       502900000       1

Just to confirm: the first three columns are the coordinates of one interaction site and columns 4-6 are the coordinates of the site it interacts with. What does the last column represent?

I am guessing column 7 is the interaction count. If that is the case, what is normally a good number in that column?

gtrichard commented 5 years ago

The first three columns = bin1
The next three columns = bin2
Last column = number of interactions, i.e. the number of read pairs corresponding to bin1 and bin2.

A "good number" is very hard to define... It will highly depend on the distance between bin1 and bin2 of course. This is not a trivial question.
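To make the distance dependence concrete, here is a minimal Python sketch (standard library only) that groups the counts in a GInteractions export by the distance between bin1 and bin2. It is not part of HiCExplorer; the function names `read_ginteractions` and `counts_by_distance` are made up for this illustration, and it assumes the 7-column, whitespace-separated layout shown above. The three rows are the ones posted earlier in this thread.

```python
import io
from collections import defaultdict
from statistics import median

# Three rows copied from the export shown earlier in this thread.
EXAMPLE = """\
chr1B\t487650000\t487675000\tchr1B\t502625000\t502650000\t2
chr1B\t487650000\t487675000\tchr1B\t502775000\t502800000\t1
chr1B\t487650000\t487675000\tchr1B\t502875000\t502900000\t1
"""

def read_ginteractions(handle):
    """Yield (chrom1, start1, end1, chrom2, start2, end2, count) tuples."""
    for line in handle:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        c1, s1, e1, c2, s2, e2, count = fields[:7]
        yield c1, int(s1), int(e1), c2, int(s2), int(e2), float(count)

def counts_by_distance(records):
    """Group counts of same-chromosome pairs by distance between bin starts."""
    groups = defaultdict(list)
    for c1, s1, _e1, c2, s2, _e2, count in records:
        if c1 == c2:  # genomic distance is undefined between chromosomes
            groups[abs(s2 - s1)].append(count)
    return groups

groups = counts_by_distance(read_ginteractions(io.StringIO(EXAMPLE)))
# Median count observed at each distance (hypothetical summary statistic).
summary = {dist: median(vals) for dist, vals in sorted(groups.items())}
for dist, med in summary.items():
    print(f"{dist:>12,d} bp  median count = {med}")
```

On a real export you would pass `open("file.GInteractions")` instead of the `io.StringIO` wrapper. Comparing a candidate interaction's count with the typical count at the same distance gives a rough feel for whether it stands out, but it is only a descriptive summary and not a substitute for proper loop calling.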

If you want to see if some interactions are enriched against background, you need a proper way to do it. HiCExplorer is beginning to support such approaches and it boils down to loop-calling, as you said in your first message. The problem is that it depends mostly on your genome... For mammalian genomes I think it is not an issue; for anything else it can be problematic. If you want to see if some loops were missed somehow, I think a good way would be to make an obs/exp matrix with hicPCA, or just a normal (corrected) matrix, and plot it alongside the loops with hicPlotMatrix, which now supports loop plotting. If you see "loop-like" regions on the matrix that are not called, then maybe you need to tweak your parameters.

Alternatively, you can subset your GInteractions file for the loop positions you identified to give you an idea about "what is a good number". You can do it this way in R:

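library(GenomicRanges)
library(InteractionSet)

# Build a GInteractions object from a 7-column table (bin1, bin2, value).
convertToGI <- function(df){
  # Assumption: coordinates are BED-like (0-based, half-open), so start + 1
  # keeps adjacent bins from overlapping by one base in 1-based IRanges.
  row.regions <- GRanges(df$V1, IRanges(df$V2 + 1, df$V3)) # bin1
  col.regions <- GRanges(df$V4, IRanges(df$V5 + 1, df$V6)) # bin2
  gi <- GInteractions(row.regions, col.regions)
  gi$norm.freq <- df$V7 # value in column 7 (interaction count)
  return(gi)
}
df <- read.table("file.GInteractions", sep="\t")
loops <- read.table("loops.txt", sep="\t")

df.gi <- convertToGI(df)
loops.gi <- convertToGI(loops)

# Keep only the interactions that overlap a called loop
df.loops.gi <- subsetByOverlaps(df.gi, loops.gi)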

But I guess that your loop file already states these numbers anyway.