Zhong-Lab-UCSD / Genomic-Interactive-Visualization-Engine

Genomic Interactive Visualization Engine
https://www.givengine.org/
Apache License 2.0

data conversion from different Hi-C contact matrices bin size #32

Open LucoLab opened 6 years ago

LucoLab commented 6 years ago

bash ./hic2give ./ test.hic giveInteraction.bed 40000

bin size that user wants to extract the data from (please make sure the bin size you entered is contained in the hic file). I'm trying to convert a .hic file to an interaction matrix. I don't understand what the bin size is or how to choose one.

The interaction file I finally generated with a bin size of 40 000 seems almost empty, and it contains very few interactions. It would also be useful to be able to browse from one interaction to the nearest next one when you are exploring blind and do not know in advance what or where to look.

The project I am trying to visualise says on GEO:

content: Hi-C: tar ball archive of all normalized/corrected Hi-C data matrices binned at 40kb/250kb/1Mb, TAD boundaries at 40kb and genomic compartments at 250kb resolution

frankyan commented 6 years ago

The bin size is a required parameter for Hi-C data processing. It determines the resolution of the genomic interactions derived from the Hi-C data. When you use hic2give to convert a Hi-C interaction table file to the GIVE interaction format, you must set the bin size to the one that was used when the Hi-C data were processed.
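For the dataset quoted above, which is binned at 40kb/250kb/1Mb, a quick way to see which integer values are candidates for the bin size argument is to turn those labels into base pairs. This is only a minimal sketch: `resolution_to_bp` is a hypothetical helper written for this illustration, not part of hic2give or GIVE, and it only understands the `bp`, `kb` and `Mb` suffixes.

```python
import re

# Multipliers for the unit suffixes this sketch understands (an assumption).
_UNITS = {"bp": 1, "kb": 1_000, "mb": 1_000_000}


def resolution_to_bp(label):
    """Convert a resolution label such as '40kb' or '1Mb' to base pairs.

    Hypothetical helper for illustration only; not part of hic2give.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(bp|kb|mb)\s*", label, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised resolution label: {label!r}")
    value, unit = match.groups()
    return int(round(float(value) * _UNITS[unit.lower()]))


# Resolutions listed in the GEO description quoted in this thread.
available = [resolution_to_bp(label) for label in "40kb/250kb/1Mb".split("/")]
print(available)  # [40000, 250000, 1000000]

# The value passed on the command line should be one of these.
requested = 40000
is_valid = requested in available
print(is_valid)  # True
```

Whether a given .hic file actually contains each of these resolutions still has to be checked against the file itself; the list above only reflects what the GEO description states.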

You can read some papers about Hi-C bin size, such as "Hi-C: A comprehensive technique to capture the conformation of genomes". Section 3.3 says:

it is difficult to generate a Hi-C library with enough complexity or sequence depth to cover all possible restriction fragment interactions. In order to gain statistical power, it is useful to pool numbers of reads within larger genomic regions before further analyzing the data. Larger bins will contain more reads and thus have more discriminatory power, but at the cost of lowering the resolution of the data. The optimal bin size, and therefore the resolution at which the interaction data can be analyzed, depends on the sequencing depth and the linear separation of the genomic regions under consideration.
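To make the trade-off in that passage concrete, here is a small sketch of what pooling reads into bins means. The read pairs are made up for illustration, `bin_contacts` is a hypothetical function and not how hic2give works internally, and everything is assumed to lie on a single chromosome.

```python
from collections import Counter


def bin_contacts(contacts, bin_size):
    """Pool pairwise contacts (pos1, pos2) into fixed-size genomic bins.

    Returns a Counter mapping (bin_start1, bin_start2) to the number of reads.
    """
    counts = Counter()
    for pos1, pos2 in contacts:
        # Integer division snaps each coordinate to the start of its bin.
        start1 = (pos1 // bin_size) * bin_size
        start2 = (pos2 // bin_size) * bin_size
        # Use a canonical order so (a, b) and (b, a) are pooled together.
        key = (min(start1, start2), max(start1, start2))
        counts[key] += 1
    return counts


# Made-up read pairs in base-pair coordinates, for illustration only.
contacts = [
    (10_000, 130_000),
    (35_000, 170_000),
    (50_000, 210_000),
    (90_000, 245_000),
    (1_020_000, 1_300_000),
]

fine = bin_contacts(contacts, 40_000)     # high resolution, few reads per bin
coarse = bin_contacts(contacts, 250_000)  # low resolution, more reads per bin

print(len(fine), max(fine.values()))      # 5 1
print(len(coarse), max(coarse.values()))  # 2 4
```

With the 40 kb bins every read pair lands in its own bin pair, so each count is 1; with the 250 kb bins four of the five pairs pool into the same bin pair. That is the gain in reads per bin, paid for with a loss of resolution, that the quoted section describes.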