deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

About the reads used for hicBuildmatrix, the algorithm #386

Closed JiangXu123 closed 5 years ago

JiangXu123 commented 5 years ago

Hi, I have a question about how hicBuildMatrix works. I used this command to generate contact matrices from several of my sequencing files, and I want to know why some of my samples have a very poor ratio of "pair used". How should I interpret this analysis? [image]

JiangXu123 commented 5 years ago

Also, what do "One mate low quality", "One mate not unique", and "One mate unmapped" mean?

joachimwolff commented 5 years ago

Hi,

A used ratio of 20 to 30% is not that bad for Hi-C; it is actually quite normal. In your data, so few pairs are used because many reads are either of low quality (as given by the quality score of the read from the FASTQ file and by the mapper), or one of the two reads at a 2D location is not unique, i.e. an identical read was seen before. These reads are assumed to be PCR duplicates and are therefore filtered out. Finally, "one mate unmapped" means that one of the two reads could not be mapped by the mapper you used.

Best,

Joachim
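To make the "used ratio" concrete, here is a minimal sketch of the arithmetic: each QC category is expressed as a percentage of all sequenced read pairs. The counts are hypothetical and are not taken from the screenshot in this thread. The function name `summarize_qc` and the "Other filters" bucket are illustrative assumptions, not part of HiCExplorer.

```python
def summarize_qc(counts, total_pairs):
    """Express each QC category as a percentage of all sequenced pairs.

    Pairs not covered by the listed categories are reported as
    "Other filters" (an illustrative catch-all, not a HiCExplorer label).
    """
    if total_pairs <= 0:
        raise ValueError("total_pairs must be positive")
    other = total_pairs - sum(counts.values())
    if other < 0:
        raise ValueError("category counts exceed total_pairs")
    report = {name: 100.0 * n / total_pairs for name, n in counts.items()}
    report["Other filters"] = 100.0 * other / total_pairs
    return report


# Hypothetical numbers, chosen only to illustrate the calculation.
total_pairs = 1_000_000
counts = {
    "Pairs used": 250_000,
    "One mate unmapped": 100_000,
    "One mate not unique": 300_000,
    "One mate low quality": 200_000,
}

report = summarize_qc(counts, total_pairs)
for name, pct in report.items():
    print(f"{name}: {pct:.1f}%")
```

With these made-up counts, a quarter of the pairs are used, which falls inside the 20 to 30% range described above as normal.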

kalavattam commented 5 years ago

Following up on what Joachim said, retaining 20-30% of the data after processing is not abnormal. If you haven't done so, I encourage you to take some time to review the following publications, which go over the assumptions, efficiencies, and inefficiencies of Hi-C, covering both the dry lab work and, because the computational results are heavily influenced by what happens at the bench, the wet lab work:

Also, even more details are available in the supplementary materials of Rao & Huntley et al., Cell 2014:

Suffice it to say, there's a lot to chromosome conformation capture experiments.