deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

hicFindTADs inconsistency on TAD boundaries width #577

Open u-n-i-v-e-r-z opened 4 years ago

u-n-i-v-e-r-z commented 4 years ago

Hi everyone,

I have two questions about the behaviour of hicFindTADs.

1. Should all the boundaries of TADs have the same width?

According to the documentation, I understand that TAD boundary ranges should have the same width as the given resolution, say 10 kb:

"The genomic coordinates in this file correspond to the resolution used. Thus, for Hi-C bins of 10.000bp the boundary position is 10.000bp long"

However, after digging into those files, it seems that not all TAD boundaries follow this rule (at least in my data). Here is an example. These are the coordinates of a TAD I detected using a 10 kb normalized Hi-C matrix (10kb_domains.bed):

V 9435001-10080000      * |  ID_0.01_58 -0.794052501266     #33A02C 9435001-10080000

And here is the associated 5' boundary (10kb_boundaries.bed):

 V 9427501-9442500      * |      B06944 -0.794052501266

You can see that this boundary is 15 kb wide instead of the expected 10 kb (the width most of the other boundaries have). It is not at the beginning or end of the chromosome either, which could have explained a shorter boundary. Did I miss something in the docs, or is this intended?
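A quick way to see how many boundaries deviate from the expected width is sketched below. It assumes the boundaries file is a tab-separated BED with chromosome, start and end in the first three columns (0-based, half-open), and that the ranges quoted above are 1-based inclusive, so the BED start is one less than the printed start. The function name `off_width_boundaries` and the second sample row are made up for illustration; they are not part of HiCExplorer or of my data.

```python
import csv
import io

def off_width_boundaries(bed_lines, resolution):
    """Return (chrom, start, end, width) for BED rows whose width != resolution."""
    flagged = []
    for row in csv.reader(bed_lines, delimiter="\t"):
        # Skip empty lines and common BED header lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        chrom, start, end = row[0], int(row[1]), int(row[2])
        width = end - start  # half-open interval, so no +1
        if width != resolution:
            flagged.append((chrom, start, end, width))
    return flagged

# First row: the boundary from this issue, converted to 0-based BED (assumption).
# Second row: a hypothetical 10 kb boundary, added only for contrast.
sample = io.StringIO(
    "V\t9427500\t9442500\tB06944\t-0.794052501266\n"
    "V\t10075000\t10085000\thypothetical_boundary\t0\n"
)
flagged = off_width_boundaries(sample, 10_000)
print(flagged)
```

On a real file, passing `open("10kb_boundaries.bed")` instead of the in-memory sample should work the same way, provided the file follows the column layout assumed here.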

Moreover, this example highlights another question that I have.

2. Do we expect the TAD width to be a multiple of the Hi-C resolution?

We can see here that this TAD starts at position 9435001, which falls in the middle of a 10 kb bin, even though it was computed on the same 10 kb resolution Hi-C matrix. What is even weirder is that when I use hicMergeTADbins to get the TAD-binned matrix and extract the bin positions (which should correspond to the TAD positions), the function appears to have "corrected" the misaligned TAD border:

TAD coordinates from hicFindTADs (10kb_domains.bed)

V 9435001-10080000      * |  ID_0.01_58 -0.794052501266     #33A02C 9435001-10080000

TAD coordinates from hicMergeTADbins contact matrix (extracted from 10kb_TADbin.tsv)

V 9430001-10080000      * | ID_0.01_58 -0.794052501266     #33A02C 9435001-10080000
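The arithmetic behind these numbers can be spelled out with a small sketch. It only restates the coordinates quoted in this issue, assuming they are 1-based inclusive and that bins start at multiples of the resolution; the helper names are illustrative, and nothing here claims to reproduce what hicFindTADs or hicMergeTADbins do internally.

```python
RESOLUTION = 10_000

def bin_offset(pos0, resolution):
    """Distance of a 0-based position from the start of its bin."""
    return pos0 % resolution

def snap_down(pos0, resolution):
    """Start of the bin that contains a 0-based position."""
    return pos0 - pos0 % resolution

# TAD start reported by hicFindTADs, converted from 1-based to 0-based.
tad_start0 = 9435001 - 1
# Boundary reported in 10kb_boundaries.bed, as a 0-based half-open interval.
boundary_start0, boundary_end0 = 9427501 - 1, 9442500

offset = bin_offset(tad_start0, RESOLUTION)       # how far into the bin the TAD starts
snapped0 = snap_down(tad_start0, RESOLUTION)      # bin start containing the TAD start
midpoint0 = (boundary_start0 + boundary_end0) // 2

print(offset, snapped0 + 1, midpoint0 == tad_start0)
```

Under these assumptions the TAD start sits 5 kb into its bin, snapping it down to the bin start gives the 9430001 seen in the hicMergeTADbins output, and the 15 kb boundary is centred exactly on the TAD start.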

As stated above, I might have missed something in the docs, but should one expect such behaviour?

Thanks in advance!

Best,

Alex

joachimwolff commented 4 years ago

Hi Alex,

  1. Is this for only one domain and boundary or for all? Can you please post all the commands you have used so far (starting from hicBuildMatrix, all intermediate steps, and the hicFindTADs call)? What version of HiCExplorer did you use?
  2. Maybe @GinaRe or @fidelram can answer this.

Best,

Joachim

u-n-i-v-e-r-z commented 4 years ago

Hi Joachim,

I hope you're doing well. Thanks for your reply. Attached is a directory containing the data and an R script (with the associated .Rproj file, so if you have RStudio you can open it and it will set the environment and the paths to the data accordingly). The script contains the matrix preprocessing I used and some code for testing how TAD widths and TAD boundary widths relate to the given resolution (10 kb). Something I didn't mention in my first post is that I masked some ranges of the data during processing (given by the mappability.10kb BED file), so I also compare with TADs called without masking. There is indeed a difference: TADs seem to fit the resolution if we skip masking (TAD boundaries do not, though). Hope this helps! Thanks again for your time.

Best,

Alex

joachimwolff commented 4 years ago

Hi Alex,

There is no attachment; I'm not sure whether a GitHub issue supports this.

Best,

Joachim

u-n-i-v-e-r-z commented 4 years ago

Oh I see. I did attach the files, so GitHub must be filtering them out. Is there another way I could send those files to you?

Alex

Edit: Here we go

Issue_hicexplorer.tar.gz

joachimwolff commented 4 years ago

Upload it to Google Drive and post the link?

u-n-i-v-e-r-z commented 4 years ago

I don't know if you saw my edit on the previous comment, but here is the file:

Issue_hicexplorer.tar.gz

GinaRe commented 4 years ago

Hi,

I haven't encountered the problems that you are describing, but I have mostly worked with restriction-fragment-resolution matrices. I agree that masking could contribute to what you are observing.

Best, Gina