deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

What confuses me about hicFindTADs? #414

Closed: hzaumsq closed this issue 5 years ago

hzaumsq commented 5 years ago

When I used hicFindTADs to detect TADs, there were some parameters I couldn't understand: minDepth and maxDepth. The software only suggests a range, so when I try different parameters, the results differ, like this: So what are the best parameters to choose, or is there some standard for this? Can someone help me? Best wishes

hzaumsq commented 5 years ago

[Image attachment: hicFindTADs results obtained with different parameters]

wzhjlau2009 commented 5 years ago

@joachimwolff, @hzaumsq Hi, I have the same problem: I don't know how to set these parameters, and the same goes for --minBoundaryDistance. Can you explain how to set them? Thank you very much!

LeilyR commented 5 years ago

Unfortunately there is no standard for TAD calling; it all depends on your data and its depth. In the images you have sent I cannot really see any TADs, though that might be because of the scale if the view is too zoomed out. Also, your TAD separation score doesn't seem to match your boundaries. A good practice is to call TADs once with all the default parameters, then modify them and visually check the result each time until you get the best one. Also bear in mind that if your data is shallow and the resolution is too high, you might want to merge some of your bins prior to TAD calling. You can find more information about the parameters here: https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html#hicfindtads
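A minimal sketch of such a sweep, assuming a matrix file named `matrix.h5` with 10 kb bins (both hypothetical). It only builds the `hicFindTADs` command lines and does not run them. The grid of bin-size multiples is an arbitrary choice loosely based on the guidance in the hicFindTADs documentation (minDepth at least 3 times the bin size, maxDepth around 6 to 10 times the bin size), and the `--thresholdComparisons` and `--delta` values are example values, not recommendations:

```python
import shlex
from itertools import product


def tad_sweep_commands(matrix, bin_size, min_factors=(3, 5), max_factors=(6, 8, 10),
                       step_factor=1, threshold=0.05, delta=0.01):
    """Build (but do not run) hicFindTADs command lines for a grid of window sizes.

    Window sizes are multiples of the matrix bin size; the factor grid,
    threshold and delta are illustrative assumptions to tune by eye.
    """
    commands = []
    for min_f, max_f in product(min_factors, max_factors):
        if max_f <= min_f:
            continue  # maxDepth has to be larger than minDepth
        min_depth, max_depth = min_f * bin_size, max_f * bin_size
        prefix = f"tads_min{min_depth}_max{max_depth}"  # one output prefix per setting
        cmd = ["hicFindTADs", "--matrix", matrix, "--outPrefix", prefix,
               "--minDepth", str(min_depth), "--maxDepth", str(max_depth),
               "--step", str(step_factor * bin_size),
               "--thresholdComparisons", str(threshold), "--delta", str(delta),
               "--correctForMultipleTesting", "fdr"]
        commands.append(shlex.join(cmd))  # shell-safe string, Python 3.8+
    return commands


for line in tad_sweep_commands("matrix.h5", 10_000):
    print(line)
```

Each resulting TAD set can then be plotted with hicPlotTADs next to the matrix for visual inspection. If the data is shallow, the bins can first be merged, e.g. `hicMergeMatrixBins --matrix matrix.h5 --numBins 2 --outFileName merged.h5`, and the sweep rerun on the merged matrix with the new bin size.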

gtrichard commented 5 years ago

A good way to check whether your TAD boundaries are called correctly with respect to the TAD insulation score (or separation score) is to plot the average insulation score signal at TAD boundaries using deepTools computeMatrix and plotProfile (you need the insulation scores as bigWig files).

When trying different TAD calling parameters, you should keep the TAD set with the lowest average score at the boundaries... up to a certain number of TADs (you don't want to call only 700 strong TADs, for instance).
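As a sketch of that check without deepTools, the score at each boundary can be averaged directly from the score bedGraph and boundaries BED that hicFindTADs writes (parsing those files is left out). The helper names are hypothetical, and the code assumes the hicFindTADs convention that boundaries sit at minima of the TAD-separation score, so a lower average means better-placed boundaries. `min_tads` encodes the "up to a certain number of TADs" rule; its value is your call:

```python
from bisect import bisect_right
from statistics import mean


def mean_score_at_boundaries(score_intervals, boundaries):
    """Average score of the bins containing each boundary position.

    score_intervals: (chrom, start, end, score) tuples, e.g. parsed from a
    bedGraph; intervals on one chromosome must not overlap.
    boundaries: (chrom, position) tuples; boundaries outside any bin are skipped.
    """
    starts, bins = {}, {}
    for chrom, start, end, score in sorted(score_intervals):
        starts.setdefault(chrom, []).append(start)
        bins.setdefault(chrom, []).append((start, end, score))
    values = []
    for chrom, pos in boundaries:
        # index of the last bin starting at or before pos (-1 if none)
        i = bisect_right(starts.get(chrom, []), pos) - 1
        if i >= 0 and bins[chrom][i][0] <= pos < bins[chrom][i][1]:
            values.append(bins[chrom][i][2])
    return mean(values) if values else float("nan")


def pick_parameter_set(results, min_tads):
    """Name of the set with the lowest mean boundary score among sets with at
    least `min_tads` TADs, or None if no set qualifies.

    results: {name: (number_of_tads, mean_boundary_score)}
    """
    eligible = {name: r for name, r in results.items() if r[0] >= min_tads}
    if not eligible:
        return None
    return min(eligible, key=lambda name: eligible[name][1])
```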

Another way is to check the enrichment of features predictive of TAD boundaries, e.g. CTCF binding motifs or CTCF ChIP-seq enrichment in mammals, or BEAF-32, CP190, etc. motifs in Drosophila.
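A rough sketch of such an enrichment test, assuming boundary and feature positions (for example CTCF ChIP-seq peak summits) are already available as `(chrom, position)` tuples. It compares the fraction of boundaries near a feature with the same fraction for randomly placed positions. This uniform shuffle is a simple null model chosen for illustration, not a method from this thread, and the window size is arbitrary:

```python
import random
from bisect import bisect_left


def fraction_near_feature(positions, features, window):
    """Fraction of (chrom, pos) lying within `window` bp of any feature (chrom, pos)."""
    by_chrom = {}
    for chrom, pos in features:
        by_chrom.setdefault(chrom, []).append(pos)
    for sites in by_chrom.values():
        sites.sort()
    if not positions:
        return 0.0
    hits = 0
    for chrom, pos in positions:
        sites = by_chrom.get(chrom, [])
        i = bisect_left(sites, pos - window)  # first feature at or after pos - window
        if i < len(sites) and sites[i] <= pos + window:
            hits += 1
    return hits / len(positions)


def boundary_enrichment(boundaries, features, chrom_sizes, window=5000,
                        n_shuffles=100, seed=0):
    """Return (observed fraction, mean fraction over uniformly shuffled boundaries)."""
    observed = fraction_near_feature(boundaries, features, window)
    rng = random.Random(seed)
    expected = []
    for _ in range(n_shuffles):
        # keep each boundary on its chromosome, place it uniformly at random
        shuffled = [(chrom, rng.randrange(chrom_sizes[chrom])) for chrom, _ in boundaries]
        expected.append(fraction_near_feature(shuffled, features, window))
    return observed, sum(expected) / len(expected)
```

Running it on each TAD set from the parameter sweep shows which settings place boundaries closer to the expected features.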

But in the end, there is no standard for TAD calling, nor a defined number of "TADs" in a genome. It all depends on what you call a "TAD"... which can sometimes be defined more as "Triangles At the Diagonal" than as "Topologically Associating Domains". So if you want to verify a hypothesis at "TADs", use different sets of TADs (i.e. different TAD calling parameters to get different numbers of TADs, for instance 750, 1500 and 2500 TADs) and verify that your hypothesis holds for each of the sets you called. In the end this all makes sense, since TADs are nested structures: different sets of TADs give you different "levels" of this nested structure. The lower the number of TADs, the bigger they are, and the higher the order of chromatin organisation they belong to.
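One way to check the nesting idea on your own calls, sketched under the assumption that boundaries are given as `(chrom, position)` tuples: if the sets are nested, most boundaries of the set with fewer (larger) TADs should also be found, within some tolerance, in the set with more (smaller) TADs. The function name and the tolerance are illustrative:

```python
from bisect import bisect_left


def nested_fraction(coarse, fine, tolerance):
    """Fraction of boundaries in `coarse` (fewer, larger TADs) lying within
    `tolerance` bp of a boundary in `fine` (more, smaller TADs)."""
    fine_by_chrom = {}
    for chrom, pos in fine:
        fine_by_chrom.setdefault(chrom, []).append(pos)
    for positions in fine_by_chrom.values():
        positions.sort()
    if not coarse:
        return 0.0
    hits = 0
    for chrom, pos in coarse:
        sites = fine_by_chrom.get(chrom, [])
        i = bisect_left(sites, pos - tolerance)  # closest candidate on the left edge
        if i < len(sites) and sites[i] <= pos + tolerance:
            hits += 1
    return hits / len(coarse)
```

A fraction close to 1 for each pair of sets (e.g. 750 vs 1500, 1500 vs 2500 TADs) would be consistent with the nested picture; the tolerance should be on the order of one or two bins.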

hzaumsq commented 5 years ago

@gtrichard I have another question. I have read the paper your team published, in which you identified a new insulator protein in Drosophila melanogaster. In my project, CTCF and other insulator proteins have been lost, and I want to use your method to find new elements at the boundaries. As a first step I must know where the boundaries are before I can do the next step, so the results I have analyzed so far are not dependable, because the boundaries vary with the parameters.

gtrichard commented 5 years ago

Hmm, so I guess this is in mammals? Well, you can try different parameters until you call the "correct" number of TADs (i.e., a number of TADs comparable to recent publications).

If you fear that the TAD structure might be affected overall, you can inspect the log2ratio matrix (KO vs. control) to see whether TADs collapse across the genome. In that case you can expect fewer TADs to be called overall.
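HiCExplorer's hicCompareMatrices can produce such a matrix with `--operation log2ratio`. To illustrate what the ratio contains, here is a small NumPy sketch on toy dense matrices that share the same bins (real Hi-C matrices are large and sparse). Both matrices are scaled to the same total before the ratio, and a tiny pseudocount, an arbitrary choice, avoids division by zero. Averaging the ratio inside called TADs is a hypothetical, crude collapse indicator: consistently negative values would mean the KO loses intra-domain contacts relative to the control:

```python
import numpy as np


def log2ratio(ko, wt, pseudocount=1e-9):
    """log2(KO / WT) after scaling each matrix to a total of 1."""
    ko = np.asarray(ko, dtype=float)
    wt = np.asarray(wt, dtype=float)
    ko = ko / ko.sum()
    wt = wt / wt.sum()
    return np.log2((ko + pseudocount) / (wt + pseudocount))


def mean_within_domains(matrix, domains):
    """Mean of the entries inside each diagonal block [start, end) given in bin indices."""
    return [float(matrix[start:end, start:end].mean()) for start, end in domains]
```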

But again, there's no standard, and you can have few expectations about what to call TADs in a KO sample... Call TADs, check the average insulation score at the boundaries, and scan the genome to see whether the Hi-C matrix matches the called TADs (be sure to use the same matrix resolution for both plotting and TAD calling).

Also, if you compare control (/WT) vs KO, be sure to call TADs in both of them. If there is no difference in TAD structure, the number and structure of the called TADs usually match between samples. If you consistently call fewer TADs in the KO, perhaps the TAD structure or the overall insulation score is affected...
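A trivial sketch of that consistency check, assuming you have counted the TADs called in each sample with the same parameter sets (for example by counting the lines of each domains file). The function name and the 10% default threshold are illustrative assumptions:

```python
def consistently_fewer_tads(counts_wt, counts_ko, min_drop=0.1):
    """True if, for every parameter set present in both samples, the KO has at
    least `min_drop` (as a fraction of the WT count) fewer TADs than the WT.

    counts_wt, counts_ko: {parameter_set_name: number_of_tads}
    """
    shared = counts_wt.keys() & counts_ko.keys()
    if not shared:
        return False
    return all(
        counts_wt[k] > 0 and (counts_wt[k] - counts_ko[k]) / counts_wt[k] >= min_drop
        for k in shared
    )
```

Requiring the drop for every shared parameter set, rather than for a single one, guards against a difference that only appears with one particular setting.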

For your question, you can call different sets of TADs, i.e. different numbers of TADs, and check the motifs enriched at the boundaries of each set.

hzaumsq commented 5 years ago

That's very kind of you! @gtrichard @LeilyR