deeptools / HiCExplorer

HiCExplorer is a powerful and easy-to-use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

Interpretation of Insulation Score #766

Open · StephenRicher opened this issue 2 years ago

StephenRicher commented 2 years ago

Hi,

Thanks very much for developing this fantastic resource. I am currently comparing HiCExplorer insulation scores between TAD types and wanted to clarify my interpretation. In our results, we observed approximately equal insulation scores at the boundaries themselves in both groups, but a globally lower insulation score in one group ('ASTAD').

Given that the 'TAD' sample has a much larger delta insulation score at the boundary than the 'ASTAD' sample, I would guess this indicates that the 'TAD' sample has stronger boundaries? Or does the equal absolute score at the boundary indicate that boundary strength is the same, in which case I am not sure what would drive the globally lower insulation score in the 'ASTAD' sample?
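To make the difference between the absolute score at a boundary and its "delta" concrete, here is a minimal sketch that computes both from an averaged profile. It assumes the profile is a plain list of mean insulation scores, that the boundary is the bin at the bottom of the dip, and that delta means the mean of a few flanking bins minus the boundary score. This definition and the helper name `boundary_dip` are hypothetical illustrations, not HiCExplorer's own delta computation.

```python
from statistics import mean


def boundary_dip(profile, boundary_idx, flank=3):
    """Return (absolute score at the boundary, delta to its flanks).

    Hypothetical helper: delta is the mean score of up to `flank` bins on
    each side of the boundary minus the boundary score, so a deeper dip
    relative to its surroundings gives a larger delta.
    """
    left = profile[max(0, boundary_idx - flank):boundary_idx]
    right = profile[boundary_idx + 1:boundary_idx + 1 + flank]
    flanking = left + right
    if not flanking:
        raise ValueError("boundary has no flanking bins")
    score = profile[boundary_idx]
    return score, mean(flanking) - score


# Toy profiles: identical boundary score, different surrounding level.
tad = [0.8, 0.9, 1.0, -1.0, 1.0, 0.9, 0.8]
astad = [-0.4, -0.3, -0.2, -1.0, -0.2, -0.3, -0.4]

tad_score, tad_delta = boundary_dip(tad, 3)
astad_score, astad_delta = boundary_dip(astad, 3)
```

In this toy case both boundaries sit at -1.0, yet the 'TAD'-like profile has the larger delta because its surroundings are higher, which is the same pattern as equal boundary scores combined with a globally lower score in one group.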

Best wishes, Stephen

[Image: mean insulation score profile across scaled TADs, 'TAD' vs 'ASTAD']

joachimwolff commented 2 years ago

Dear Stephen,

That's not an easy question to answer. I will try my best, but maybe we need to discuss this a bit to conclude.

I understand that you extracted the insulation score at boundaries. What we see is the averaged left boundaries as the left minimum and the averaged right boundaries as the right minimum. The area in between is the averaged insulation score of the scaled intra-TAD regions (I interpret your annotation this way). What I do not understand is how the insulation score to the left of the left boundary and to the right of the right boundary (I assume a global score) is computed. Also, what does the range -4 to 4 represent?

Assuming I have interpreted your issue correctly so far, the boundaries' strength is more or less equal, as you wrote. The peak of 'TAD' in the middle indicates a stronger intra-TAD interaction pattern than in 'ASTAD'. It also indicates that the contrast between intra-TAD interactions and interactions outside TADs is more distinct. But this is better checked with a chi-square test (average intra-TAD peak and average global score of the two samples). However, we need to clarify whether you normalized the two samples to a similar read coverage; the difference could be explained by missing normalization.
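As a minimal sketch of what normalizing samples to a similar read coverage can mean, the helper below rescales dense contact matrices so that every total equals the smallest one. The nested-list matrices and the name `scale_to_smallest` are hypothetical illustrations; this is not hicNormalize's implementation.

```python
def total_contacts(matrix):
    """Sum of all entries of a dense contact matrix given as a list of rows."""
    return sum(sum(row) for row in matrix)


def scale_to_smallest(matrices):
    """Rescale each matrix so that its total equals the smallest total.

    Hypothetical helper: one multiplicative factor per matrix, applied to
    every cell, so relative differences inside a matrix are unchanged.
    """
    totals = [total_contacts(m) for m in matrices]
    if any(t <= 0 for t in totals):
        raise ValueError("every matrix needs a positive total")
    target = min(totals)
    return [[[value * (target / total) for value in row] for row in m]
            for m, total in zip(matrices, totals)]


deep = [[40, 10], [10, 40]]    # 100 contacts in total
shallow = [[8, 2], [2, 8]]     # 20 contacts in total
deep_scaled, shallow_scaled = scale_to_smallest([deep, shallow])
```

Because each matrix gets a single factor, this only matters when comparing absolute values between samples; scores derived within one matrix keep their relative shape.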

Would you please clarify if my interpretations are correct and answer my open questions?

Best,

Joachim

StephenRicher commented 2 years ago

Hi Joachim,

Thanks very much for your detailed response. So yes, the vertical dashes represent the left and right boundaries of the domain, and the region in between represents the scaled intra-TAD region. Outside these boundaries is the surrounding region of +/- 4 'domain sizes'. For example, given a TAD of 100 kb, I also obtained the insulation score +/- 400 kb around the domain. This allowed me to capture a scaled view of the intra-TAD and surrounding insulation score.
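The scaled profile described above can be sketched as follows. This is a minimal sketch under stated assumptions: the insulation track is a list of per-bin scores starting at position 0 with a fixed `bin_size`, each sampled position takes the score of the bin that contains it, each flank spans four domain lengths, and the helper names are hypothetical rather than part of HiCExplorer.

```python
from statistics import mean


def sample_segment(scores, bin_size, start, end, n_points):
    """Sample n_points evenly spaced positions in [start, end).

    Each position takes the score of the bin containing it; positions
    that fall outside the track are returned as None.
    """
    step = (end - start) / n_points
    values = []
    for k in range(n_points):
        pos = start + (k + 0.5) * step  # midpoint of the k-th sub-interval
        idx = int(pos // bin_size)
        values.append(scores[idx] if 0 <= idx < len(scores) else None)
    return values


def scaled_profile(scores, bin_size, tad, flank_sizes=4, n_body=20, n_flank=20):
    """Upstream flank + scaled TAD body + downstream flank for one TAD.

    Hypothetical helper: each flank is `flank_sizes` domain lengths long,
    so a 100 kb TAD gets 400 kb of flank on each side.
    """
    start, end = tad
    flank = flank_sizes * (end - start)
    return (sample_segment(scores, bin_size, start - flank, start, n_flank)
            + sample_segment(scores, bin_size, start, end, n_body)
            + sample_segment(scores, bin_size, end, end + flank, n_flank))


def meta_profile(scores, bin_size, tads, **kwargs):
    """Average the scaled profiles of many TADs, skipping missing values."""
    profiles = [scaled_profile(scores, bin_size, t, **kwargs) for t in tads]
    averaged = []
    for column in zip(*profiles):
        present = [v for v in column if v is not None]
        averaged.append(mean(present) if present else None)
    return averaged
```

Sampling a fixed number of points per segment is what lets TADs of different lengths be averaged position by position; TADs near a chromosome end simply contribute fewer values to the outermost flank positions.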

For some context, the 'ASTADs' are domains that appear to show some allele-specific chromatin interactions, and I am comparing the mean insulation score (from a normal unphased Hi-C matrix) between ASTADs and non-ASTADs. So I thought it might be reasonable that these could have 'weaker' TAD boundaries if the interactions differ between alleles. There are many more non-ASTADs, which is why the mean non-ASTAD ('TAD') signal is much smoother.

The data is normalised, and in fact the two TAD types are from the same Hi-C dataset. One thing I did notice was that ASTADs appear to be enriched within B compartments relative to non-ASTADs. I had a brief look at insulation scores in A and B compartments, and the distributions do appear to differ: insulation scores in B compartments seem to have a lower mean but are less spread out.
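One simple way to quantify "lower mean but less spread out" is to summarize the per-bin scores separately for each compartment. The sketch below assumes parallel lists of insulation scores and A/B labels (for example from a separate compartment call); `summarize_by_compartment` and the toy values are hypothetical.

```python
from statistics import mean, quantiles, stdev


def summarize_by_compartment(scores, labels):
    """Per-label count, mean, standard deviation and interquartile range.

    `scores` and `labels` are parallel per-bin lists; bins whose score is
    None are skipped. Each label needs at least two scored bins, because
    stdev and quantiles require two or more data points.
    """
    groups = {}
    for score, label in zip(scores, labels):
        if score is not None:
            groups.setdefault(label, []).append(score)
    summary = {}
    for label, values in groups.items():
        q1, _, q3 = quantiles(values, n=4)
        summary[label] = {
            "n": len(values),
            "mean": mean(values),
            "sd": stdev(values),
            "iqr": q3 - q1,
        }
    return summary


# Toy per-bin values, not real data.
scores = [1.0, -0.5, 2.0, -0.4, 3.0, -0.6, 4.0, -0.5, None]
labels = ["A", "B", "A", "B", "A", "B", "A", "B", "B"]
summary = summarize_by_compartment(scores, labels)
```

Reporting the interquartile range next to the standard deviation gives a spread measure that is less sensitive to a few extreme bins.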

[Image: InsulationByCompartment, insulation score distributions in A vs B compartments]

Thanks again for your help, Stephen

Drosophilid commented 2 years ago

Hi @joachimwolff, my question is somewhat related to the second part of your interpretation.

  1. I'm also comparing TAD results between samples that all have different read coverage; some of my samples have very high read depth compared to the others. So I was trying to downsample them to the lowest read depth, which would allow me to compare the samples in downstream analysis (e.g. the number of TADs really depends on the read depth). I tried the hicNormalize function with default parameters, but I find the normalized matrices still have different read depths.

  2. Secondly, I was also wondering whether this normalization method could help account for the copy-number variation between sex chromosomes and autosomes?
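Downsampling to the lowest read depth can be sketched as drawing reads without replacement from each deeper sample until its total matches the shallowest one. This is a generic sketch, not what hicNormalize does internally: it assumes a sparse matrix stored as a dict mapping (row, col) bin pairs to integer read counts, and `downsample_contacts` is a hypothetical helper. Looping over individual reads is slow for genome-wide matrices, so treat it as an illustration of the idea.

```python
import bisect
import random


def downsample_contacts(contacts, target, seed=None):
    """Randomly keep exactly `target` reads from a sparse contact matrix.

    `contacts` maps (row, col) bin pairs to integer read counts. Reads are
    drawn without replacement, so the new total is exactly `target` and no
    cell can end up with more reads than it started with.
    """
    cells = list(contacts)
    cumulative = []
    running = 0
    for cell in cells:
        running += contacts[cell]
        cumulative.append(running)
    if not 0 <= target <= running:
        raise ValueError("target must be between 0 and the current total")
    rng = random.Random(seed)
    kept = {}
    for read_idx in rng.sample(range(running), target):
        # Map the read index back to the cell whose cumulative range holds it.
        cell = cells[bisect.bisect_right(cumulative, read_idx)]
        kept[cell] = kept.get(cell, 0) + 1
    return kept


deep = {(0, 0): 500, (0, 1): 300, (1, 1): 200}   # 1000 reads
matched = downsample_contacts(deep, 100, seed=1)  # match a 100-read sample
```

Because every read has the same chance of being kept, the expected coverage of every chromosome shrinks by the same factor, so uniform downsampling on its own would not remove a copy-number difference between sex chromosomes and autosomes.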