dozmorovlab / HiCcompare

Joint normalization of two Hi-C matrices, visualization and detection of differential chromatin interactions. See multiHiCcompare for the analysis of multiple Hi-C matrices.
https://dozmorovlab.github.io/HiCcompare/

No MCC values in the filter_params plot #32

Open ashishjain1988 opened 7 months ago

ashishjain1988 commented 7 months ago

Hi,

I am trying to use the filter_params function to select the optimal A.min value for filtering. We are interested in contacts on chromosome 4. When I check the plot, it seems to have no MCC values for A values from approximately 2 to 8. Is there a reason the package is unable to calculate these MCC values? Here is the plot that I got for chromosome 4.

[Screenshot, 2024-01-23: filter_params plot for chromosome 4]
mdozmorov commented 7 months ago

Hi @ashishjain1988, it is hard to tell why some MCC values are missing. I wouldn't be concerned about it. More important is to find acceptable true and false positive rate cutoffs. I'd be conservative and pick A.min = 10, but 7-8 is also OK. We already discussed that HiCcompare is robust to the choice of A https://github.com/dozmorovlab/HiCcompare/issues/29#issuecomment-1535572871 because small differences are unlikely to be detected as statistically significant. I'll keep an eye on missing MCC values and debug them when I have an example.

ashishjain1988 commented 7 months ago

Hi @mdozmorov, thank you for your response. This data is more deeply sequenced than the previous one. One thing I want to ask about is the TPR and FPR. Based on this plot, it seems like the false positive rate is much higher than the true positive rate at A.min = 10. Is that still a good threshold? Also, the default threshold of 2 is not giving us any significant contacts.

mdozmorov commented 7 months ago

I overlooked that the curves are inverted; this is indeed confusing. Here's the explanation from my student, @hamy12398:

A plot like theirs can happen because it depends on the number of changes they set (in the example above, I set numChanges to 30). The MCC denominator is built from products of pairwise sums of TP, TN, FP and FN, so if by chance any of those sums is 0, the denominator is 0 and the MCC is undefined. [image]
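For reference, the standard definition of the Matthews correlation coefficient shows where the undefined values come from. This is the textbook formula; we assume, without having checked the source, that filter_params computes MCC the same way:

```latex
\[
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
{\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
\]
```

For example, if no interactions are called as differential at a given A.min, then TP = FP = 0, the factor (TP + FP) is 0, and MCC evaluates to 0/0, which would show up as a missing point on the plot.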

What are the parameters you used for filter_params()? Can you try with numChanges = 30?

ashishjain1988 commented 7 months ago

I was actually carrying out the analysis at 25 kbp resolution and, as mentioned in the manual, I proportionally increased numChanges to 2500 (filter_params(hic.list[[i]], numChanges = 2500)). Is that too much for 25 kbp resolution? I will try numChanges = 30 too. Thanks!

ashishjain1988 commented 7 months ago

Below is the plot I got using the filter_params function for chromosome 4. The resolution I used is 25 kbp and numChanges = 30. It seems like all the results are FPR. [image]

mdozmorov commented 7 months ago

It is hard to tell without seeing the data. Have you tried to visualize the individual matrices? It may be that the data is very sparse at 25 kb resolution.

ashishjain1988 commented 7 months ago

This is what the contact data looks like for the individual samples. The scale is log2. [image] [image]

mdozmorov commented 7 months ago

The data looks good. I still cannot say why your A plot looks strange. Try debugging the actual function. Again, the A threshold is not that critical; I would explore the MD plot, call differential interactions and visualize them.

ashishjain1988 commented 7 months ago

Thanks! I will look into that.