deeptools / HiCExplorer

HiCExplorer is a powerful and easy-to-use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

plotmatrix vs plotTAD Min/max values #411

Closed: kimj50 closed this issue 4 years ago.

kimj50 commented 5 years ago

Hi, Thank you for the amazing tool. I am a graduate student learning to use it. I am plotting a log-transformed, normalized .h5 matrix using v3.0. I was wondering how the values of the two plotting methods are related: min_value/max_value of plotTAD vs. vMin/vMax of plotMatrix.

plotTAD:
INFO:pygenometracks.tracksClass:plotting 1. [x-axis]
INFO:pygenometracks.tracksClass:plotting 2. [hic matrix]
INFO:pygenometracks.tracks.GenomeTrack:setting min, max values for track 2. [hic matrix] to: -5.136343927519093, -1.361211237458261

plotMatrix:
INFO:hicexplorer.hicPlotMatrix:min: 1.1085717943615236e-07, max: 1.0

In both cases, I used the same matrix, log transformation, and region. The two plots look very similar visually, but I would like to be able to set min/max values of equal range in both plots. Also, plotMatrix shows the color axis values, but plotTAD doesn't show them. Thank you! - Jun

joachimwolff commented 5 years ago

Hi,

My apologies for the delayed answer.

Yes, you are right, the value ranges should be the same in both. I need to take a closer look to answer this question. Concerning the color axis values in pyGenomeTracks / hicPlotTADs: to my knowledge they should be there, on the right side rather than the left. Can you show me your tracks.ini file?

Independent of this, the value range of your data is far too low. Please keep in mind that HiCExplorer expects count data, not the probability-like values that balancing with, e.g., cooler balance produces. Most or all of our analysis algorithms will fail with values < 1.

Best,

Joachim

kimj50 commented 5 years ago

Hi, Thank you for helping me troubleshoot. I recently updated to HiCExplorer 3.1; here are my command line, tracks file, and new output. This time I used a matrix that was not run through hicNormalize norm_range.

plotTAD, tracks file (test.ini):

[x-axis]
where = top

[hic matrix]
file = /scratch/kimj50/elegans/matrix_corrected/N2_2017_emb_sum_12kb_correct.h5
title = test_12kb
depth = 3000000
transform = log

Command line:
hicPlotTADs --tracks test.ini --region chrX:1-3,000,000 -o test_X1-3,000,000.png

INFO:pygenometracks.tracksClass:plotting 1. [x-axis]
INFO:pygenometracks.tracksClass:plotting 2. [hic matrix]
INFO:pygenometracks.tracks.GenomeTrack:setting min, max values for track 2. [hic matrix] to: 3.398763589567347, 7.963610500655614

plotMatrix, command line:
hicPlotMatrix -m /scratch/kimj50/elegans/matrix_corrected/N2_2017_emb_sum_12kb_correct.h5 --region chrX:1-3,000,000 --log -o test_matrix.png

INFO:hicexplorer.hicPlotMatrix:Cooler or no cooler: False
INFO:hicexplorer.hicPlotMatrix:min: 0.4062786846827751, max: 18097.20104230793

Not surprisingly, if I force the min/max values of plotMatrix to be the same as the ones in plotTAD, I get the following:

hicPlotMatrix -m /scratch/kimj50/elegans/matrix_corrected/N2_2017_emb_sum_12kb_correct.h5 --region chrX:1-3,000,000 --log -o test_matrix_equalminmax.png --vMin 3.398763589567347 --vMax 7.963610500655614

INFO:hicexplorer.hicPlotMatrix:Cooler or no cooler: False
INFO:hicexplorer.hicPlotMatrix:min: 0.4062786846827751, max: 18097.20104230793

Lastly, I have a question about the hicNormalize function. In the past, I have been doing the following to look at matrices: buildmatrix -> summatrices -> mergebins -> diagnostic -> correction by ICE -> hicNormalize norm_range -> further analysis. But your second paragraph implies that I shouldn't be using hicNormalize with HiCExplorer, which is why I skipped it above. So for what purpose should I use the hicNormalize norm_range function? Thanks! - Jun

joachimwolff commented 5 years ago

Hi,

The value range is the same; hicPlotMatrix and hicPlotTADs/pyGenomeTracks just handle the plotting a bit differently. In hicPlotMatrix the values stay unchanged, and the plotting library is told to use a logarithmic scale. See:
https://github.com/deeptools/HiCExplorer/blob/master/hicexplorer/hicPlotMatrix.py#L430
In hicPlotTADs/pyGenomeTracks we transform the values themselves:
https://github.com/deeptools/pyGenomeTracks/blob/master/pygenometracks/tracks/HiCMatrixTrack.py#L215
This is why you get different outputs: in the first case the reported min/max are the real value range, in the second they are the value range after applying np.log, which is the natural logarithm:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html
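A minimal sketch of that relationship, assuming (as explained above) that pyGenomeTracks reports and applies its min/max after np.log, while hicPlotMatrix with --log keeps the raw values and only uses a logarithmic color scale. Under that assumption, limits convert between the two tools with exp and log. The helper names are hypothetical and are not part of HiCExplorer or pyGenomeTracks.

```python
import math


def log_limits_to_raw(vmin_log, vmax_log):
    """Hypothetical helper: convert natural-log limits (as reported by
    pyGenomeTracks with transform = log) into raw-count limits, the units
    hicPlotMatrix --vMin/--vMax are assumed to use together with --log."""
    return math.exp(vmin_log), math.exp(vmax_log)


def raw_limits_to_log(vmin_raw, vmax_raw):
    """Hypothetical helper for the inverse direction: raw counts -> natural-log
    limits. Raw limits must be strictly positive for the log to exist."""
    if vmin_raw <= 0 or vmax_raw <= 0:
        raise ValueError("raw limits must be > 0 to take a logarithm")
    return math.log(vmin_raw), math.log(vmax_raw)


# Limits reported by hicPlotTADs earlier in this thread (log space):
raw_min, raw_max = log_limits_to_raw(3.398763589567347, 7.963610500655614)
print(f"--vMin {raw_min:.2f} --vMax {raw_max:.2f}")
```

If the assumption holds, passing the converted raw values (roughly 30 and 2874) to hicPlotMatrix, rather than the log-space numbers, should give a comparable color range; any remaining differences would come from how each tool picks its default limits.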

By clipping the values as you did in the last plot, you do not rescale them; every value below the minimum is fixed to the minimum and every value above the maximum is fixed to the maximum. Not surprisingly, most of your values are larger than 7.96 and are therefore set to this value, so your plot is just red.
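To illustrate the effect, here is a simplified stand-in for a logarithmic color normalization (similar in spirit to matplotlib's LogNorm, but not its actual code): each value is mapped to a position between 0 and 1 on a log scale, and anything outside [vmin, vmax] is pinned to the end of the colormap. The counts are made-up examples spanning roughly the range reported in this thread; 29.93 and 2874.43 are approximately exp(3.3988) and exp(7.9636).

```python
import math


def log_color_position(value, vmin, vmax):
    """Map a raw value to [0, 1] on a log scale, clipping values outside
    [vmin, vmax] to the ends (simplified illustration)."""
    if value <= 0:
        # Simplification: matplotlib would mask these; this sketch puts them at the low end.
        return 0.0
    frac = (math.log(value) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))
    return min(1.0, max(0.0, frac))


# Hypothetical counts between ~0.4 and ~18000.
counts = [0.4, 5.0, 30.0, 300.0, 3000.0, 18000.0]

# Log-space numbers used directly as raw limits: almost everything saturates at 1.0.
print([round(log_color_position(c, 3.4, 7.96), 2) for c in counts])

# Approximately the same range expressed in raw counts: values spread across the scale.
print([round(log_color_position(c, 29.93, 2874.43), 2) for c in counts])
```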

Last question: hicNormalize exists to normalize values between different samples. If you need this functionality, we recommend normalizing to the lowest read coverage among all your samples; this is the mode smallest. The norm_range mode you mentioned has no real use case within HiCExplorer itself, but we offer it in case you want to convert your matrix to another Hi-C interaction format and/or need the 0-1 value range in other software.

Your workflow should be: build matrix in the desired resolution, (merge replicates), (merge bins), normalize all samples to smallest read coverage, diagnostic, correction by ICE / KR.
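A rough sketch of the two hicNormalize modes mentioned above, on tiny dense matrices in plain Python. It assumes that smallest means scaling each sample by (smallest total count / its own total count) and that norm_range means a min-max rescale into 0-1; these are simplified readings of the description in this thread, not hicNormalize's actual implementation.

```python
def total(matrix):
    """Sum of all entries of a dense matrix given as a list of rows."""
    return sum(sum(row) for row in matrix)


def normalize_smallest(matrices):
    """Scale every matrix so its total equals the smallest total among the
    samples (sketch of the idea behind the 'smallest' mode)."""
    totals = [total(m) for m in matrices]
    if any(t <= 0 for t in totals):
        raise ValueError("every sample needs a positive total count")
    smallest = min(totals)
    return [[[v * (smallest / t) for v in row] for row in m]
            for m, t in zip(matrices, totals)]


def normalize_range(matrix):
    """Min-max rescale all entries into [0, 1] (sketch of 'norm_range')."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / span for v in row] for row in matrix]


# Two hypothetical samples with different sequencing depth (toy 2x2 matrices).
sample_a = [[10.0, 2.0], [2.0, 6.0]]   # total = 20
sample_b = [[40.0, 8.0], [8.0, 24.0]]  # total = 80
a_norm, b_norm = normalize_smallest([sample_a, sample_b])
print(a_norm, b_norm)  # both now sum to 20
print(normalize_range(sample_a))
```

Note that norm_range output lies between 0 and 1, which is exactly the value range that, as mentioned earlier in this thread, most HiCExplorer analysis algorithms are not designed for.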

Best,

Joachim

gtrichard commented 5 years ago

Your workflow should be: build matrix in the desired resolution, (merge replicates), (merge bins), normalize all samples to smallest read coverage, diagnostic, correction by ICE / KR.

About this, I think we should update the documentation and use case at some point.

kimj50 commented 5 years ago

Is there a similar 'clipping option' for hicPlotTAD? The min_value/max_value options do not seem to work the same way. My TAD plot is missing the color axis label, which seems like a problem on my end. I'd be happy to try any recommendations. Thanks! - Jun
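A possible approach, sketched under an assumption: if pyGenomeTracks applies min_value/max_value after transform = log (consistent with the explanation of np.log earlier in this thread), then raw-count limits must be converted to natural-log units before going into the [hic matrix] section. The hypothetical helper below builds such a section with Python's configparser; whether your pyGenomeTracks version honours these keys exactly this way is worth checking.

```python
import configparser
import io
import math


def hic_track_section(matrix_file, raw_min, raw_max, depth=3000000):
    """Hypothetical helper: return the text of a [hic matrix] tracks-file section
    whose color limits correspond to raw counts raw_min..raw_max, assuming
    pyGenomeTracks compares min_value/max_value to log-transformed values."""
    config = configparser.ConfigParser()
    config["hic matrix"] = {
        "file": matrix_file,
        "title": "test_12kb",
        "depth": str(depth),
        "transform": "log",
        # Natural log, matching np.log used for transform = log.
        "min_value": f"{math.log(raw_min):.4f}",
        "max_value": f"{math.log(raw_max):.4f}",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(hic_track_section(
    "/scratch/kimj50/elegans/matrix_corrected/N2_2017_emb_sum_12kb_correct.h5",
    raw_min=30.0, raw_max=2874.0))
```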

LeilyR commented 4 years ago

Could you please try the latest version? The missing label could be caused by your matplotlib version, and it should be fixed there.