deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

hicMergeMatrixBins changes chromosome sizes #906

Open qolba opened 1 month ago

qolba commented 1 month ago

Hi, I have a question about hicMergeMatrixBins behavior and I would like to ask you for some clarification.

I am using HiCExplorer 3.7.5 with h5 files.

I've noticed that the hicMergeMatrixBins command changes the chromosome sizes. For example:

(hicexplorer) user@naboo:/home/dir$ hicInfo -m mymatrix.h5
# Matrix information file. Created with HiCExplorer's hicInfo version 3.7.5
File: mymatrix.h5
Size: 623,472
Bin_length: 5000
Sum of matrix: 290013357.0
Chromosomes:length: chr1: 248387328 bp; chr2: 242696752 bp; chr3: 201105948 bp; chr4: 193574945 bp; chr5: 182045439 bp; chr6: 172126628 bp; chr7: 160567428 bp; chr8: 146259331 bp; chr9: 150617247 bp; chr10: 134758134 bp; chr11: 135127769 bp; chr12: 133324548 bp; chr13: 113566686 bp; chr14: 101161492 bp; chr15: 99753195 bp; chr16: 96330374 bp; chr17: 84276897 bp; chr18: 80542538 bp; chr19: 61707364 bp; chr20: 66210255 bp; chr21: 45090682 bp; chr22: 51324926 bp; chrX: 154259566 bp; chrY: 62460029 bp; chrM: 16569 bp;
Non-zero elements: 399,928,510
Minimum (non zero): 1.0
Maximum: 69427.0
NaN bins: 0

(hicexplorer) user@naboo:/home/dir$ hicMergeMatrixBins -m mymatrix.h5 -o mymatrix_nb10.h5 -nb 10

(hicexplorer) user@naboo:/home/dir$ hicInfo -m mymatrix_nb10.h5
# Matrix information file. Created with HiCExplorer's hicInfo version 3.7.5
File: mymatrix_nb10.h5
Size: 62,348
Bin_length: 50000
Sum of matrix: 290006984.0
Chromosomes:length: chr1: 248387328 bp; chr2: 242696752 bp; chr3: 201100000 bp; chr4: 193574945 bp; chr5: 182045439 bp; chr6: 172126628 bp; chr7: 160550000 bp; chr8: 146250000 bp; chr9: 150600000 bp; chr10: 134750000 bp; chr11: 135127769 bp; chr12: 133324548 bp; chr13: 113550000 bp; chr14: 101150000 bp; chr15: 99750000 bp; chr16: 96330374 bp; chr17: 84276897 bp; chr18: 80542538 bp; chr19: 61700000 bp; chr20: 66200000 bp; chr21: 45090682 bp; chr22: 51324926 bp; chrX: 154250000 bp; chrY: 62450000 bp; chrM: 16569 bp;
Non-zero elements: 210,406,959
Minimum (non zero): 1.0
Maximum: 322638.0
NaN bins: 920

As you can see, some chromosomes (for instance chr3, chr7 and chr8) became shorter, but not all of them.

Could you kindly explain the reasoning behind this behavior? I rely on HiCExplorer output for downstream analysis, and this issue adds some complexity. I would greatly appreciate knowing in which cases I should expect this behavior to occur.
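
To make the pattern in the two hicInfo outputs above easier to see, here is a minimal sketch that compares the reported lengths. It only uses the numbers pasted in this issue; the helper name `trailing_bins` is made up for illustration. In these numbers, every shortened chromosome ends exactly on a multiple of the merged bin length (50,000 bp), and the shortened ones are those whose last, incomplete merged bin would hold fewer than 5 of the original 5 kb bins. chrM is an exception: it has only 4 original bins but keeps its length, possibly because it is the last chromosome in the matrix. This is an observation from the data, not a confirmed description of what hicMergeMatrixBins does internally.

```python
# Chromosome lengths (bp) copied from the two hicInfo outputs in this issue.
before = {
    "chr1": 248387328, "chr2": 242696752, "chr3": 201105948, "chr4": 193574945,
    "chr5": 182045439, "chr6": 172126628, "chr7": 160567428, "chr8": 146259331,
    "chr9": 150617247, "chr10": 134758134, "chr11": 135127769, "chr12": 133324548,
    "chr13": 113566686, "chr14": 101161492, "chr15": 99753195, "chr16": 96330374,
    "chr17": 84276897, "chr18": 80542538, "chr19": 61707364, "chr20": 66210255,
    "chr21": 45090682, "chr22": 51324926, "chrX": 154259566, "chrY": 62460029,
    "chrM": 16569,
}
after = {
    "chr1": 248387328, "chr2": 242696752, "chr3": 201100000, "chr4": 193574945,
    "chr5": 182045439, "chr6": 172126628, "chr7": 160550000, "chr8": 146250000,
    "chr9": 150600000, "chr10": 134750000, "chr11": 135127769, "chr12": 133324548,
    "chr13": 113550000, "chr14": 101150000, "chr15": 99750000, "chr16": 96330374,
    "chr17": 84276897, "chr18": 80542538, "chr19": 61700000, "chr20": 66200000,
    "chr21": 45090682, "chr22": 51324926, "chrX": 154250000, "chrY": 62450000,
    "chrM": 16569,
}

BIN = 5000          # bin length of the input matrix
NB = 10             # value passed to -nb
MERGED = BIN * NB   # bin length of the merged matrix (50,000 bp)


def trailing_bins(length, bin_size=BIN, merged=MERGED):
    """Hypothetical helper: number of original bins that fall into the
    last, incomplete merged bin of a chromosome of the given length."""
    remainder = length % merged
    return -(-remainder // bin_size)  # ceiling division


shortened = {chrom for chrom in before if after[chrom] < before[chrom]}

print("chrom\tbefore\tafter\ttrailing_bins\tstatus")
for chrom, length in before.items():
    status = "shortened" if chrom in shortened else "unchanged"
    print(f"{chrom}\t{length}\t{after[chrom]}\t{trailing_bins(length)}\t{status}")
```

If the same pattern holds for other matrices, a chromosome would be expected to lose its tail whenever the leftover bins at its end make up less than half of `-nb`, but that would need to be confirmed against the hicMergeMatrixBins source.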

joachimwolff commented 1 month ago

Hi,

That should not happen. Did you use a chromosome size file for creating the matrices?

qolba commented 1 month ago

Those matrices were created with HiC-Pro, which uses a chrom.size file at some point during matrix creation. So the original file format was HiC-Pro (matrix + .bed), which I then converted with:

(hicexplorer) user@naboo:/home/dir$ hicConvertFormat -m mymatrix.matrix --bedFileHicpro mymatrix_abs.bed --inputFormat hicpro --outputFormat h5 -o mymatrix.h5
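
One way to check whether the lengths in the h5 file come from the chromosome size file is to look at the end coordinate of the last bin of each chromosome in the HiC-Pro BED file and compare it with chrom.size. The sketch below assumes the BED file has whitespace-separated columns chrom, start, end, followed by a bin index; the function name `last_bin_end` and the example lines are made up for illustration and are not taken from the real data.

```python
def last_bin_end(bed_lines):
    """Return {chrom: largest end coordinate} from BED-like lines.

    Assumes each line has at least three whitespace-separated columns:
    chrom, start, end. Extra columns (e.g. a bin index) are ignored.
    """
    ends = {}
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, _start, end = line.split()[:3]
        ends[chrom] = max(ends.get(chrom, 0), int(end))
    return ends


# Made-up example lines, not taken from the real mymatrix_abs.bed.
example = [
    "chrA\t0\t5000\t1",
    "chrA\t5000\t10000\t2",
    "chrA\t10000\t12345\t3",
    "chrB\t0\t5000\t4",
    "chrB\t5000\t7000\t5",
]
print(last_bin_end(example))
```

For the real file, the function can be called on an open file handle, for example `with open("mymatrix_abs.bed") as fh: ends = last_bin_end(fh)`, and the result compared with the lengths reported by hicInfo.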