deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

`hicSumMatrices` complains about different chromosome order while they should be the same #715

Closed cgirardot closed 3 years ago

cgirardot commented 3 years ago

Hi,

I am getting a strange error when running hicSumMatrices on 6 matrices. It fails quickly with:

ERROR:hicexplorer.hicSumMatrices:The two matrices have different chromosome order. Use the tool `hicConvertFormat` to change the order.
matrice1.h5: ['chr2L', 'chr2R', 'chr3L', 'chr3R', 'chrX', 'chr4']
matrice2.h5: ['chr2L', 'chr2R', 'chr3L', 'chr3R', 'chrX', 'chr4']

The matrices were produced with the same code, so the chromosome order should be the same. Note that, after hicBuildMatrix, the 6 matrices were processed as follows:

  1. Remove all inter-chromosomal interactions. This uses a custom piece of code where I:
    • hicConvertFormat from h5 to cool
    • cooler dump --join file.cool | awk '$1==$4 {print $0}' | gzip > file.bedpe
    • cooler load --assembly dm6 -f bg2 chro.txt:5000 file.bedpe
    • hicConvertFormat from cool to h5
  2. normalize by smallest counts
  3. hicSumMatrices ...

Note that chro.txt contains:

chr2L 23513712
chr2R 25286936
chr3L 28110227
chr3R 32079331
chrX 23542271
chr4 1348131
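For readers who want to see what the awk filter in step 1 does, here is a minimal Python sketch of the same idea. It assumes the column layout of `cooler dump --join` output (chrom1 start1 end1 chrom2 start2 end2 count); the function name `keep_intra_chromosomal` and the three example records are made up for illustration and are not part of cooler or HiCExplorer.

```python
import io


def keep_intra_chromosomal(lines):
    """Yield only records whose two chromosome columns match.

    Mirrors the intent of: awk '$1==$4 {print $0}'
    Lines with fewer than four whitespace-separated fields are skipped.
    """
    for line in lines:
        fields = line.split()
        # Assumed columns: chrom1 start1 end1 chrom2 start2 end2 count
        if len(fields) >= 4 and fields[0] == fields[3]:
            yield line


# Made-up records: two intra-chromosomal, one inter-chromosomal.
example = io.StringIO(
    "chr2L\t0\t5000\tchr2L\t5000\t10000\t12\n"
    "chr2L\t0\t5000\tchr2R\t0\t5000\t3\n"
    "chrX\t5000\t10000\tchrX\t5000\t10000\t7\n"
)
kept = list(keep_intra_chromosomal(example))
print(len(kept), "intra-chromosomal records kept")
```

The filter only drops pixel records; the bin table still comes from chro.txt when the file is loaded again.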

I am on HiCExplorer 3.6 (conda).

Any help would be appreciated. Thank you!

cgirardot commented 3 years ago

@joachimwolff I am sorry to ping you directly but I am stuck here and would really appreciate your input.

I tried hicConvertFormat, as suggested by the error message, and converted the h5 files to cool, hoping that would fix it, but I got the same error when running hicSumMatrices on the cool matrices.

cgirardot commented 3 years ago

I forgot to paste the command line (v3.6):

hicSumMatrices-3.6 -m HiC_A_R1_all-rep-merged.5Kb-bin-matrix.noInterChr.countNorm.cool HiC_A_R2_all-rep-merged.5Kb-bin-matrix.noInterChr.countNorm.cool HiC_B_R1_all-rep-merged.5Kb-bin-matrix.noInterChr.countNorm.cool HiC_B_R2_all-rep-merged.5Kb-bin-matrix.noInterChr.countNorm.cool HiC_C_R1_all-rep-merged.5Kb-bin-matrix.noInterChr.countNorm.cool HiC_C_R2_all-rep-merged.5Kb-bin-matrix.noInterChr.countNorm.cool -o HiC_all_R12_all-rep-merged.5Kb-bin-matrix.noInterChr.countNorm.cool

joachimwolff commented 3 years ago

Hi,

Sorry for the delayed answer; I have a lot of other things to do at the moment (getting a paper done, finishing my PhD). You have probably already noticed this from my reduced activity on GitHub and from the unreleased features on other branches over the last half year.

Anyhow, I do not know what cooler dump and load do in detail. My suggestion would be to skip this step and to add it to your pipeline after hicSumMatrices. The result is logically the same.

Best,

Joachim

cgirardot commented 3 years ago

Thank you for taking the time to look into this.

The cooler dump and load steps are there to eliminate all inter-chromosomal interactions (#704) before the count normalization. I need to do this before the summing.

The error message is strange, i.e. the chromosome order looks the same to me.
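When two printed lists look identical but a tool still reports them as different, it can help to compare them element by element and print each entry with `repr`, which exposes differences that are invisible on screen (for example `bytes` versus `str`, or trailing whitespace). The sketch below is a generic diagnostic under that assumption. This thread does not establish what hicSumMatrices actually compares internally, and it may compare more than the names. `explain_mismatch` and the example lists are hypothetical.

```python
def explain_mismatch(names_a, names_b):
    """Return human-readable reasons why two chromosome name lists differ."""
    reasons = []
    if len(names_a) != len(names_b):
        reasons.append(
            f"different number of chromosomes: {len(names_a)} vs {len(names_b)}"
        )
    for i, (a, b) in enumerate(zip(names_a, names_b)):
        if a == b:
            continue
        if type(a) is not type(b):
            # e.g. 'chr2L' (str) vs b'chr2L' (bytes) look alike when printed naively
            reasons.append(
                f"position {i}: type differs "
                f"({type(a).__name__} vs {type(b).__name__})"
            )
        else:
            # repr() makes hidden characters such as trailing spaces visible
            reasons.append(f"position {i}: {a!r} != {b!r}")
    return reasons


# Made-up example: the lists print similarly but are not equal.
reasons = explain_mismatch(["chr2L", "chr4"], [b"chr2L", "chr4 "])
for reason in reasons:
    print(reason)
```

An empty result means the names themselves are equal, in which case the difference must lie elsewhere, for instance in the bins attached to each chromosome.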

joachimwolff commented 3 years ago

I only have a workaround in mind; I do not really have the time to fix the source code (or to check what is going on).

I propose the following workaround:

  1. Sum the six non-normalized, full matrices into one.
  2. Remove the inter-chromosomal regions from the summed matrix with cooler.
  3. Run hicInfo with the --no_metadata option on each of the six matrices from which the inter-chromosomal regions have already been removed. This gives the sum of contacts per matrix. Add these numbers together.
  4. Run hicInfo with the --no_metadata option to retrieve the sum of the matrix from step 2.
  5. Compute the multiplicative factor from the sums of steps 3 and 4: multiplicative factor = sum from step 3 / sum from step 4
  6. Use hicNormalize in multiplicative mode: --normalize multiplicative --multiplicativeValue ValueFrom5
  7. Check with hicInfo --no_metadata that the sum of the resulting matrix is (more or less) equal to the combined sum of the six matrices from step 3.
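The arithmetic of steps 3 to 5 and the check in step 7 can be sketched in a few lines of Python. The sums below are placeholder numbers, not real hicInfo output, and the function names and the tolerance are assumptions chosen for illustration.

```python
import math


def multiplicative_factor(filtered_sums, merged_filtered_sum):
    """Step 5: (total of the six per-matrix sums) / (sum of the merged matrix)."""
    if merged_filtered_sum <= 0:
        raise ValueError("merged matrix sum must be positive")
    return sum(filtered_sums) / merged_filtered_sum


def sums_agree(observed, expected, rel_tol=1e-3):
    """Step 7: 'more or less equal', using an assumed relative tolerance."""
    return math.isclose(observed, expected, rel_tol=rel_tol)


# Placeholder values standing in for the sums reported by hicInfo.
step3_sums = [100.0, 120.0, 80.0, 90.0, 110.0, 100.0]  # six filtered matrices
step4_sum = 1200.0                                      # merged, filtered matrix

factor = multiplicative_factor(step3_sums, step4_sum)
print("value for --multiplicativeValue:", factor)

# After normalization the merged sum should be close to the step 3 total.
print(sums_agree(step4_sum * factor, sum(step3_sums)))
```

The resulting factor is the value passed to hicNormalize in step 6.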

I hope this helps. If there are more questions or issues, please contact me again.

Best,

Joachim

cgirardot commented 3 years ago

Hi @joachimwolff, sorry for not commenting on this earlier. I followed a very similar approach. One thing I'd like to share: I had to go back to keeping all the interactions in the matrices (while still normalizing using only the intra-chromosomal interaction counts), because many of my downstream steps started failing when the inter-chromosomal interactions were missing.

I think it would make sense to add an --intra-interaction-only option to hicNormalize.
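To make the request concrete, here is a rough Python sketch of what such behaviour could look like: the scaling factor is derived from intra-chromosomal counts only, but it is applied to every interaction, so inter-chromosomal entries are kept. This is not how hicNormalize is implemented, and --intra-interaction-only is only the option proposed above. The data layout (lists of `(chrom1, chrom2, count)` tuples) and the function names are assumptions for illustration.

```python
def intra_sum(pixels):
    """Sum the counts of interactions within a single chromosome."""
    return sum(count for chrom1, chrom2, count in pixels if chrom1 == chrom2)


def normalize_to_smallest_intra(matrices):
    """Scale each matrix so its intra-chromosomal sum equals the smallest one.

    `matrices` maps a name to a list of (chrom1, chrom2, count) tuples.
    All interactions are scaled, including inter-chromosomal ones.
    """
    sums = {name: intra_sum(pixels) for name, pixels in matrices.items()}
    if min(sums.values()) <= 0:
        raise ValueError("every matrix needs a positive intra-chromosomal sum")
    target = min(sums.values())
    return {
        name: [(c1, c2, count * target / sums[name]) for c1, c2, count in pixels]
        for name, pixels in matrices.items()
    }


# Made-up toy matrices with one intra- and one inter-chromosomal entry each.
toy = {
    "A": [("chr2L", "chr2L", 10), ("chr2L", "chr2R", 4)],
    "B": [("chr2L", "chr2L", 20), ("chr2L", "chr2R", 2)],
}
normalized = normalize_to_smallest_intra(toy)
print(normalized["B"])
```

After scaling, both toy matrices have the same intra-chromosomal sum, while their inter-chromosomal entries remain available to downstream steps.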

Thanks again for your help