deeptools / HiCExplorer

HiCExplorer is a powerful and easy-to-use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

hicAdjust streaks and hicPCA always starting from 0 in version 3.4 #491

Closed: kimj50 closed this issue 4 years ago

kimj50 commented 4 years ago

Hi, thank you again for the update :). I've been playing with the new version (3.4) and I noticed a few things...

First file: hicAdjustMatrix --regions (center regions of all chromosomes) --action keep. It seems that 'keep' does not completely remove the beginning of the chromosomes after the first chromosome? [image: test]

Second file: hicAdjustMatrix --chromosomes -> hicAdjustMatrix --regions (center region of the specific chromosome) --action keep. If I do the same but only on a single chromosome, it seems to work well. [image: test_I]

On the second matrix (center of a single chromosome) -> hicPCA, the bedgraph output is:

I 0 4650000 0.030266865887
I 4650000 4660000 0.036209768885
I 4660000 4670000 0.028621320046

The hicPCA output always seems to start from 0. I can fix this simply by substituting 0 with 4640000, but is the PCA computed properly? The Pearson matrix also looks odd, even though the original matrix looks normal. [image: test_I_pm]
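A minimal sketch of the manual coordinate fix described above, assuming the bedgraph contains only data lines with four whitespace-separated columns (chrom, start, end, value) and that you know where the kept region starts on each chromosome. The function name `clamp_bin_starts` and the `region_starts` mapping are hypothetical helpers, not part of HiCExplorer. This only patches coordinates; it says nothing about whether the PCA values themselves were computed correctly.

```python
def clamp_bin_starts(lines, region_starts):
    """Move bin starts that lie before the kept region up to the region start.

    lines: bedgraph data lines of the form "chrom start end value".
    region_starts: hypothetical mapping chrom -> start of the kept region.
    """
    fixed = []
    for line in lines:
        chrom, start, end, value = line.split()
        start, end = int(start), int(end)
        region_start = region_starts.get(chrom)
        # Only bins that start before the kept region and end after its start
        # (such as "I 0 4650000") are clamped; all other bins are unchanged.
        if region_start is not None and start < region_start < end:
            start = region_start
        fixed.append(f"{chrom}\t{start}\t{end}\t{value}")
    return fixed


if __name__ == "__main__":
    bedgraph = [
        "I 0 4650000 0.030266865887",
        "I 4650000 4660000 0.036209768885",
        "I 4660000 4670000 0.028621320046",
    ]
    # 4640000 is the start of the kept region on chromosome I in this report.
    for row in clamp_bin_starts(bedgraph, {"I": 4640000}):
        print(row)
```

The strict `start < region_start < end` check avoids producing a bin whose start is past its end if a bin lies entirely before the kept region.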

Thank you! - Jun

gtrichard commented 4 years ago

Hello Jun,

Nice catch! What you reported is not the intended output; it will take some time to look into.

For hicAdjustMatrix, I guess you can process each chromosome one by one for the time being. For hicPCA, I think you just need to run hicAdjustMatrix on the Pearson matrix. This should be more streamlined in the next patch.
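The per-chromosome workaround suggested above could be scripted roughly as follows. This is a sketch that mirrors the two-step approach from the second file in the original report (keep one chromosome with `--chromosomes`, then keep its regions with `--regions`). The helper names and output file names are hypothetical; the script only builds and prints the commands, and writing each per-chromosome BED file is left to you.

```python
import shlex
from collections import defaultdict


def split_bed_by_chrom(bed_lines):
    """Group BED lines by chromosome, skipping blank, comment and header lines."""
    per_chrom = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        per_chrom[line.split()[0]].append(line.rstrip("\n"))
    return dict(per_chrom)


def per_chrom_commands(matrix, chrom, bed_path, prefix="matrix"):
    """Build the two hicAdjustMatrix calls for one chromosome."""
    chrom_matrix = f"{prefix}_{chrom}.h5"          # hypothetical file name
    kept_matrix = f"{prefix}_{chrom}_regions.h5"   # hypothetical file name
    return [
        ["hicAdjustMatrix", "-m", matrix, "--chromosomes", chrom,
         "--action", "keep", "-o", chrom_matrix],
        ["hicAdjustMatrix", "-m", chrom_matrix, "--regions", bed_path,
         "--action", "keep", "-o", kept_matrix],
    ]


if __name__ == "__main__":
    # Hypothetical regions; only the chromosome I start comes from this thread.
    bed = ["I\t4640000\t10000000", "II\t5000000\t10000000"]
    for chrom, rows in split_bed_by_chrom(bed).items():
        bed_path = f"regions_{chrom}.bed"  # write `rows` here before running
        for cmd in per_chrom_commands("matrix.h5", chrom, bed_path):
            print(shlex.join(cmd))  # or subprocess.run(cmd, check=True)
```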

LeilyR commented 4 years ago

Hi Jun, thanks for using our latest version. May I ask you to plot the first file chromosome by chromosome and see if it still looks the same? I wonder if it has something to do with the plotting. If you use an adjusted matrix to make the Pearson matrix, it should generate the Pearson matrix on the adjusted coordinates and also compute the PCA over the adjusted ones. Is that what you did? Could you please also output and plot the Pearson matrix from hicPCA and see if it looks the same when you input an adjusted matrix? Thanks!
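Plotting chromosome by chromosome, as requested above, can be batched with a small helper that emits one hicPlotMatrix call per chromosome using `--region`, the same flag used with a chromosome name later in this thread. The function name, the output prefix and the chromosome list are hypothetical examples; set `log1p=False` for Pearson matrices, which are usually plotted on a fixed -1 to 1 scale instead.

```python
import shlex


def per_chrom_plot_commands(matrix, chroms, prefix, log1p=True):
    """Build one hicPlotMatrix command per chromosome via --region."""
    cmds = []
    for chrom in chroms:
        cmd = ["hicPlotMatrix", "-m", matrix, "-o", f"{prefix}_{chrom}.png",
               "--region", chrom]
        if log1p:
            cmd.append("--log1p")  # log-scale counts; skip for Pearson matrices
        cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    # File names and chromosome names are illustrative.
    for cmd in per_chrom_plot_commands("matrix_regions.h5", ["I", "II", "X"], "center"):
        print(shlex.join(cmd))
```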

kimj50 commented 4 years ago

"May I ask you to plot the first file chromosome by chromosome and see if it still looks the same? I wonder if it has something to do with the plotting." I think it definitely has to do with the plotting, because if I plot each chromosome separately, the plot looks fine. [images: whole_center, center_I, center_II, center_X]

"If you use an adjusted matrix to make the Pearson matrix, it should generate the Pearson matrix on the adjusted coordinates and also compute the PCA over the adjusted ones. Is that what you did?" Yes, in my first post the pm image is the -pm output from hicPCA.

"Could you please also output and plot the Pearson matrix from hicPCA and see if it looks the same when you input an adjusted matrix?" It seems the -pm output of hicPCA is not just a plotting problem. [images: center_pm_whole, center_pm_I, center_pm_II, center_pm_X]

On a side note (not surprisingly), the PCA values of the two cases are the same:

  1. adjustmatrix (chrI) -> adjustmatrix (center) -> hicPCA

     I 0 4650000 0.030266865887
     I 4650000 4660000 0.036209768885
     I 4660000 4670000 0.028621320046
     I 4670000 4680000 0.024591755204
     I 4680000 4690000 0.019458173186
     I 4690000 4700000 0.029596640952
     I 4700000 4710000 0.027017999888
     I 4710000 4720000 -0.007823169806
     I 4720000 4730000 -0.002924065376
     I 4730000 4740000 -0.003656820642

  2. adjustmatrix (center of every chromosome) -> hicPCA

     I 0 4650000 0.030266865887
     I 4650000 4660000 0.036209768885
     I 4660000 4670000 0.028621320046
     I 4670000 4680000 0.024591755204
     I 4680000 4690000 0.019458173186
     I 4690000 4700000 0.029596640952
     I 4700000 4710000 0.027017999888
     I 4710000 4720000 -0.007823169806
     I 4720000 4730000 -0.002924065376
     I 4730000 4740000 -0.003656820642
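To check that two hicPCA bedgraph outputs agree, rather than comparing them by eye, a sketch like the following parses both tracks and compares bins and values. It assumes plain four-column bedgraph files; the function names are hypothetical.

```python
import math


def parse_bedgraph(lines):
    """Return (chrom, start, end, value) tuples, skipping header/comment lines."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        chrom, start, end, value = fields[:4]
        rows.append((chrom, int(start), int(end), float(value)))
    return rows


def same_pca(lines_a, lines_b, rel_tol=1e-9):
    """True if both tracks have identical bins and numerically equal values."""
    a, b = parse_bedgraph(lines_a), parse_bedgraph(lines_b)
    if len(a) != len(b):
        return False
    return all(
        ra[:3] == rb[:3] and math.isclose(ra[3], rb[3], rel_tol=rel_tol)
        for ra, rb in zip(a, b)
    )


if __name__ == "__main__":
    case1 = ["I 0 4650000 0.030266865887", "I 4650000 4660000 0.036209768885"]
    case2 = ["I 0 4650000 0.030266865887", "I 4650000 4660000 0.036209768885"]
    print(same_pca(case1, case2))
```

Because eigenvector signs are arbitrary in general, two otherwise equivalent runs could differ by a global sign flip; comparing against the negated values as well would cover that case.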

Thanks! - Jun

LeilyR commented 4 years ago

So the matrices above are the Pearson correlation matrices you got as output of hicPCA on an adjusted matrix? I have just used version 3.4.3 and could not reproduce this issue. Could you please send me the commands you used to generate your adjusted matrix and then the PCA values?

kimj50 commented 4 years ago

Hi, I'm still using 3.4.2. I forgot to mention that the matrix comes from a .hic file, converted with hicConvertMatrix and then normalized with the ICE method in HiCExplorer.

hicAdjustMatrix -m matrix.h5 \
    -r regions.txt \
    --action keep \
    -o matrix_regions.h5

hicPCA -m matrix_regions.h5 \
    -noe 1 \
    -f bedgraph \
    --method dist_norm \
    -pm matrix_regions_norm_pm.h5 \
    --ignoreMaskedBins \
    -o matrix_regions_pca1.bedgraph

hicPlotMatrix -m matrix_regions_norm_pm.h5 -o regions_pm.png
hicPlotMatrix -m matrix_regions.h5 -o regions.png --log1p

[images: regions_pm, regions]

First 3 lines of matrix_regions_pca1.bedgraph:

I 0 4650000 -0.022717729954
I 4650000 4700000 -0.041453508754
I 4700000 4750000 0.001597827105
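The symptom in these rows is a first bin (0 to 4650000) that is far wider than the others (50 kb). A sketch that flags such bins automatically, assuming the most common bin width equals the matrix resolution unless you pass the real bin size; the function name is hypothetical.

```python
from collections import Counter


def irregular_bins(lines, resolution=None):
    """Return (chrom, start, end) bins whose width differs from the resolution.

    If resolution is None, the most common bin width is used as an estimate.
    """
    bins = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        bins.append((fields[0], int(fields[1]), int(fields[2])))
    if not bins:
        return []
    if resolution is None:
        widths = Counter(end - start for _, start, end in bins)
        resolution = widths.most_common(1)[0][0]
    return [b for b in bins if b[2] - b[1] != resolution]


if __name__ == "__main__":
    rows = [
        "I 0 4650000 -0.022717729954",
        "I 4650000 4700000 -0.041453508754",
        "I 4700000 4750000 0.001597827105",
    ]
    print(irregular_bins(rows))
```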

LeilyR commented 4 years ago

Two things I can think of:
1) Could you please try to plot one chromosome at a time?
2) Am I right that your point is that you did not keep the beginning of chromosome I, but you can see PC values assigned to those coordinates?

I have tried the following and cannot reproduce your issue:

hicexplorer --version
hicexplorer 3.4.3

hicAdjustMatrix -m matrix.h5 -r regions2keep.bed --action keep -o matrix_regions.h5
hicPCA -m matrix_regions.h5 -o pc1.bw pc2.bw --method dist_norm --chromosomes 2L 2R 3L 3R X --pearsonMatrix pearson_matrix.h5 --extraTrack h3k27ac.bw --histonMarkType active
hicPlotMatrix -m pearson_matrix.h5 -o pearson_2R.png --region 2R --vMin -1 --vMax 1 --colorMap RdBu_r
hicPlotMatrix -m matrix_regions.h5 -o matrix_2R.png --region 2R --log1p

[images: pearson_2R, matrix_2R]

If I save the PC as bedgraph, I see:

2R 5790000 5820000 -0.055295331700
2R 5820000 5850000 -0.050785457847
2R 5850000 5880000 -0.053331785979
2R 5880000 5910000 -0.049209458601
2R 5910000 5940000 -0.057686252053
2R 5940000 5970000 -0.056164342927

So all the coordinates are fine, and I cannot reproduce your problem.
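One way to make the "are the coordinates fine?" check from this exchange reproducible is to list every PCA bin that is not fully contained in a region from the keep BED file (regions2keep.bed or regions.txt above). A sketch, assuming half-open BED coordinates and a strict containment rule; hicAdjustMatrix may treat bins that straddle a region boundary differently, so boundary hits deserve a manual look. The function name is hypothetical.

```python
def _data_fields(lines):
    """Yield split fields of data lines, skipping headers and comments."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        yield fields


def bins_outside_regions(bedgraph_lines, bed_lines):
    """Return (chrom, start, end) PCA bins not fully inside any kept region."""
    regions = {}
    for fields in _data_fields(bed_lines):
        regions.setdefault(fields[0], []).append((int(fields[1]), int(fields[2])))
    outside = []
    for fields in _data_fields(bedgraph_lines):
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        contained = any(
            r_start <= start and end <= r_end
            for r_start, r_end in regions.get(chrom, [])
        )
        if not contained:
            outside.append((chrom, start, end))
    return outside


if __name__ == "__main__":
    # Hypothetical kept region; only its start (4640000) comes from this thread.
    kept = ["I\t4640000\t10000000"]
    pca = ["I 0 4650000 0.0302", "I 4650000 4660000 0.0362"]
    print(bins_outside_regions(pca, kept))
```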

joachimwolff commented 4 years ago

Hi,

"I'm still using 3.4.2. I forgot to mention that the matrix comes from a .hic file, converted with hicConvertMatrix and then normalized with the ICE method in HiCExplorer."

This is actually very important information. I recently fixed a bug in the load and store functions for cool files converted from .hic. Please make sure you install HiCMatrix version 13 and redo the conversion from the .hic file.

Best,

Joachim

kimj50 commented 4 years ago

Hi, I updated to 3.4.3 and started again from the .hic file.

hicexplorer --version
hicexplorer 3.4.3

-rw-r----- 1 kimj50 users 10836 Jan 23 11:45 conda-meta/hicmatrix-11-py_0.json.c~
-rw-r----- 1 kimj50 users 10794 May 22 20:22 conda-meta/hicmatrix-13-py_0.json

My conda-meta directory seems to have both 11 and 13. Could HiCExplorer be using the older version?
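To see which HiCMatrix version the running environment actually resolves, rather than inferring it from conda-meta files, the standard library's `importlib.metadata` can be queried. This sketch assumes the packages are registered under their PyPI names, HiCMatrix and HiCExplorer, and it must be run with the same Python interpreter that runs hicexplorer (that is, inside the activated conda environment).

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumed to match the PyPI package names.
    for name in ("HiCMatrix", "HiCExplorer"):
        print(name, installed_version(name))
```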

  1. Plotting per chromosome fixes the hicPlotMatrix output for the original adjusted matrix, but it does not fix the -pm matrix from hicPCA.
  2. It does not fix the PCA output.

[images: DJ52_2_N2genome_10kb_s_ICE_center_X, DJ52_2_N2genome_10kb_s_ICE_center_norm_pm_X]

I 0 10130000 -0.013724358038
I 10130000 10140000 -0.052584451939
I 10140000 10150000 -0.028132542570
I 10150000 10160000 -0.019077970874

joachimwolff commented 4 years ago

Concerning the different versions, please use a new conda environment:

conda create --name hic3.4.3 hicexplorer=3.4.3 hicmatrix=13

and activate it via:

conda activate hic3.4.3

joachimwolff commented 4 years ago

We have published version 3.5 with many bug fixes. Please reopen if this bug still exists in this version.

yangfangyuan0102 commented 1 year ago

Hi, I am using the latest version 3.7.2. In my case, this problem still exists. I used hicAdjustMatrix to mask the first half of one chromosome.

The BED file (addedauto.bed) used for masking:

OW028702.1 0 18450000

However, the hicPCA result is:

OW028702.1 0 18550000 0.045427123859
OW028702.1 18550000 18600000 0.048286874344
OW028702.1 18600000 18650000 0.062323953915
...

My commands:

hicAdjustMatrix -m corrected.h5 --regions addedauto.bed -a mask -o addedsex.h5
hicPCA -m addedsex.h5 -o addedsex.h5.pc1.bed -we 1 -f bedgraph --ignoreMaskedBins
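With `--ignoreMaskedBins`, one would expect no PCA bin to overlap the masked interval, yet the first reported bin (0 to 18550000) does. A sketch that lists such overlaps, assuming half-open BED intervals; the function name is hypothetical and is only meant to make the report easy to re-check against new versions.

```python
def bins_overlapping_mask(bedgraph_lines, mask_bed_lines):
    """Return (chrom, start, end) PCA bins that overlap any masked interval."""
    def data_fields(lines):
        for line in lines:
            fields = line.split()
            if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
                continue
            yield fields

    masks = {}
    for fields in data_fields(mask_bed_lines):
        masks.setdefault(fields[0], []).append((int(fields[1]), int(fields[2])))
    hits = []
    for fields in data_fields(bedgraph_lines):
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Half-open intervals [start, end) overlap if each starts before the other ends.
        if any(start < m_end and m_start < end for m_start, m_end in masks.get(chrom, [])):
            hits.append((chrom, start, end))
    return hits


if __name__ == "__main__":
    mask = ["OW028702.1\t0\t18450000"]
    pca = [
        "OW028702.1 0 18550000 0.045427123859",
        "OW028702.1 18550000 18600000 0.048286874344",
        "OW028702.1 18600000 18650000 0.062323953915",
    ]
    print(bins_overlapping_mask(pca, mask))
```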