deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

3.7.2 hicPCA produces extreme eigenvalues #829

Open xscapintime opened 1 year ago

xscapintime commented 1 year ago

Hi,

I'm using the latest version of HiCExplorer (3.7.2), and I used version 3.6.* last year. The eigenvalues produced are completely different. From reading the code and the commit messages, 3.7.2 follows the Lieberman-Aiden approach more closely and runs PCA on an obs/exp matrix, whereas 3.6 ran PCA on a Pearson correlation matrix.
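To make the two approaches concrete, here is a minimal NumPy sketch of PCA on an obs/exp matrix versus PCA on the Pearson correlation matrix of that obs/exp matrix. It is not HiCExplorer's implementation: the function names, the toy contact matrix, and the expected-value model (mean of each diagonal) are all assumptions made for illustration.

```python
import numpy as np


def toy_contacts(labels):
    """Hypothetical toy matrix: distance decay, doubled within a compartment."""
    labels = np.asarray(labels)
    pos = np.arange(len(labels))
    dist = np.abs(np.subtract.outer(pos, pos))
    same = labels[:, None] == labels[None, :]
    return (1.0 + same) / (1.0 + dist)


def expected_by_distance(mat):
    """Expected contacts, assumed here to be the mean of each diagonal."""
    n = mat.shape[0]
    expected = np.zeros((n, n), dtype=float)
    for d in range(n):
        mean_d = np.diagonal(mat, offset=d).mean()
        idx = np.arange(n - d)
        expected[idx, idx + d] = mean_d
        expected[idx + d, idx] = mean_d
    return expected


def first_eigenvector(sym):
    """Unit-norm eigenvector of the largest eigenvalue of a symmetric matrix."""
    _, vecs = np.linalg.eigh(sym)  # eigenvalues are returned in ascending order
    return vecs[:, -1]


def pc1_from_obs_exp(mat):
    """PCA directly on the obs/exp matrix (via its covariance matrix)."""
    obs_exp = mat / expected_by_distance(mat)
    return first_eigenvector(np.cov(obs_exp))


def pc1_from_pearson(mat):
    """PCA on the Pearson correlation matrix of the obs/exp matrix."""
    obs_exp = mat / expected_by_distance(mat)
    pearson = np.corrcoef(obs_exp)
    return first_eigenvector(np.cov(pearson))


# Two compartments laid out in uneven blocks along a 30-bin toy chromosome.
labels = [0] * 7 + [1] * 4 + [0] * 6 + [1] * 8 + [0] * 5
mat = toy_contacts(labels)
pc1_oe = pc1_from_obs_exp(mat)
pc1_pearson = pc1_from_pearson(mat)
```

Both functions return a unit-norm vector whose sign is arbitrary, so any later rescaling by a tool changes the raw value range. When comparing versions or tools, the sign pattern along the chromosome is more informative than the absolute range.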

Python version: 3.8.13

I used the same commands, shown below: one produces the Pearson matrix and the eigenvalue bedgraph, the other the eigenvalue bigwig.

hicPCA -m ${inp} --outputFileName ${out}.pca1.bedgraph -we 1 --format bedgraph \
--pearsonMatrix ${out}_pearson_all.h5 \
--extraTrack ../histonemark/ENCODE_H1_H3K27ac.bigwig

hicPCA -m ${inp} --outputFileName ${out}.pca1.bw -we 1 --format bigwig \
--extraTrack ../histonemark/ENCODE_H1_H3K27ac.bigwig

Here is the result produced by 3.6. I used np.histogram for a quick look:

np.histogram(bed[3])
(array([  43,  486, 3351, 3882, 4228, 5106, 2034,  413,  109,   31]), array([-0.10961274, -0.08537601, -0.06113927, -0.03690253, -0.0126658 , 0.01157094,  0.03580768,  0.06004441,  0.08428115,  0.10851789, 0.13275462]))

So the range of PC1 is about -0.1 to 0.13.

[Image: pearson 3.6]

And this is the result from 3.7.2:

np.histogram(bed2[3])
(array([    1,     2,     5,   882, 20226,   117,    21,     7,     2, 431]), array([-0.73764337, -0.56387903, -0.3901147 , -0.21635036, -0.04258602, 0.13117831,  0.30494265,  0.47870699,  0.65247133,  0.82623566, 1.        ]))

And now the range of PC1 is about -0.7 to 1, and most of the values are very close to 0.
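A histogram shows only the value ranges. For a bin-by-bin comparison of the two versions, something like the sketch below could be used. It assumes four-column, tab-separated bedgraph files without a header. The function names and the column name `pc1` are made up for illustration, and the inline data are toy values, not the results above.

```python
import io

import numpy as np
import pandas as pd


def load_bedgraph(path_or_buffer):
    """Read a 4-column bedgraph (chrom, start, end, value) with no header."""
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        header=None,
        names=["chrom", "start", "end", "pc1"],
    )


def compare_pc1(old, new):
    """Summarise how two PC1 tracks relate on the bins they share."""
    merged = old.merge(new, on=["chrom", "start", "end"], suffixes=("_old", "_new"))
    a = merged["pc1_old"].to_numpy()
    b = merged["pc1_new"].to_numpy()
    return {
        "n_bins": len(merged),
        "range_old": (float(a.min()), float(a.max())),
        "range_new": (float(b.min()), float(b.max())),
        # Eigenvector sign is arbitrary, so r close to -1 also means agreement.
        "pearson_r": float(np.corrcoef(a, b)[0, 1]),
        "sign_agreement": float(np.mean(np.sign(a) == np.sign(b))),
    }


# Toy stand-ins for the two bedgraph files (hypothetical values).
old_track = load_bedgraph(
    io.StringIO("chr1\t0\t10\t-0.1\nchr1\t10\t20\t0.05\nchr1\t20\t30\t0.1\n")
)
new_track = load_bedgraph(
    io.StringIO("chr1\t0\t10\t-0.5\nchr1\t10\t20\t0.25\nchr1\t20\t30\t0.5\n")
)
summary = compare_pc1(old_track, new_track)
```

If the correlation is close to 1 or -1 while the ranges differ, the two versions mostly disagree on scaling. A low correlation would point to a real difference in the compartment signal.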

[Image: pearson 3.7.2]

Personally, I don't think the results from 3.7.2 look right.

In this paper, PCA is described as being done on the contact matrix, and the distribution of PC1 is similar to the results from hicPCA 3.6.

[Image]

Thank you.

ralfgilsbach commented 1 year ago

Hi, same issue here. The Pearson correlation matrix looks fine, but the bigwig/bedgraph values are extreme and don't match it. I checked -we 1 and 2. Thanks

zhongzheng1999 commented 5 months ago

I ran into the same issue and wonder whether anyone has a better explanation. My guess is that with shorter bin lengths, abnormally high observations become more likely, and these lead to extreme eigenvalues.

@xscapintime @ralfgilsbach How did you deal with this problem in the end?

ralfgilsbach commented 4 months ago

We moved back to HOMER tools for eigenvector calculations. This should be fixed in HiCExplorer so that it works in a comparable manner.

xscapintime commented 4 months ago

@zhongzheng1999 Hi, I switched to cooltools for all of the analysis.

zhongzheng1999 commented 4 months ago

@xscapintime @ralfgilsbach Thank you for your reply! I think it's more reliable to use some good old tools to do the work.