deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

hicPlotMatrix run out of RAM #561

Closed by r78v10a07 4 years ago

r78v10a07 commented 4 years ago

Hi, I'm running hicPlotMatrix on a machine with 236 GB of RAM and it is being killed by the OS due to RAM usage.

My command is:

(hic) r78v10a07@instance-1:/data/hic$ hicPlotMatrix --matrix A_100bins.h5 --log1p --dpi 300 --clearMaskedBins --colorMap jet --title "Hi-C matrix for A" --outFileName A_1Mb_matrix.png --chromosomeOrder chr1 chr2 chr3 chr4 chr5 chr6
INFO:numexpr.utils:Note: NumExpr detected 64 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
INFO:hicmatrix.HiCMatrix:Number of poor regions to remove: 9919 {'chr1': 957, 'chr2': 105, 'chr3': 92, 'chr4': 84, 'chr5': 142, 'chr6': 73, 'chr7': 126, 'chr8': 90, 'chr9': 906, 'chr10': 97, 'chr11': 117, 'chr12': 61, 'chr13': 825, 'chr14': 922, 'chr15': 975, 'chr16': 490, 'chr17': 111, 'chr18': 190, 'chr19': 142, 'chr20': 86, 'chr21': 466, 'chr22': 675, 'chrX': 244, 'chrY': 1824, 'chr1_KI270713v1_random': 1, 'chr2_KI270715v1_random': 4, 'chr2_KI270716v1_random': 3, 'chr3_GL000221v1_random': 1, 'chr9_KI270717v1_random': 1, 'chr9_KI270718v1_random': 1, 'chr11_KI270721v1_random': 1, 'chr14_GL000009v2_random': 1, 'chr15_KI270727v1_random': 4, 'chr16_KI270728v1_random': 33, 'chr17_KI270730v1_random': 1, 'chr22_KI270737v1_random': 1, 'chr22_KI270739v1_random': 2, 'chrUn_KI270322v1': 1, 'chrUn_KI270316v1': 1, 'chrUn_KI270312v1': 1, 'chrUn_KI270317v1': 2, 'chrUn_KI270418v1': 1, 'chrUn_KI270422v1': 1, 'chrUn_KI270423v1': 1, 'chrUn_KI270425v1': 1, 'chrUn_KI270528v1': 1, 'chrUn_KI270530v1': 1, 'chrUn_KI270544v1': 1, 'chrUn_KI270548v1': 1, 'chrUn_KI270579v1': 1, 'chrUn_KI270329v1': 1, 'chrUn_KI270334v1': 1, 'chrUn_KI270335v1': 1, 'chrUn_KI270338v1': 1, 'chrUn_KI270340v1': 1, 'chrUn_KI270364v1': 1, 'chrUn_KI270366v1': 1, 'chrUn_KI270378v1': 1, 'chrUn_KI270379v1': 1, 'chrUn_KI270389v1': 1, 'chrUn_KI270390v1': 1, 'chrUn_KI270387v1': 1, 'chrUn_KI270395v1': 1, 'chrUn_KI270396v1': 1, 'chrUn_KI270388v1': 1, 'chrUn_KI270394v1': 1, 'chrUn_KI270386v1': 1, 'chrUn_KI270391v1': 1, 'chrUn_KI270384v1': 1, 'chrUn_KI270392v1': 1, 'chrUn_KI270381v1': 1, 'chrUn_KI270385v1': 1, 'chrUn_KI270382v1': 1, 'chrUn_KI270376v1': 1, 'chrUn_KI270372v1': 1, 'chrUn_KI270373v1': 1, 'chrUn_KI270375v1': 1, 'chrUn_KI270371v1': 1, 'chrUn_GL000195v1': 1, 'chrUn_GL000220v1': 1, 'chrUn_KI270741v1': 1, 'chrUn_GL000213v1': 1, 'chrUn_KI270745v1': 1, 'chrUn_KI270747v1': 4, 'chrUn_KI270751v1': 3, 'chrUn_GL000214v1': 2, 'chrEBV': 9}
INFO:hicmatrix.HiCMatrix:found existing 9919 nan bins that will be included for masking
INFO:hicexplorer.hicPlotMatrix:min: 1, max: 4631

Traceback (most recent call last):
  File "/data/conda/envs/hic/bin/hicPlotMatrix", line 7, in <module>
    main()
  File "/data/conda/envs/hic/lib/python3.6/site-packages/hicexplorer/hicPlotMatrix.py", line 614, in main
    start_pos=start_pos1, start_pos2=start_pos2, pNorm=norm, pAxis=ax1, pBigwig=bigwig_info)
  File "/data/conda/envs/hic/lib/python3.6/site-packages/hicexplorer/hicPlotMatrix.py", line 195, in plotHeatmap
    img3 = axHeat2.pcolormesh(xmesh.T, ymesh.T, ma, vmin=args.vMin, vmax=args.vMax, cmap=cmap, norm=pNorm)
  File "/data/conda/envs/hic/lib/python3.6/site-packages/matplotlib/__init__.py", line 1565, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/data/conda/envs/hic/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 6115, in pcolormesh
    coords = np.column_stack((X, Y)).astype(float, copy=False)
MemoryError: Unable to allocate 53.9 GiB for an array with shape (3617902201, 2) and data type float64
(hic) veraalva@instance-1:/data/hic$

Any advice to get this done? How much RAM would I need to run all chromosomes?
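
As a rough aid for reading the traceback above, the failed allocation can be reproduced with plain arithmetic. This is only a sketch: it assumes the mesh passed to `pcolormesh` is square and has one more grid line than bins along each axis, and it says nothing about the other arrays the plot needs, so it gives the size of this one allocation rather than total peak memory.

```python
import math

# Shape reported in the MemoryError: (number of mesh points, 2 coordinates).
n_points, n_coords = 3_617_902_201, 2
BYTES_PER_FLOAT64 = 8

# Size of the coordinate array that could not be allocated.
coord_bytes = n_points * n_coords * BYTES_PER_FLOAT64
coord_gib = coord_bytes / 2**30

# Assumption: the mesh is square, so its side is the integer square root.
mesh_side = math.isqrt(n_points)
is_perfect_square = mesh_side * mesh_side == n_points

# Assumption: one more grid line than bins along each axis.
n_bins = mesh_side - 1

# A dense float64 matrix with that many bins per axis.
dense_gib = n_bins * n_bins * BYTES_PER_FLOAT64 / 2**30

print(f"coordinate array: {coord_gib:.1f} GiB")
print(f"mesh side: {mesh_side} (perfect square: {is_perfect_square})")
print(f"bins per axis: {n_bins}")
print(f"dense float64 matrix: {dense_gib:.1f} GiB")
```

Under these assumptions the plotted matrix has tens of thousands of bins per axis, and the coordinate array alone is about twice the size of the dense matrix itself.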

joachimwolff commented 4 years ago

Hi,

Plotting the full matrix is always a troublemaker, simply because the sparse representation has to be transformed into a dense one, which needs really large amounts of memory. Your output file name states that it is a 1 Mb resolution matrix; from experience, I would expect this to work. Also, the step that requires the memory crashes when requesting just 53.9 GiB. Two things to check here:

I hope some of these thoughts help.

Best,

Joachim
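
One way to sanity-check the resolution point raised above is to compare the number of bins implied by the traceback with the number a 1 Mb matrix of chr1 to chr6 would have. The sketch below rests on several assumptions that are not stated in the thread: the chromosome lengths are rounded GRCh38 values in Mb, the mesh is square with one more grid line than bins per axis, and the bins removed by `--clearMaskedBins` are ignored, so the implied bin size is only an upper-bound estimate.

```python
import math

# Assumption: approximate GRCh38 chromosome lengths, rounded to whole Mb.
CHROM_LENGTH_MB = {
    "chr1": 249, "chr2": 242, "chr3": 198,
    "chr4": 190, "chr5": 182, "chr6": 171,
}


def expected_bins(bin_size_bp: int) -> int:
    """Number of bins needed to tile the listed chromosomes at a bin size."""
    return sum(
        math.ceil(length_mb * 1_000_000 / bin_size_bp)
        for length_mb in CHROM_LENGTH_MB.values()
    )


# Bins per axis implied by the MemoryError shape (3617902201, 2),
# assuming a square mesh with one more grid line than bins.
observed_bins = math.isqrt(3_617_902_201) - 1

total_bp = sum(CHROM_LENGTH_MB.values()) * 1_000_000
implied_bin_size_bp = total_bp / observed_bins

print(f"bins expected at 1 Mb: {expected_bins(1_000_000)}")
print(f"bins implied by the traceback: {observed_bins}")
print(f"implied bin size: about {implied_bin_size_bp / 1000:.1f} kb")
```

If the two bin counts differ by an order of magnitude or more, as they do here, the matrix is most likely binned at a much finer resolution than the file name suggests.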

r78v10a07 commented 4 years ago

Hi, thanks for your comment. You are right, the resolution of the matrix was wrong. I corrected it and it is working fine now. Thanks.