deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

hicPlotMatrix numpy MemoryError #416

Closed millerh1 closed 5 years ago

millerh1 commented 5 years ago

Hello!

I am receiving a strange error when attempting to follow the tutorial using GSE101921. When I run hicPlotMatrix, I receive this error: MemoryError. I am using 64-bit Python 3.6.7 and installed HiCExplorer using conda. I also have plenty of RAM and my TMPDIR is empty.

Here is the command/traceback:

# $line is SRR7061198
hicPlotMatrix \
    --matrix Data/$line/hic_corrected.h5 \
    --log1p \
    --dpi 300 \
    --clearMaskedBins \
    --chromosomeOrder chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY \
    --colorMap jet \
    --title "Hi-C matrix for "$line \
    --outFileName Data/$line/plots/plot_1Mb_matrix.png

INFO:hicmatrix.HiCMatrix:Number of poor regions to remove: 52831 {'chr1': 4318, 'chr10': 3664, 'chr11': 3610, 'chr12': 908, 'chr13': 2378, 'chr14': 2392, 'chr15': 2831, 'chr16': 1980, 'chr17': 931, 'chr18': 680, 'chr19': 676, 'chr2': 2631, 'chr20': 434, 'chr21': 1640, 'chr22': 1968, 'chr3': 1108, 'chr4': 1513, 'chr5': 2355, 'chr6': 1134, 'chr7': 1542, 'chr8': 1803, 'chr9': 3695, 'chrM': 1, 'chrX': 2147, 'chrY': 5932, 'GL000192.1': 55, 'GL000225.1': 11, 'GL000194.1': 20, 'GL000193.1': 15, 'GL000200.1': 19, 'GL000222.1': 19, 'GL000212.1': 18, 'GL000195.1': 14, 'GL000223.1': 19, 'GL000224.1': 9, 'GL000219.1': 7, 'GL000205.1': 10, 'GL000215.1': 18, 'GL000216.1': 18, 'GL000217.1': 18, 'GL000199.1': 17, 'GL000211.1': 17, 'GL000213.1': 17, 'GL000220.1': 16, 'GL000218.1': 17, 'GL000209.1': 16, 'GL000221.1': 16, 'GL000214.1': 12, 'GL000228.1': 13, 'GL000227.1': 13, 'GL000191.1': 11, 'GL000208.1': 10, 'GL000198.1': 6, 'GL000204.1': 4, 'GL000233.1': 5, 'GL000237.1': 5, 'GL000230.1': 5, 'GL000242.1': 5, 'GL000243.1': 5, 'GL000241.1': 2, 'GL000236.1': 5, 'GL000240.1': 5, 'GL000206.1': 5, 'GL000232.1': 1, 'GL000234.1': 1, 'GL000202.1': 3, 'GL000238.1': 4, 'GL000244.1': 4, 'GL000248.1': 4, 'GL000196.1': 4, 'GL000249.1': 4, 'GL000246.1': 4, 'GL000203.1': 3, 'GL000197.1': 4, 'GL000245.1': 4, 'GL000247.1': 4, 'GL000201.1': 4, 'GL000235.1': 4, 'GL000239.1': 4, 'GL000210.1': 3, 'GL000231.1': 1, 'GL000226.1': 2, 'GL000207.1': 1}
INFO:hicmatrix.HiCMatrix:found existing 52831 nan bins that will be included for masking
INFO:hicexplorer.hicPlotMatrix:min: 0.23987850337011332, max: 275.82766173012305

Traceback (most recent call last):
  File "/home/UTHSCSA/millerh1/miniconda3/envs/hiCExplorerEnv/bin/hicPlotMatrix", line 7, in <module>
    main()
  File "/home/UTHSCSA/millerh1/miniconda3/envs/hiCExplorerEnv/lib/python3.6/site-packages/hicexplorer/hicPlotMatrix.py", line 537, in main
    matrix = np.asarray(ma.getMatrix().astype(float))
  File "/home/UTHSCSA/millerh1/miniconda3/envs/hiCExplorerEnv/lib/python3.6/site-packages/hicmatrix/HiCMatrix.py", line 197, in getMatrix
    matrix = self.matrix.todense()
  File "/home/UTHSCSA/millerh1/miniconda3/envs/hiCExplorerEnv/lib/python3.6/site-packages/scipy/sparse/base.py", line 848, in todense
    return asmatrix(self.toarray(order=order, out=out))
  File "/home/UTHSCSA/millerh1/miniconda3/envs/hiCExplorerEnv/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 1024, in toarray
    out = self._process_toarray_args(order, out)
  File "/home/UTHSCSA/millerh1/miniconda3/envs/hiCExplorerEnv/lib/python3.6/site-packages/scipy/sparse/base.py", line 1186, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError

LeilyR commented 5 years ago

You are simply running out of memory.

dpryan79 commented 5 years ago

How much memory is "plenty of memory"?

millerh1 commented 5 years ago

I do not see how that would be possible given that I am monitoring my memory usage while running this command and it never exceeds 30%. The command alone only uses 0.3% of the system's memory.

(hiCExplorerEnv) millerh1@cbbi16:~$ free -mth
              total        used        free      shared  buff/cache   available
Mem:           503G        169G         58G        139M        275G        333G
Swap:          7.4G        7.4G          4K
Total:         511G        176G         58G

millerh1 commented 5 years ago

After testing different settings for hicPlotMatrix, I think I found that you're right and this is simply an issue of not enough RAM.

If I run the command with only chr1 it successfully produces a plot but eats up 5.7% of the system memory (~30G). If the command is attempting to calculate the memory cost of plotting every chromosome (as in the tutorial's command), then I can see how that would return a MemoryError.

While running the command, I didn't see memory usage rise above 0.3%, as I would have expected with a memory error -- but it could be that numpy calculates available memory and returns the error before attempting to actually utilize those resources.

Also, I'm pretty sure I misread the tutorial and tried to run these commands with a matrix that had too high a resolution -- my apologies!
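
To make the scale concrete, here is a rough sketch of how a dense allocation grows with resolution. The traceback fails in np.zeros(self.shape, ...) inside todense(), so the request is for a full n x n array, which would explain why the error appears before any memory usage shows up. The genome length and bin sizes below are illustrative assumptions, not values read from this matrix, and the helper names are hypothetical.

```python
def bins_for_genome(genome_length_bp: int, bin_size_bp: int) -> int:
    """Number of bins needed to tile a genome (ceiling division)."""
    return -(-genome_length_bp // bin_size_bp)


def dense_matrix_bytes(n_bins: int, itemsize: int = 8) -> int:
    """Bytes needed for a dense n_bins x n_bins array.

    itemsize=8 assumes 64-bit floats; adjust for other dtypes.
    """
    return n_bins * n_bins * itemsize


if __name__ == "__main__":
    genome_bp = 3_100_000_000  # assumed, roughly human-sized genome
    for bin_size in (1_000_000, 100_000, 10_000):  # assumed bin sizes
        n = bins_for_genome(genome_bp, bin_size)
        gb = dense_matrix_bytes(n) / 1e9
        print(f"bin size {bin_size:>9,} bp -> {n:>7,} bins -> {gb:,.2f} GB dense")
```

Because the cost is quadratic in the number of bins, a 10x finer resolution needs about 100x more memory for the dense array. Under these assumptions a genome-wide matrix at 1 Mb bins is tiny, while one at 10 kb bins would need more than the 503G shown by free above. Restricting the plot with --region shrinks n and therefore the allocation.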

For reference -- this is the command which worked but ate 5.7% of memory:

millerh1@cbbi16:~$ hicPlotMatrix --matrix "/home/UTHSCSA/millerh1/Bishop.lab/Preprocessing/HiC_Seq/GSE101921_SA1_SA2_HiCExplorer/Data/SRR7061198/hic_corrected.h5" --outFileName "/home/UTHSCSA/millerh1/Bishop.lab/Preprocessing/HiC_Seq/GSE101921_SA1_SA2_HiCExplorer/Data/SRR7061198/plots/fileName" --dpi 300 --region chr1:500000-211800000 --log1p
INFO:hicexplorer.hicPlotMatrix:min: 0.23987850337011332, max: 275.82766173012305

LeilyR commented 5 years ago

Glad that you could get it to work.