dekkerlab / crane-nature-2015

code associated with crane-nature-2015, 10.1038/nature14450
Apache License 2.0

matrix interactions too large - cannot handle in memory #2

Closed liz-is closed 9 years ago

liz-is commented 9 years ago

Hi - first of all, thanks for making your code available!

I'd like to use it on data from mouse, but am coming across issues with the larger chromosomes at high resolution, getting errors such as:

ERROR: matrix interactions too large - cannot handle in memory [19720 x 19720] (388,878,400 > 256,000,000 limit)

The actual memory usage of the code when running is low as far as I can tell, and I'm running it on a server with 512GB RAM, so I'm wondering if this matrix size limit can be adjusted? What would you recommend for using this code with mouse or human data?

blajoie commented 9 years ago

Hi - I will remove that hard limit in the code.

For this code (matrix2insulation), memory usage scales with your selected insulation square size (in bins) times the number of rows (or columns) in the matrix.

So memory usage should always be low [MBs] given normal usage. We only load into memory N diagonals up to your specified insulation square size.

e.g. a 1,000,000 bp square size on a matrix binned at 5 kb would load ~400 diagonals into memory.
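To make the scaling above concrete, here is a rough back-of-the-envelope sketch in Python. The function names are hypothetical and the 8 bytes per value is an assumption (a plain double); the script's real per-value overhead may differ. A 1,000,000 bp square at 5 kb bins spans 200 bins; the ~400 diagonals figure is taken directly from the comment above rather than derived.

```python
def bins_per_square(square_bp: int, bin_bp: int) -> int:
    """Number of bins spanned by one side of the insulation square."""
    return square_bp // bin_bp


def banded_bytes(n_rows: int, n_diagonals: int, bytes_per_value: int = 8) -> int:
    """Approximate bytes to hold n_diagonals diagonals of an n_rows-wide matrix.

    bytes_per_value=8 is an assumption (one double per cell).
    """
    return n_rows * n_diagonals * bytes_per_value


def dense_bytes(n_rows: int, bytes_per_value: int = 8) -> int:
    """Approximate bytes to hold the full n_rows x n_rows matrix."""
    return n_rows * n_rows * bytes_per_value


if __name__ == "__main__":
    n = 19720  # matrix size from the error message in this thread
    print("bins per square:", bins_per_square(1_000_000, 5_000))
    print("banded MB:", banded_bytes(n, 400) / 1e6)
    print("dense GB:", dense_bytes(n) / 1e9)
```

Under these assumptions the 19720 x 19720 matrix needs roughly 63 MB for 400 diagonals, versus roughly 3.1 GB if held densely.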

This script is intended mainly for matrices smaller than 10,000 x 10,000. For higher-resolution data and larger matrices, a more efficient data format (sparse, hdf5, etc.) would be a better fit, though this script may still work.

I tested the code on a ~50,000 x ~50,000 matrix. It takes quite a bit of time to run (15 minutes), most of it spent simply reading the gzipped txt file, but it does complete OK.
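For anyone curious how a reader can keep only the near-diagonal band while streaming a gzipped text matrix, here is a minimal Python sketch. It is not the script's actual implementation. The file layout (one header line, then a row label followed by tab-separated values) is an assumption, and `read_band` and `read_band_gz` are hypothetical names.

```python
import gzip
from typing import Dict, Iterable, Tuple

# Tokens treated as missing values (an assumption about the file contents).
_MISSING = {"", "nan", "na"}


def read_band(lines: Iterable[str], max_offset: int) -> Dict[Tuple[int, int], float]:
    """Keep only cells with 0 <= col - row <= max_offset.

    Assumed layout: the first line is a column-header row; each later line
    is a row label followed by tab-separated values.
    """
    band: Dict[Tuple[int, int], float] = {}
    rows = iter(lines)
    next(rows, None)  # skip the column-header line
    for i, line in enumerate(rows):
        fields = line.rstrip("\n").split("\t")[1:]  # drop the row label
        # Only the diagonal and the next max_offset cells to its right are kept.
        for j in range(i, min(i + max_offset + 1, len(fields))):
            value = fields[j]
            if value.lower() not in _MISSING:
                band[(i, j)] = float(value)
    return band


def read_band_gz(path: str, max_offset: int) -> Dict[Tuple[int, int], float]:
    """Stream a gzipped text matrix line by line and return its band."""
    with gzip.open(path, "rt") as handle:
        return read_band(handle, max_offset)
```

Only one line of the file is held at a time, so the retained data grows with the number of rows times the band width rather than with the full matrix size. The run time is still dominated by decompressing and splitting every line.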

The script will work for any species, as long as the matrix is binned into equal-sized intervals and the intervals are formatted in the my5C format, e.g. bin1|hg19|chr1:40000-80000. See https://github.com/blajoie/crane-nature-2015/wiki
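If you want to check your headers before running the script, something like the following Python sketch could help. It assumes the header layout is name|assembly|chrom:start-end, as in the example above; `Bin`, `parse_header`, and `equal_sized` are hypothetical helpers, not part of this repository.

```python
import re
from typing import Iterable, NamedTuple


class Bin(NamedTuple):
    name: str
    assembly: str
    chrom: str
    start: int
    end: int


# Pattern inferred from the single example header; treat it as an assumption.
_HEADER_RE = re.compile(r"^([^|]+)\|([^|]+)\|([^:|]+):(\d+)-(\d+)$")


def parse_header(header: str) -> Bin:
    """Split a header such as 'bin1|hg19|chr1:40000-80000' into its parts."""
    match = _HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"not in the expected name|assembly|chrom:start-end form: {header!r}")
    name, assembly, chrom, start, end = match.groups()
    return Bin(name, assembly, chrom, int(start), int(end))


def equal_sized(headers: Iterable[str]) -> bool:
    """Return True if every interval has the same end - start width."""
    sizes = {b.end - b.start for b in map(parse_header, headers)}
    return len(sizes) <= 1
```

For example, `parse_header("bin1|hg19|chr1:40000-80000")` yields a `Bin` with `chrom="chr1"`, `start=40000`, and `end=80000`. The width check is strict, so it may need relaxing if the last bin of a chromosome is shorter than the rest.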

The code is updated now, give it another shot!

blajoie commented 9 years ago

issue looks to be fixed now! closing this.