Closed apaytuvi closed 7 years ago
Either the .npz or the .h5 format stores the matrix plus the bins (chrom, start, end). Each index in chrNameList, startList, endList, and extraList corresponds to the bin index in the matrix, i.e. entry i of each list describes row/column i of the matrix.
In extraList I usually put the bin read coverage. extraList is only used under certain circumstances by hicCorrectMatrix to discard bins containing repetitive regions, so it is safe to replace it with a vector of 0s or any other numeric values.
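Following that advice, extraList can simply be filled with zeros. If you want something closer to read coverage, one rough proxy (an assumption here, not necessarily what HiCExplorer itself computes) is the total number of contacts per bin, i.e. the row sums of the matrix. The toy matrix values are made up:

```python
import numpy as np
from scipy.sparse import csr_matrix

# toy symmetric contact matrix (illustrative values)
matrix = csr_matrix(np.array([[34, 55, 0],
                              [55, 282, 0],
                              [0, 0, 0]]))

# option 1: a placeholder vector of zeros, one entry per bin
extra_zeros = np.zeros(matrix.shape[0])

# option 2: rough coverage proxy = total contacts per bin (row sums);
# sum(axis=1) returns an (n, 1) matrix, so flatten it to a 1-D array
extra_coverage = np.asarray(matrix.sum(axis=1)).ravel()
```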
The .npz and .h5 formats also store the correction factors produced by the Hi-C iterative correction, plus a vector with the indices of all bins in the matrix that were filtered out during the correction.
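For a matrix that has not been corrected yet, there are no correction factors, and the list of filtered bins can simply be empty. If you do want to record filtered bins, nan_bins is just an integer array of row indices into the matrix. The filtering criterion below (bins with no contacts at all) is a hypothetical example, not HiCExplorer's own filter:

```python
import numpy as np
from scipy.sparse import csr_matrix

matrix = csr_matrix(np.array([[34, 55, 0],
                              [55, 282, 0],
                              [0, 0, 0]]))

# uncorrected matrix: nothing filtered, no correction factors yet
nan_bins = np.array([], dtype=int)
correction_factors = None

# hypothetical filter: flag bins whose row contains no contacts
row_sums = np.asarray(matrix.sum(axis=1)).ravel()
empty_bins = np.flatnonzero(row_sums == 0)
```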
@fidelram Thank you. I store my matrix this way:
np.savez("HepG2_150000.npz",matrix=matrix, chrNameList=chromosomes, startList=starts, endList=ends, extraList=extra)
where matrix is:
array([[ 34, 55, 0, ..., 0, 0, 0],
[ 55, 282, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
...,
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 2960]], dtype=int16)
chromosomes is:
array(['chr1', 'chr1', 'chr1', ..., 'chrY', 'chrY', 'chrM'],
dtype='|S5')
starts is:
array([ 0, 150000, 300000, ..., 59100000, 59250000, 0])
ends is:
array([ 150000, 300000, 450000, ..., 59250000, 59373566, 16571])
extra is:
array([ 1, 2, 3, ..., 20650, 20651, 20652])
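Before handing a file like this to HiCExplorer, a quick consistency check can catch mismatches between the matrix and the bin arrays. This is a generic sketch (the function name check_bins is hypothetical), based only on the rule that every index in the bin lists corresponds to a row/column of the matrix:

```python
import numpy as np

def check_bins(matrix, chrom_names, starts, ends, extra):
    """Raise ValueError if the bin arrays do not match the matrix."""
    n_rows, n_cols = matrix.shape
    if n_rows != n_cols:
        raise ValueError("matrix must be square, got %s" % (matrix.shape,))
    # every bin list needs exactly one entry per matrix row/column
    for label, arr in [("chrNameList", chrom_names), ("startList", starts),
                       ("endList", ends), ("extraList", extra)]:
        if len(arr) != n_rows:
            raise ValueError("%s has %d entries, matrix has %d bins"
                             % (label, len(arr), n_rows))
    if np.any(np.asarray(ends) <= np.asarray(starts)):
        raise ValueError("every bin needs end > start")
```

It works for both dense NumPy arrays and scipy.sparse matrices, since both expose .shape.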
When I try to use this .npz file with HiCExplorer, it always fails. I've already reported a problem with hicPlotMatrix, and hicCorrectMatrix fails as well:
hicCorrectMatrix diagnostic_plot -m HepG2_150000.npz -o diagnostic150000.png
Traceback (most recent call last):
File "/usr/bin/hicCorrectMatrix", line 5, in <module>
pkg_resources.run_script('HiCExplorer==1.3', 'hicCorrectMatrix')
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 540, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1455, in run_script
execfile(script_filename, namespace, namespace)
File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/EGG-INFO/scripts/hicCorrectMatrix", line 7, in <module>
main()
File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/hicCorrectMatrix.py", line 598, in main
plot_total_contact_dist(ma.matrix, args)
File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/hicCorrectMatrix.py", line 476, in plot_total_contact_dist
hic_ma.data[np.isnan(hic_ma.data)] = 0
TypeError: bad argument type for built-in operation
I think the problem is that you are not using a sparse matrix. Simply add:
from scipy.sparse import csr_matrix
matrix = csr_matrix(matrix)
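The traceback points at hic_ma.data[np.isnan(hic_ma.data)] = 0. For a scipy.sparse CSR matrix, .data is the 1-D NumPy array of stored values, so that line works; for a dense ndarray, .data is a raw memory buffer instead, which is a plausible explanation for the TypeError above. A minimal sketch of the conversion, using a made-up float matrix so the NaN replacement has something to do:

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[34.0, 55.0, 0.0],
                  [55.0, np.nan, 0.0],
                  [0.0, 0.0, 2960.0]])

matrix = csr_matrix(dense)
# matrix.data holds only the stored (non-zero, here including NaN)
# entries as a 1-D ndarray, in row-major order
matrix.data[np.isnan(matrix.data)] = 0
```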
Then you can save it. This is how I save the matrix:
np.savez(filename, matrix=matrix, chrNameList=chromosomes, startList=starts, endList=ends, extraList=extra, nan_bins=np.array([]), correction_factors=None)
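Putting it together, here is a self-contained sketch that builds a toy sparse matrix with matching bin arrays, saves it with the same keys, and reads it back. The toy values and the file name are made up. np.savez stores the sparse matrix and None as pickled 0-d object arrays, so recent NumPy versions need allow_pickle=True to load them, and [()] unwraps the object; the reading code is a generic NumPy sketch, not HiCExplorer's own loader:

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix

# toy 3-bin matrix and matching bin arrays (illustrative values)
matrix = csr_matrix(np.array([[34, 55, 0],
                              [55, 282, 0],
                              [0, 0, 2960]], dtype=np.int16))
chromosomes = np.array(["chr1", "chr1", "chrM"])
starts = np.array([0, 150000, 0])
ends = np.array([150000, 300000, 16571])
extra = np.zeros(3)

filename = os.path.join(tempfile.mkdtemp(), "toy_150000.npz")
np.savez(filename, matrix=matrix, chrNameList=chromosomes,
         startList=starts, endList=ends, extraList=extra,
         nan_bins=np.array([]), correction_factors=None)

# read it back; the sparse matrix is wrapped in a 0-d object array
with np.load(filename, allow_pickle=True) as npz:
    loaded = npz["matrix"][()]
    keys = sorted(npz.files)
```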
@fidelram Thank you, now it works!
I have a matrix that I generated in .npz format. I've realized that, besides matrix, I also need chrNameList, startList, endList, and extraList. What's inside them? Thank you.