Closed hyl317 closed 5 years ago
I don't think that the coordinate lists returned by straw.straw are meant to be in a specific order since it's not mentioned anywhere in the docs.
If you want to create an array from those lists, you could take advantage of the scipy.sparse.coo_matrix class and its .toarray( ) method (and then make sure your matrix is triangular/symmetric).
import numpy as np
from scipy.sparse import coo_matrix
import straw
values, index_dim1, index_dim2 = result = straw.straw(
"NONE",
"https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic",
str(args.chrom),
str(args.chrom),
"BP",
args.bin
)
as_coo = coo_matrix(
(
values,
(
[ position//resolution for position in index_dim1 ],
[ position//resolution for position in index_dim2 ]
)
)
)
as_array = as_coo.toarray()
# then handle the matrix format
if np.allclose(np.tril(as_array), as_array): print( "is lower triangular" )
elif np.allclose(np.triu(as_array), as_array): print( "is upper triangular" )
else: as_array = np.tril(as_array + as_array.T)
My bad, those aren't in the right order, it should be index_dim1, index_dim2, values, see line 591:
return [xActual, yActual, counts]
My bad, those aren't in the right order, it should be index_dim1, index_dim2, values, see line 591: return [xActual, yActual, counts]
Hi, thank you! This seems a clever way to get it around. Just one question. In case the as_array is neither upper nor lower triangular, in doing np.tril(as_array + as_array.T), I think we should subtract diagonal elements from one of the as_array so that counts on the diagonal will not be doubled?
Yep!
I think I have another question about straw, though. When extracting info at higher resolution, for example, at 500kb, I noticed that, although the chromosome 1 has 499 bins, the length of "values" as in the following code is 104646, smaller than what I would expect from 1+2+...+498=124251. Does straw automatically ignore zeros?
index_dim1, index_dim2, values, = result = straw.straw( "NONE", "https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic", str(args.chrom), str(args.chrom), "BP", args.bin )
yes, straw ignores zeros and only outputs the non-sparse bins.
On Mon, Sep 2, 2019 at 10:59 AM Yilei Huang notifications@github.com wrote:
I think I have another question about straw, though. When extracting info at higher resolution, for example, at 500kb, I noticed that, although the chromosome 1 has 499 bins, the length of "values" as in the following code is 104646, smaller than what I would expect from 1+2+...+498=124251. Does straw automatically ignore zeros? index_dim1, index_dim2, values, = result = straw.straw( "NONE", " https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic", str(args.chrom), str(args.chrom), "BP", args.bin )
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/aidenlab/straw/issues/36?email_source=notifications&email_token=AAK2EW7JSN5PYYGBJ42AONLQHUS5ZA5CNFSM4IOJ3MA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5WBJLQ#issuecomment-527176878, or mute the thread https://github.com/notifications/unsubscribe-auth/AAK2EW44SBL5WZ2NQVCXYTTQHUS5ZANCNFSM4IOJ3MAQ .
-- Neva Cherniavsky Durand, Ph.D. Staff Scientist, Aiden Lab www.aidenlab.org
My intention is to use straw to extract a contact matrix from .hic file so that $$i^{th}$$ row in the matrix stores the counts of the $$i^{th}$$ bin with all other bins. Trying to understand the format of the output of straw, I used the following command:
When args.bin = 2.5Mb, I printed out the first few entries of result[0], result[1], and result[2], by inspecting the result and reading the source code of straw.py, I knew that result[0],result[2] are x,y-axis, respectively, and result[2] is the counts.
By tracing the x, and y axis, I guessed that result gives the counts data by iterating the upper triangular matrix column by column (as shown in the first half of the image); however, when I binned at 25kb, the results seem confusing. If results give the full upper triangular matrix, then result[2] should have #ofbins(1+#ofbins)/2 entries; this is the case when binSize=2.5Mb yet not true when binSize=25kb. I tried to trace the x,y labels and get a confusing line as shown in the second half of that image. So I wanna ask how is the upper triangular matrix coded in the straw output? Please correct me if I misunderstand how STRAW works since I haven't read the code very carefully.