Open ChenfuShi opened 4 years ago
This is a very good idea. It should be possible to extract as a numpy array or scipy sparse. We probably won't be able to get to this for a few weeks and would welcome any contributions from the community.
On Thu, Jun 4, 2020 at 5:27 AM chenfu shi notifications@github.com wrote:
Hello, sorry if there is an easier way to extract the data that I have missed, but:
Is your feature request related to a problem? Please describe.
The way strawC currently reports data requires heavy conversion before it is useful. The normal straw returns a list of lists, whereas strawC returns objects that cannot be accessed easily. Extraction itself is many times faster than in the normal version, but the added overhead of converting the data makes strawC about as fast as, or slower than, normal straw:
    %%timeit
    data = strawC.strawC('NONE', hic_folder+files[1], 'chr22', 'chr22', 'BP', 10000)
    extract = lambda x: (x.binX, x.binY, x.counts)
    converted_data = np.array(list(map(extract, data)), dtype = np.int64)
    matrix = scipy.sparse.coo_matrix((converted_data[:,2],(converted_data[:,0]//10000,converted_data[:,1]//10000)))
707 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    %%timeit
    data = straw.straw('NONE', hic_folder+files[1], 'chr22', 'chr22', 'BP', 10000)
    matrix = scipy.sparse.coo_matrix((data[2],(np.array(data[0])//10000,np.array(data[1])//10000)))
673 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Describe the solution you'd like
Is it possible to report the data like the normal straw does, as a NumPy array, or even directly as a SciPy sparse matrix? If I understand correctly, pybind allows NumPy structures to be used from C++, so maybe a version designed that way?
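To make the requested output shape concrete, here is a minimal Python-side sketch of the conversion. It is not part of straw or strawC: `Record`, `records_to_arrays`, and `to_coo` are hypothetical names, and the only things taken from the snippets above are the attribute names `binX`, `binY`, and `counts`. It assumes the strawC result is a sequence that supports `len()`, and it stores counts as floats, which is also an assumption. No speed-up over the `map`-based conversion is claimed; it has not been benchmarked.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the objects strawC returns. Only the attribute
# names (binX, binY, counts) come from the benchmark snippet in this issue.
Record = namedtuple("Record", ["binX", "binY", "counts"])


def records_to_arrays(records):
    """Copy record attributes into three preallocated NumPy arrays."""
    n = len(records)  # assumes the result is a sequence, not a one-shot iterator
    rows = np.empty(n, dtype=np.int64)
    cols = np.empty(n, dtype=np.int64)
    vals = np.empty(n, dtype=np.float64)  # assumption: counts may be non-integer
    for i, rec in enumerate(records):
        rows[i] = rec.binX
        cols[i] = rec.binY
        vals[i] = rec.counts
    return rows, cols, vals


def to_coo(rows, cols, vals, resolution):
    """Build a SciPy COO matrix, turning genomic positions into bin indices."""
    # Imported here so the array conversion above works without SciPy installed.
    from scipy.sparse import coo_matrix

    return coo_matrix((vals, (rows // resolution, cols // resolution)))
```

With real data this would be used as `rows, cols, vals = records_to_arrays(data)` followed by `matrix = to_coo(rows, cols, vals, 10000)`. Returning the three arrays directly from the C++ side would avoid the Python loop entirely, which is what this request is asking for.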
Thanks!
View it on GitHub: https://github.com/aidenlab/straw/issues/50
-- Neva Cherniavsky Durand, Ph.D. Pronouns: she, her, hers Assistant Professor, Aiden Lab www.aidenlab.org