aidenlab / straw

Extract data quickly from Juicebox via straw
MIT License

Q: Fastest way to determine if mzd has no records #116

Closed by reg-el 1 year ago

reg-el commented 1 year ago

As per title.

Given a MatrixZoomData object:

import hicstraw
hic = hicstraw.HiCFile(path)
resolution = min(hic.getResolutions())
chromosomes = {c.name : c for c in hic.getChromosomes()}
mzd = hic.getMatrixZoomData("1", "2", "observed", "KR", "BP", resolution)

What is the fastest way to determine whether it has no records? Calling getRecords over the full chromosome lengths seems to be pretty slow for large chromosomes:

records = mzd.getRecords(0, chromosomes["1"].length, 0, chromosomes["2"].length)
len(records) == 0 

Thank you in advance!

reg-el commented 1 year ago

Never mind, multiprocessing to the rescue...
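
The thread does not say how multiprocessing was applied. One plausible reading is that the emptiness check was run for many chromosome pairs at once, one pair per worker process. Below is a minimal sketch of that idea, reusing only the hicstraw calls shown in the question. The helper names (chromosome_pairs, pair_is_empty, find_empty_pairs) are hypothetical, and reopening the file inside each worker is an assumption made to avoid passing hicstraw objects between processes.

```python
from itertools import combinations_with_replacement
from multiprocessing import Pool


def chromosome_pairs(names):
    """Return every unordered pair of names, including (x, x), in input order."""
    return list(combinations_with_replacement(names, 2))


def pair_is_empty(task):
    """Worker: return (chr1, chr2, True) when the pair has no records."""
    import hicstraw  # imported here so each worker process loads it itself

    path, chr1, chr2, resolution = task
    # Assumption: hicstraw objects may not be picklable, so each worker
    # opens the .hic file itself instead of receiving an open handle.
    hic = hicstraw.HiCFile(path)
    lengths = {c.name: c.length for c in hic.getChromosomes()}
    mzd = hic.getMatrixZoomData(chr1, chr2, "observed", "KR", "BP", resolution)
    # Same full-range query as in the question, just run in parallel.
    records = mzd.getRecords(0, lengths[chr1], 0, lengths[chr2])
    return chr1, chr2, len(records) == 0


def find_empty_pairs(path, names, resolution, processes=4):
    """Check all chromosome pairs in parallel; return the pairs with no records."""
    tasks = [(path, a, b, resolution) for a, b in chromosome_pairs(names)]
    with Pool(processes) as pool:
        results = pool.map(pair_is_empty, tasks)
    return [(a, b) for a, b, empty in results if empty]
```

Call find_empty_pairs(path, names, resolution) from under an `if __name__ == "__main__":` guard so worker processes can be started safely on every platform. This does not make a single getRecords call faster. It only spreads the per-pair cost across cores, so it helps when many pairs need checking.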