aidenlab / straw

Extract data quickly from Juicebox via straw
MIT License

Possible straw.py issue for older hic versions #21

Closed carlvitzthum closed 6 years ago

carlvitzthum commented 6 years ago

I'm working on developing a converter from .hic to .cool files (hic2cool) and came across a possible issue with a test file of hic version 6. The binX, binY, and counts values were not correct within the readBlock function.

The file is IMR90.hic and can be found here: https://bcm.app.box.com/v/aidenlab/folder/11235404320.

I was able to get the correct counts (i.e. ones that matched the output of juice_tools dump) by changing lines 323 to 325 in python/straw.py: https://github.com/theaidenlab/straw/blob/65f0e94bf7cc11cfa6a9e7ddaf50205591bb6069/python/straw.py#L323-L325

Looking closely, the binX value == nRecords for i=0 in the loop over range(nRecords), which explains the problem. I changed the range of bytes read and it seems to work:

    x = struct.unpack(b'<i', uncompressedBytes[(12*i+4):(12*i+8)])[0]
    y = struct.unpack(b'<i', uncompressedBytes[(12*i+8):(12*i+12)])[0]
    c = struct.unpack(b'<f', uncompressedBytes[(12*i+12):(12*i+16)])[0]
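To illustrate the offset shift, here is a minimal, self-contained sketch. It assumes the layout the observation above implies: a 4-byte little-endian record count at the start of the uncompressed block, followed by 12-byte records (int32 binX, int32 binY, float32 count). The function name `parse_block_records` and the synthetic payload are hypothetical and are not taken from straw.py.

```python
import struct

def parse_block_records(uncompressed_bytes):
    """Parse (binX, binY, count) records from a block.

    Assumed layout (hypothetical, inferred from the issue text):
    bytes 0..3 hold nRecords as a little-endian int32, and each
    record occupies 12 bytes starting at offset 4.
    """
    n_records = struct.unpack('<i', uncompressed_bytes[0:4])[0]
    records = []
    for i in range(n_records):
        # Skip the 4-byte record count before indexing into the records.
        offset = 4 + 12 * i
        x, y, c = struct.unpack('<iif', uncompressed_bytes[offset:offset + 12])
        records.append((x, y, c))
    return records

# Synthetic block: a count of 2 followed by two records.
payload = (
    struct.pack('<i', 2)
    + struct.pack('<iif', 10, 20, 1.5)
    + struct.pack('<iif', 11, 21, 2.0)
)

records = parse_block_records(payload)

# Reading the first record at offset 0 instead of 4 picks up the record
# count as binX, which is the symptom described above.
misread_x = struct.unpack('<i', payload[0:4])[0]
```

With this payload, `records` holds the two packed records, while `misread_x` equals the record count rather than a bin index.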

Just thought I would let you know in case this is a valid issue you want to fix.

Best, Carl

nchernia commented 6 years ago

Could you do a pull request?


-- Neva Cherniavsky Durand, Ph.D. Staff Scientist, Aiden Lab www.aidenlab.org

carlvitzthum commented 6 years ago

Certainly. https://github.com/theaidenlab/straw/pull/22

Apologies for the number of whitespace changes, which were added automatically by my linter. Only lines 323-325 were meaningfully changed.

theaidenlab commented 6 years ago

Thanks!