aidenlab / straw

Extract data quickly from Juicebox via straw
MIT License

Python: when c2 <= c1 output is not flipped back #119

Open lldelisle opened 1 year ago

lldelisle commented 1 year ago

Describe the bug

When fetching the contacts between 'chr1' and 'chrM', straw returns the same result as for 'chrM' and 'chr1', without any notice to the user.

To Reproduce

import hicstraw

hic_file = 'ENCFF080DPJ.hic'
chrom1 = 'chr1'
chrom2 = 'chr2'

# Query chrom1 vs chrom2 and print the first 10 records
result = hicstraw.straw('observed', 'NONE', hic_file, chrom1, chrom2, 'BP', 1000000)
for i in range(10):
    print("{0}\t{1}\t{2}".format(result[i].binX, result[i].binY, result[i].counts))

# Same query with the chromosomes swapped: binX and binY should be exchanged
result = hicstraw.straw('observed', 'NONE', hic_file, chrom2, chrom1, 'BP', 1000000)
for i in range(10):
    print("{0}\t{1}\t{2}".format(result[i].binX, result[i].binY, result[i].counts))

Expected behavior

I would expect the same result but with columns 1 and 2 swapped. Instead, I got exactly the same result.

The problem is that the chromosomes are flipped here: https://github.com/aidenlab/straw/blob/2525edc29bbb48463799cad94cbd6e5e810210a0/pybind11_python/src/straw.cpp#L1222-L1227 but this information is not stored. As a result, the records are not flipped back in: https://github.com/aidenlab/straw/blob/2525edc29bbb48463799cad94cbd6e5e810210a0/pybind11_python/src/straw.cpp#L1368-L1374 and the matrix is not transposed in: https://github.com/aidenlab/straw/blob/2525edc29bbb48463799cad94cbd6e5e810210a0/pybind11_python/src/straw.cpp#L1430
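
Until this is fixed in the library, a caller-side workaround might look like the sketch below. It is not part of hicstraw. The helper orient_contacts, the Contact stand-in type, and the chrom_order argument are hypothetical names introduced here for illustration. The sketch assumes that the returned records always put the chromosome that comes first in the file's chromosome order in binX, and that the caller knows that order.

```python
from collections import namedtuple

# Stand-in for the records returned by hicstraw.straw, which expose
# binX, binY and counts (hypothetical type, used only for this sketch).
Contact = namedtuple("Contact", ["binX", "binY", "counts"])


def orient_contacts(records, chrom1, chrom2, chrom_order):
    """Return (bin_on_chrom1, bin_on_chrom2, counts) tuples.

    chrom_order is the list of chromosome names in the order used by the
    .hic file. Assumption: the records put the earlier chromosome in binX,
    whatever order the caller asked for.
    """
    flipped = chrom_order.index(chrom1) > chrom_order.index(chrom2)
    if flipped:
        # The query was silently swapped, so swap the bins back.
        return [(r.binY, r.binX, r.counts) for r in records]
    return [(r.binX, r.binY, r.counts) for r in records]


# Usage with made-up records (not real data from ENCFF080DPJ.hic):
records = [Contact(0, 5000000, 3.0), Contact(1000000, 2000000, 1.0)]
order = ["chr1", "chr2"]
forward = orient_contacts(records, "chr1", "chr2", order)
reverse = orient_contacts(records, "chr2", "chr1", order)
```

With this helper, the 'chr2' vs 'chr1' query yields the same counts as 'chr1' vs 'chr2' but with the two bin columns exchanged, which is the behavior described under "Expected behavior".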