4dn-dcic / pairix

1D/2D indexing and querying on bgzipped text file with a pair of genomic coordinates
MIT License
86 stars 14 forks source link

Integer overflow for -n option: reporting negative numbers! #58

Closed mimakaev closed 5 years ago

mimakaev commented 5 years ago

At least it is observed on the largest Bonev dataset.

magus@wiz:~/work/distillate/rao2014Test$ pairix -n HiC_NPC_4.nodups.pairs.gz -2003369355

mimakaev commented 5 years ago

@SooLee The issue is still present as of pairix 0.3.6.

sergpolly commented 5 years ago

little update on this one: I was running the same test - Bonev 2017 and got an error while dealing with the same library HiC_NPC_4.nodups.pairs.gz (the largest one)

In my case however, pairix -n is doing fine: screenshot from 2018-12-26 15-29-40

But! pypairix is not so great: screenshot from 2018-12-26 15-37-29

The value itself 2.3 billion is indeed > int32 upper limit.

@nvictus suggests that the int32 overflow happens here: https://github.com/4dn-dcic/pairix/blob/master/src/pairixmodule.c#L594

pairix version screenshot from 2018-12-26 15-32-20

nvictus commented 5 years ago

I suspect a downcast to int32 is happening upon converting to a Python int due to the i argument. See https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue

SooLee commented 5 years ago

@mimakaev My apology - I saw this issue just now. Haven't had a chance to work on pairix for some time. I should check more often! If you do pairix index with 0.3.6 before pairix -n, do you still have the problem? The linecount is saved during the indexing. 0.3.6 Doesn't fix the overflow from a previously indexed pairs file. It works with previous indexes but it does not correct for the overflow. I could add that feature to 0.3.7.

SooLee commented 5 years ago

@sergpolly @nvictus Thanks for reporting the problem with pypairix! Apologies for not noticing the issue earlier. We're looking into it right now.

SooLee commented 5 years ago

The pypairix get_linecount issue is now fixed in 0.3.7.