aidenlab / straw

Extract data quickly from Juicebox via straw
MIT License

EOF met with straw.cpp + aidenlab/Juicebox/blob/master/data/inter.hic #32

Closed: lindenb closed this issue 5 years ago

lindenb commented 5 years ago

Hi, I'm playing with straw.cpp and the file https://github.com/aidenlab/Juicebox/blob/master/data/inter.hic, which I saved locally as 'jeter.hic':

$ wget -q -O - "https://github.com/aidenlab/Juicebox/blob/master/data/inter.hic?raw=true" | sha1sum 
1f7fc1149306dc17b1e51b09053d152ebcef1cb0  -
$ ls -lah ~/jeter.hic
-rw-rw-r-- 1 lindenb lindenb 41M juin   4 11:32 /home/lindenb/jeter.hic
$ sha1sum ~/jeter.hic
1f7fc1149306dc17b1e51b09053d152ebcef1cb0  /home/lindenb/jeter.hic

I found that the following command:

./straw VC ~/jeter.hic 1:10:20 2:10:20 BP 100

silently reaches EOF at https://github.com/aidenlab/straw/blob/master/C%2B%2B/straw.cpp#L266

You can check this by replacing line 266 with:

#define DEBUG(a) do { cerr << __LINE__ << ":" << a << endl; } while(0)

DEBUG("fin " << fin.tellg());
fin.read((char*)&nExpectedValues, sizeof(int));
DEBUG("nExpectedValues " << nExpectedValues << " eof ?" << fin.eof());
DEBUG("fin " << fin.tellg());
if ( (fin.rdstate() & std::ifstream::failbit ) != 0 ) DEBUG("failbit");
if ( (fin.rdstate() & std::ifstream::eofbit ) != 0 )  DEBUG("eofbit");
if ( (fin.rdstate() & std::ifstream::badbit ) != 0 ) DEBUG("badbit");

Is it a bug in straw.cpp, or is it a problem with the file inter.hic?

Thank you for your help.

nchernia commented 5 years ago

Thanks. It looks like this test file doesn't have normalization vectors. However, this is a bad failure mode, as you've pointed out, and it's also bad in Straw Python; we'll take a look at fixing it.

Note that 'BP 100' is not a resolution we would normally bin at (the highest is usually 5000) and that there won't be any reads in 1:10:20 as that's the telomere.
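
To make the point about the query size concrete, here is a rough sketch of how a base-pair range maps to bins. It assumes fixed-width bins where position p falls in bin p / resolution; that is an assumption for illustration, and binRange is a hypothetical name rather than a function from straw.

```cpp
#include <cstdint>
#include <utility>

// Illustrative only: map a base-pair range to (first, last) bin indices,
// assuming fixed-width bins where position p falls in bin p / resolution.
std::pair<std::int64_t, std::int64_t> binRange(std::int64_t startBp,
                                               std::int64_t endBp,
                                               std::int64_t resolutionBp) {
    return {startBp / resolutionBp, endBp / resolutionBp};
}
```

Under that assumption, the range 10 to 20 lands entirely in bin 0 whether the resolution is 100 or 5000, so a query like 1:10:20 covers at most a single bin at the very start of the chromosome.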

lindenb commented 5 years ago

@nchernia :+1:

> Note that 'BP 100' is not a resolution we would normally bin at (the highest is usually 5000)

Yes, that was just for a test; I'm really new to this format.

I'm closing this issue. Thanks.