BIO-DIKU / SeqScan

Pattern matching in biological sequences
GNU General Public License v2.0

bitset test argument out of range #93

Closed (maasha closed 8 years ago)

maasha commented 8 years ago
maasha@edna:~/install/src/SeqScan/benchmark$ seqscan -m 10 -P pat_group_neg/seqscan_aa.pat -V data/Aurpu2p4.faa
seqscan_123.2.23

seqscan -m 10 -P pat_group_neg/seqscan_aa.pat -V data/Aurpu2p4.faa

Options:
  help:           false
  pattern:
  pattern_file:   pat_group_neg/seqscan_aa.pat
  complement:     forward
  direction:      forward
  start:          0
  end:            0
  threads:        1
  score_encoding: Phred33
  score_min:      25
  ambiguate:      false
  match_type:     10
  match_file:
  output:
  overlap:        false
  filter:
  version:        false
  verbose:        true

Patterns:
  [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN] [^DEN]
Aurpu2p4_000001 +   70,1,0,Q;71,1,0,S;72,1,0,R;73,1,0,P;74,1,0,P;75,1,0,L;76,1,0,L;77,1,0,C;78,1,0,R;79,1,0,R;80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q 20
Aurpu2p4_000001 +   71,1,0,S;72,1,0,R;73,1,0,P;74,1,0,P;75,1,0,L;76,1,0,L;77,1,0,C;78,1,0,R;79,1,0,R;80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P 20
Aurpu2p4_000001 +   72,1,0,R;73,1,0,P;74,1,0,P;75,1,0,L;76,1,0,L;77,1,0,C;78,1,0,R;79,1,0,R;80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L 20
Aurpu2p4_000001 +   73,1,0,P;74,1,0,P;75,1,0,L;76,1,0,L;77,1,0,C;78,1,0,R;79,1,0,R;80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R 20
Aurpu2p4_000001 +   74,1,0,P;75,1,0,L;76,1,0,L;77,1,0,C;78,1,0,R;79,1,0,R;80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q 20
Aurpu2p4_000001 +   75,1,0,L;76,1,0,L;77,1,0,C;78,1,0,R;79,1,0,R;80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H 20
Aurpu2p4_000001 +   76,1,0,L;77,1,0,C;78,1,0,R;79,1,0,R;80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L 20
Aurpu2p4_000001 +   77,1,0,C;78,1,0,R;79,1,0,R;80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A 20
Aurpu2p4_000001 +   78,1,0,R;79,1,0,R;80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L 20
Aurpu2p4_000001 +   79,1,0,R;80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L 20
Aurpu2p4_000001 +   80,1,0,S;81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P 20
Aurpu2p4_000001 +   81,1,0,I;82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P    20
Aurpu2p4_000001 +   82,1,0,S;83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S   20
Aurpu2p4_000001 +   83,1,0,Q;84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T  20
Aurpu2p4_000001 +   84,1,0,L;85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S 20
Aurpu2p4_000001 +   85,1,0,A;86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R    20
Aurpu2p4_000001 +   86,1,0,R;87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C   20
Aurpu2p4_000001 +   87,1,0,L;88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S  20
Aurpu2p4_000001 +   88,1,0,A;89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S;107,1,0,Q 20
Aurpu2p4_000001 +   89,1,0,Q;90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S;107,1,0,Q;108,1,0,Y    20
Aurpu2p4_000001 +   90,1,0,P;91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S;107,1,0,Q;108,1,0,Y;109,1,0,S   20
Aurpu2p4_000001 +   91,1,0,L;92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S;107,1,0,Q;108,1,0,Y;109,1,0,S;110,1,0,V  20
Aurpu2p4_000001 +   92,1,0,R;93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S;107,1,0,Q;108,1,0,Y;109,1,0,S;110,1,0,V;111,1,0,L 20
Aurpu2p4_000001 +   93,1,0,Q;94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S;107,1,0,Q;108,1,0,Y;109,1,0,S;110,1,0,V;111,1,0,L;112,1,0,S    20
Aurpu2p4_000001 +   94,1,0,H;95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S;107,1,0,Q;108,1,0,Y;109,1,0,S;110,1,0,V;111,1,0,L;112,1,0,S;113,1,0,R   20
Aurpu2p4_000001 +   95,1,0,L;96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S;107,1,0,Q;108,1,0,Y;109,1,0,S;110,1,0,V;111,1,0,L;112,1,0,S;113,1,0,R;114,1,0,S  20
Aurpu2p4_000001 +   96,1,0,A;97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S;107,1,0,Q;108,1,0,Y;109,1,0,S;110,1,0,V;111,1,0,L;112,1,0,S;113,1,0,R;114,1,0,S;115,1,0,H 20
Aurpu2p4_000001 +   97,1,0,L;98,1,0,L;99,1,0,P;100,1,0,P;101,1,0,S;102,1,0,T;103,1,0,S;104,1,0,R;105,1,0,C;106,1,0,S;107,1,0,Q;108,1,0,Y;109,1,0,S;110,1,0,V;111,1,0,L;112,1,0,S;113,1,0,R;114,1,0,S;115,1,0,H;116,1,0,L    20
libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: bitset test argument out of range
Abort trap: 6
maasha commented 8 years ago

Hm, no weird chars in the data file:

maasha@edna:~/install/src/SeqScan/benchmark/data$ grep -v '^>' Aurpu2p4.faa | perl -ne 'print join("\n", split("", $_)), "\n"' | sort -u

A
C
D
E
F
G
H
I
K
L
M
N
P
Q
R
S
T
V
W
X
Y
maasha commented 8 years ago

OK, there is some overflow in the hash function:

The offending index value in res_matcher.cc:61 is

index: 18446744073709543748

maasha commented 8 years ago

Dumping the chars being compared shows that they are non-printable (despite the data check above?)

char a: �
char b: �
Hash  : 18446744073709527108
index: 18446744073709527108
maasha commented 8 years ago
char a: �
char b: D
int a: -28
int b: 68
Hash  : 18446744073709544516
libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: bitset test argument out of range
Process 35492 stopped
* thread #1: tid = 0x28fc75, 0x00007fff8f3a1286 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff8f3a1286 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill + 10:
-> 0x7fff8f3a1286:  jae    0x7fff8f3a1290            ; __pthread_kill + 20
   0x7fff8f3a1288:  movq   %rax, %rdi
   0x7fff8f3a128b:  jmp    0x7fff8f39cc53            ; cerror_nocancel
   0x7fff8f3a1290:  retq
(lldb) bt
* thread #1: tid = 0x28fc75, 0x00007fff8f3a1286 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff8f3a1286 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8f2269f9 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff8c1e29b3 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fff94f14a21 libc++abi.dylib`abort_message + 257
    frame #4: 0x00007fff94f3c9b9 libc++abi.dylib`default_terminate_handler() + 243
    frame #5: 0x00007fff87dc77eb libobjc.A.dylib`_objc_terminate() + 124
    frame #6: 0x00007fff94f3a0a1 libc++abi.dylib`std::__terminate(void (*)()) + 8
    frame #7: 0x00007fff94f39b30 libc++abi.dylib`__cxa_throw + 121
    frame #8: 0x000000010001d197 seqscan`std::__1::bitset<65536ul>::test(this=0x0000000101013820, __pos=18446744073709544516) const + 199 at bitset:989
    frame #9: 0x000000010001c16d seqscan`ResMatcher::is_set(this=0x0000000101013820, index=18446744073709544516) const + 29 at res_matcher.cc:66
    frame #10: 0x000000010001c0fb seqscan`ResMatcher::Match(this=0x0000000101013820, a='�', b='D') const + 555 at res_matcher.cc:58
    frame #11: 0x000000010002d960 seqscan`GroupUnit::FindNoMatchAtPos(this=0x0000000101013800) + 1168 at group_unit.cc:65
    frame #12: 0x000000010002d46b seqscan`GroupUnit::FindMatch(this=0x0000000101013800) + 59 at group_unit.cc:46
    frame #13: 0x000000010002993d seqscan`CompositeUnit::FindMatch(this=0x0000000101008e00) + 173 at composite_unit.cc:76
    frame #14: 0x0000000100069ad3 seqscan`main(argc=7, argv=0x00007fff5fbff560) + 6099 at main.cc:76
    frame #15: 0x00007fff8bbe15c9 libdyld.dylib`start + 1
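
The numbers in the dump above fit a simple explanation: with int a: -28 and int b: 68, the value a * 256 + b is -7100, and -7100 converted to a 64-bit unsigned integer is 18446744073709544516, exactly the reported hash. The sketch below reproduces that arithmetic. The hash form a * 256 + b and the names naive_index, safe_index and is_set (as a free function) are assumptions made for illustration; the actual code in res_matcher.cc is not shown in this thread. Only the std::bitset<65536> size is taken from the backtrace.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>

// Hypothetical reconstruction of the index computation (an assumption,
// not the real ResMatcher code). Both operands are promoted to int, so a
// negative `a` makes the sum negative; converting that to std::size_t
// wraps around modulo 2^N, giving a huge index.
constexpr std::size_t naive_index(signed char a, signed char b) {
    return static_cast<std::size_t>(a * 256 + b);
}

// Converting each operand to unsigned char first keeps it in 0..255,
// so the result always lies in 0..65535.
constexpr std::size_t safe_index(signed char a, signed char b) {
    return static_cast<std::size_t>(static_cast<unsigned char>(a)) * 256
         + static_cast<unsigned char>(b);
}

// std::bitset::test throws std::out_of_range when index >= 65536,
// which matches the exception seen in the backtrace.
bool is_set(const std::bitset<65536>& table, std::size_t index) {
    return table.test(index);
}
```

On a platform with a 64-bit std::size_t, naive_index(-28, 68) yields the out-of-range value from the dump, while safe_index(-28, 68) stays inside the table. Note that an unsigned conversion only keeps the lookup from throwing; it does not explain why a byte with value -28 was being compared in the first place.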
maasha commented 8 years ago

This triggers the error:

seqscan -m 10 -p '[^DEN] [^DEN] [^DEN]' -V data/Aurpu2p4.faa
RasmusFonseca commented 8 years ago

Pretty sure this is a bug in CompositeUnit. It doesn't check how close to the end it is.

maasha commented 8 years ago

It is unfortunate that the error appears in ResMatcher. How could that have been avoided?

RasmusFonseca commented 8 years ago

A combination of the last two. The documentation of PatternUnit::Initialize clearly says:

Initialize the pattern unit so a check can be performed for matches not extending 
beyond max_pos. If stay_at_pos is set then FindMatch should only proceed to 
matches whose start-position is at pos. If not, the search should proceed until 
any match can be found.

... so it's not a design problem. It's just that CompositeUnit doesn't satisfy that contract and we haven't added a unit test for the 'matches extending beyond max_pos' case yet.

Some languages would complain the second you attempt to iterate beyond the end of a string, but that requires built-in checks that slow everything down.
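
As a rough illustration of the contract quoted above, the sketch below rejects any candidate match that would extend beyond max_pos before reading a single residue. It is not SeqScan's API: the function name, its parameters, and the choice to treat max_pos as one past the last valid position are all assumptions for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SeqScan): returns true if the `len`
// residues starting at `pos` all lie before `max_pos` and none of them
// is in `excluded`, i.e. a run of negated groups such as [^DEN].
// Assumption: max_pos is exclusive (one past the last valid position).
bool negated_group_run_matches(const std::string& seq, std::size_t pos,
                               std::size_t len, std::size_t max_pos,
                               const std::string& excluded) {
    // One up-front range check replaces per-character bounds checking.
    if (max_pos > seq.size() || pos > max_pos || len > max_pos - pos) {
        return false;  // the match would extend beyond max_pos
    }
    for (std::size_t i = pos; i < pos + len; ++i) {
        if (excluded.find(seq[i]) != std::string::npos) {
            return false;  // residue is one of the excluded characters
        }
    }
    return true;
}
```

For example, with seq = "QSRPPL" and max_pos = 6, a run of three [^DEN] groups matches at pos 3 but is rejected at pos 4 because it would need a residue at index 6. A unit test for the 'matches extending beyond max_pos' case could assert exactly that kind of rejection.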

maasha commented 8 years ago

Is this fixed or not?

RasmusFonseca commented 8 years ago

Yep, should be fixed.