maasha closed this issue 8 years ago
Hm, no weird chars in the data file:
maasha@edna:~/install/src/SeqScan/benchmark/data$ grep -v '^>' Aurpu2p4.faa | perl -ne 'print join("\n", split("", $_)), "\n"' | sort -u
A
C
D
E
F
G
H
I
K
L
M
N
P
Q
R
S
T
V
W
X
Y
OK, there is some overflow in the hash function:
The offending index value at res_matcher.cc:61 is
index: 18446744073709543748
Dumping the chars being compared shows that they are non-printable (despite the data file check above?):
char a: �
char b: �
Hash : 18446744073709527108
index: 18446744073709527108
char a: �
char b: D
int a: -28
int b: 68
Hash : 18446744073709544516
libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: bitset test argument out of range
Process 35492 stopped
* thread #1: tid = 0x28fc75, 0x00007fff8f3a1286 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff8f3a1286 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill + 10:
-> 0x7fff8f3a1286: jae 0x7fff8f3a1290 ; __pthread_kill + 20
0x7fff8f3a1288: movq %rax, %rdi
0x7fff8f3a128b: jmp 0x7fff8f39cc53 ; cerror_nocancel
0x7fff8f3a1290: retq
(lldb) bt
* thread #1: tid = 0x28fc75, 0x00007fff8f3a1286 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff8f3a1286 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff8f2269f9 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x00007fff8c1e29b3 libsystem_c.dylib`abort + 129
frame #3: 0x00007fff94f14a21 libc++abi.dylib`abort_message + 257
frame #4: 0x00007fff94f3c9b9 libc++abi.dylib`default_terminate_handler() + 243
frame #5: 0x00007fff87dc77eb libobjc.A.dylib`_objc_terminate() + 124
frame #6: 0x00007fff94f3a0a1 libc++abi.dylib`std::__terminate(void (*)()) + 8
frame #7: 0x00007fff94f39b30 libc++abi.dylib`__cxa_throw + 121
frame #8: 0x000000010001d197 seqscan`std::__1::bitset<65536ul>::test(this=0x0000000101013820, __pos=18446744073709544516) const + 199 at bitset:989
frame #9: 0x000000010001c16d seqscan`ResMatcher::is_set(this=0x0000000101013820, index=18446744073709544516) const + 29 at res_matcher.cc:66
frame #10: 0x000000010001c0fb seqscan`ResMatcher::Match(this=0x0000000101013820, a='�', b='D') const + 555 at res_matcher.cc:58
frame #11: 0x000000010002d960 seqscan`GroupUnit::FindNoMatchAtPos(this=0x0000000101013800) + 1168 at group_unit.cc:65
frame #12: 0x000000010002d46b seqscan`GroupUnit::FindMatch(this=0x0000000101013800) + 59 at group_unit.cc:46
frame #13: 0x000000010002993d seqscan`CompositeUnit::FindMatch(this=0x0000000101008e00) + 173 at composite_unit.cc:76
frame #14: 0x0000000100069ad3 seqscan`main(argc=7, argv=0x00007fff5fbff560) + 6099 at main.cc:76
frame #15: 0x00007fff8bbe15c9 libdyld.dylib`start + 1
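The dumped numbers look like sign extension of a negative char rather than a true overflow. Assuming the hash has the form a * 256 + b (an assumption; the real hash in res_matcher.cc is not shown in this thread), a = -28 and b = 68 give -7100, which becomes 18446744073709544516 when converted to a 64-bit unsigned index. That is exactly the value passed to bitset<65536>::test in frame #8. A minimal sketch with hypothetical function names, showing the wrap-around and one possible way to keep the index in range:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction, not the SeqScan code: a pair hash computed on
// signed chars. A negative char is sign-extended before the multiplication,
// so the result is negative and wraps when converted to an unsigned index.
std::uint64_t SignedPairHash(signed char a, signed char b) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(a) * 256 + b);
}

// Possible guard: map each char to 0..255 first, so the index always lies
// in 0..65535 and fits a bitset<65536>.
std::size_t UnsignedPairHash(char a, char b) {
  return static_cast<std::size_t>(static_cast<unsigned char>(a)) * 256 +
         static_cast<unsigned char>(b);
}
```

Casting through unsigned char would only stop the exception. It would not explain why non-printable bytes reach the matcher at all when the data file contains none.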
This triggers the error:
seqscan -m 10 -p '[^DEN] [^DEN] [^DEN]' -V data/Aurpu2p4.faa
Pretty sure this is a bug in CompositeUnit. It doesn't check how close it is to the end of the sequence.
It is unfortunate that the error appears in ResMatcher. How could that have been avoided?
A combination of the last two. The documentation of PatternUnit::Initialize clearly says:
Initialize the pattern unit so a check can be performed for matches not extending
beyond max_pos. If stay_at_pos is set then FindMatch should only proceed to
matches whose start-position is at pos. If not, the search should proceed until
any match can be found.
So it's not a design problem. It's just that CompositeUnit doesn't satisfy that contract, and we haven't yet added a unit test for the 'matches extending beyond max_pos' case.
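To make the contract concrete, here is a minimal sketch of a search that never reports a match extending beyond max_pos. Everything in it is hypothetical: FitsBeforeMaxPos and FindWithin are not SeqScan functions, the pattern is a plain literal string, and max_pos is treated as an exclusive end position, which the documentation quoted above does not specify.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true only if a match of length `len` starting at
// `pos` ends at or before `max_pos` (written to avoid unsigned underflow).
bool FitsBeforeMaxPos(std::size_t pos, std::size_t len, std::size_t max_pos) {
  return pos <= max_pos && len <= max_pos - pos;
}

// Hypothetical search honoring the contract: returns the start of the first
// match of `pattern` that lies entirely before `max_pos`, or npos.
// With stay_at_pos set, only a match starting exactly at `pos` is accepted.
std::size_t FindWithin(const std::string& seq, const std::string& pattern,
                       std::size_t pos, std::size_t max_pos,
                       bool stay_at_pos) {
  if (max_pos > seq.size()) max_pos = seq.size();  // never read past the data
  for (std::size_t start = pos;
       FitsBeforeMaxPos(start, pattern.size(), max_pos); ++start) {
    if (seq.compare(start, pattern.size(), pattern) == 0) return start;
    if (stay_at_pos) break;
  }
  return std::string::npos;
}
```

A unit test for the missing case would then assert that a pattern which only fits by crossing max_pos yields no match, for example FindWithin("ACDE", "DE", 0, 3, false) returning npos.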
Some languages would complain the second you attempt to iterate beyond the end of a string, but that requires built-in checks that slow everything down.
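C++ leaves that choice to the caller. The backtrace above is an instance of it: std::bitset::test is the bounds-checked accessor and throws std::out_of_range, whereas operator[] performs no check and has undefined behaviour for an out-of-range index. A small sketch (CheckedTestThrows is a hypothetical helper written for illustration):

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>

// Returns true if the bounds-checked accessor rejected `index`.
bool CheckedTestThrows(const std::bitset<65536>& bits, std::size_t index) {
  try {
    (void)bits.test(index);  // throws std::out_of_range if index >= 65536
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}
```

The check costs one comparison per lookup, but here it turned a silent out-of-bounds read into an immediate, debuggable abort.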
Is this fixed or not?
Yep, should be fixed.