GZHoffie / approximate-string-matching

A repository of experimental code and benchmark for greedy/dynamic programming approximate string matching algorithms.
MIT License
2 stars 0 forks source link

Approximate-String-Matching

Benchmark

Simulated Data

Error rate: 0.05 Total number of alignments: 1000000

[Time]
=> Needleman-Wunsch | 36.220 s
=> LEAP             | 1.550 s
=> Greedy           | 0.850 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 99.757 %
=> Greedy           | 92.975 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 97.512 %

Error rate: 0.10 Total number of alignments: 1000000

[Time]
=> Needleman-Wunsch | 34.260 s
=> LEAP             | 2.890 s
=> Greedy           | 1.210 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 98.066 %
=> Greedy           | 78.020 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 94.213 %

Error rate: 0.15 Total number of alignments: 1000000

[Time]
=> Needleman-Wunsch | 32.330 s
=> LEAP             | 3.850 s
=> Greedy           | 1.640 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 93.424 %
=> Greedy           | 57.939 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 90.418 %

Error rate: 0.20 Total number of alignments: 1000000

[Time]
=> Needleman-Wunsch | 31.550 s
=> LEAP             | 4.470 s
=> Greedy           | 1.730 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 88.579 %
=> Greedy           | 46.023 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 88.289 %

Real Data

Dataset generated by illumina reads SRR611076.

Using AVX2,

[Time]
=> Needleman-Wunsch | 108.390 s
=> LEAP             | 4.440 s
=> Greedy           | 3.460 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 89.524 %
=> Greedy           | 92.674 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 97.829 %

Using SSE,

[Time]
=> Needleman-Wunsch | 106.850 s
=> LEAP             | 4.660 s
=> Greedy           | 2.610 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 89.524 %
=> Greedy           | 89.725 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 96.843 %