research best substring search approach

alexanderankin commented 1 year ago

This article lists some comparisons, and on page 10 it lists the comparison for searching in a 32 character alphabet space (which seems like the most relevant to this problem - maybe the rand8 one from a few pages prior) and it lists the Extended Backwards Oracle Matching (EBOM) or Forward Simplified Backwards Nondeterministic DAWG (directed acyclical word graph) Matching (FSBNDM) as the best candidates.

Although, I'm not sure why the paper comes to that conclusion. E.g. if the "Thathoo-Virmani-Sai-Balakrishnan-Sekar" algorithm is easier to implement, its only about 10% behind on longer strings, and significantly faster on shorter ones (or Franek-Jennings-Smyth), I think it would suffice for this purpose.

There is a bit of discussion around implementing a character matching algorithm in the Fast-Search article also.

alexanderankin commented 1 year ago

fs article has DOI:

10.1007/3-540-44867-5_4

alexanderankin commented 1 year ago

http://www-igm.univ-mlv.fr/~lecroq/string/index.html

alexanderankin / pi-search

research best substring search approach #1