antonioaversa / MoreStructures

Data structures and algorithms for .NET6.
MIT License
5 stars 0 forks source link

Pattern matching with Partial Suffix Arrays #83

Open antonioaversa opened 2 years ago

antonioaversa commented 2 years ago

Implement text pattern matching with Partial Suffix Arrays, Burrows-Wheeler Transform and its sorted version.

Suffix Arrays are a space-efficient alternative to Suffix Trees, due to their Θ(n) with c=1 space used, when the size of an index can be considered constant w.r.t. the size of the input strings.

There are, however, cases where even the linear Space Complexity of a Suffix Array is considered too high, and a smaller data structure has to be stored, potentially at the cost of the algorithm runtime.

Text pattern matching with Suffix Arrays can also be done with partial structures: for example, only the elements at index i of the Suffix Array, such that i % k == 0, where k is a constant, can be stored; the other ones discarded or even not calculated.

Pattern matching with Partial Suffix Arrays can be implemented using the Last-First property of BWT and Sorted BWT. The property is used to iteratively identify the char preceding the found char, until a char at a position i is found, such that the i-th element of the Suffix Array is known (e.g. i % k == 0).

antonioaversa commented 2 years ago

Reopening as https://github.com/antonioaversa/MoreStructures/issues/83 wrongly attached to https://github.com/antonioaversa/MoreStructures/pull/85