caleb531 / automata

A Python library for simulating finite automata, pushdown automata, and Turing machines
https://caleb531.github.io/automata/
MIT License
349 stars 64 forks source link

Aho-Corasick Algorithm #162

Closed eliotwrobson closed 8 months ago

eliotwrobson commented 1 year ago

See title, implement the Aho-Corasick algorithm for construction of a DFA which recognizes all strings which contain a substring from a given dictionary. This is an extension of the existing from_substring method that uses the KMP construction algorithm (and can probably replace that method if implemented, as the existing method is just a special case of the new one).

References:

cc @Tagl as you originally implemented the KMP algorithm. I'm not sure if there's a simple way to extend the existing implementation to the more general AC algorithm.

Tagl commented 1 year ago

I have a C++ implementation that has been fieldtested through programming contests, I can port and adapt to the library. Currently at EGOI but I promise to look into implementing when I'm back sometime soon after the 24th of July.

eliotwrobson commented 1 year ago

That would be awesome, thanks! Don't worry about the timing, with the v8 release I'm sure we'll have a few post-release items that will make it into a minor version.

eliotwrobson commented 1 year ago

@Tagl have you had a chance to take a crack at this yet? No worries if not, just wanted to check in 😃

Tagl commented 1 year ago

I've reviewed the old implementation and refreshed my memory on how the algorithm works but haven't started writing anything so far.

Tagl commented 1 year ago

I'm planning on doing it this coming weekend.

Tagl commented 1 year ago

cc @Tagl as you originally implemented the KMP algorithm. I'm not sure if there's a simple way to extend the existing implementation to the more general AC algorithm.

Regarding this, I will probably start with a fresh rewrite, then try to adapt it to the old interface.

Tagl commented 1 year ago

Done writing it up, seems to work, just some cleanup and tests left before making a PR

Tagl commented 1 year ago

Sidenote: we should extend it for containing one of a set of subsequences using NFAs. It's a rather simple extension after all.

eliotwrobson commented 8 months ago

Closing since the PR was merged to develop.