reduce memory usage by over 50% by refactoring the non-contiguous NFA representation

BurntSushi commented 1 year ago

In effect, we switch to using a linked list to represent transitions for each state in a non-contiguous NFA. (The commit messages have more details.) This substantially reduces memory usage and means that overall peak memory usage is reduced as well when building things like a contiguous NFA.

I'll show a couple examples. This first one comes from [this issue](). There's 300,000 patterns and each pattern is comprised of 4 random words:

$ time aho-corasick-debug_2023-08-10-before-noncontiguous-refactor /tmp/300000.pats /tmp/haystack.small --match-kind standard --kind noncontiguous
pattern read time: 2.690752ms
automaton build time: 2.135668194s
automaton heap usage: 561774264 bytes
match count: 1
count time: 3.538399241s

real    5.761
user    5.551
sys     0.207
maxmem  1006 MB
faults  0

$ time aho-corasick-debug /tmp/300000.pats /tmp/haystack.small --match-kind standard --kind noncontiguous
pattern read time: 2.672161ms
automaton build time: 1.439117348s
automaton heap usage: 259695786 bytes
match count: 1
count time: 2.684963216s

real    4.133
user    4.081
sys     0.050
maxmem  390 MB
faults  0

This gives us a 50% reduction in the heap used by the non-contigous NFA and an even greater reduction in peak memory usage. Plus, search times get faster!

The next test is one of my standard go-tos that builds an automaton from 1,000,000 randomly selected Wikipedia article titles:

$ time aho-corasick-debug_2023-08-10-before-noncontiguous-refactor ~/data/benchsuite/texts/wikipedia/titles.1000000 /tmp/haystack.small --match-kind standard --kind noncontiguous
pattern read time: 7.727681ms
automaton build time: 3.086064908s
automaton heap usage: 832984632 bytes
match count: 1117113
count time: 1.250706555s

real    4.503
user    4.287
sys     0.213
maxmem  1421 MB
faults  0

$ time aho-corasick-debug ~/data/benchsuite/texts/wikipedia/titles.1000000 /tmp/haystack.small --match-kind standard --kind noncontiguous
pattern read time: 7.849024ms
automaton build time: 3.212135732s
automaton heap usage: 413686502 bytes
match count: 1117113
count time: 1.209295572s

real    4.437
user    4.358
sys     0.077
maxmem  538 MB
faults  0

Search times remain roughly the same, but heap used by the NFA and peak memory usage decrease substantially, in similar ratios as the previous test.

Much thanks to https://github.com/G-Research/ahocorasick_rs/issues/62 and @erikcw for the kick in the ass to explore this idea.

itamarst commented 1 year ago

Woot

BurntSushi commented 1 year ago

This PR is on crates.io in aho-corasick 1.0.4.

BurntSushi / aho-corasick

reduce memory usage by over 50% by refactoring the non-contiguous NFA representation #121