WojciechMula / pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License
927 stars 122 forks source link

Introduce new trie representation #107

Closed WojciechMula closed 5 years ago

WojciechMula commented 5 years ago

Python 3.7.2 from Debian

Memory usage for 250k words is sligtly better: master = 124407808 (118.64 MB), new representation = 124260352 (118.50 MB).

Benchmark comparison (benchmarks/benchmark3.py)

----------------------------------------------------------------------------------------
                                        |  master     | new representation |  speed-up
----------------------------------------------------------------------------------------
Generating data (1000000 words)         |  33.440 s   |         33.930 s   |    ---
Add words                               |   2.764 s   |          1.689 s   |  1.63 x 
Building automaton                      |  24.739 s   |         10.921 s   |  2.26 x
Look up                                 |   4.898 s   |          2.474 s   |  1.97 x
Search                                  |   1.480 s   |          0.544 s   |  2.72 x