WojciechMula / pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License

How to calculate automaton memory footprint? #177

Closed: abcdenis closed this issue 1 year ago

abcdenis commented 1 year ago

Hi, is there a way to determine the memory footprint of a prepared Automaton? It would be helpful for keeping automatons in a size-limited cache. Thank you.

pombredanne commented 1 year ago

@abcdenis the get_stats() method returns a dict of Automaton statistics, including a total_size entry. From the docs at https://pyahocorasick.readthedocs.io/en/latest/#other-automaton-methods :

Return a dictionary containing Automaton statistics. Note that the real size occupied by the data structure could be larger because of internal memory fragmentation that can occur in a memory manager.

See https://pyahocorasick.readthedocs.io/en/latest/#get-stats-dict

>>> import ahocorasick
>>> A = ahocorasick.Automaton()
>>> A.add_word("he", None)
True
>>> A.add_word("her", None)
True
>>> A.add_word("hers", None)
True
>>> A.get_stats()
{'nodes_count': 5, 'words_count': 3, 'longest_word': 4, 'links_count': 4, 'sizeof_node': 40, 'total_size': 232}
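
For the limited-size cache use case, here is a minimal sketch of how the total_size value could be used as a budget. The names automaton_size and SizeLimitedCache are made up for illustration and are not part of pyahocorasick; the only library call assumed is get_stats(), as shown above. In the example output, 232 appears to match 5 nodes × 40 bytes + 4 links × 8 bytes, so treat it as an estimate of the trie itself rather than an exact figure.

```python
from collections import OrderedDict


def automaton_size(automaton):
    """Approximate size in bytes, as reported by get_stats()['total_size']."""
    return automaton.get_stats()["total_size"]


class SizeLimitedCache:
    """Hypothetical LRU cache bounded by the summed total_size of its entries."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = OrderedDict()  # key -> (automaton, size in bytes)

    def put(self, key, automaton):
        """Store an automaton; return False if it can never fit in the budget."""
        size = automaton_size(automaton)
        # Replacing an existing key: release its old size first.
        if key in self._items:
            self.used_bytes -= self._items.pop(key)[1]
        if size > self.max_bytes:
            return False
        # Evict least recently used entries until the new one fits.
        while self._items and self.used_bytes + size > self.max_bytes:
            _, (_, old_size) = self._items.popitem(last=False)
            self.used_bytes -= old_size
        self._items[key] = (automaton, size)
        self.used_bytes += size
        return True

    def get(self, key):
        """Return the cached automaton (marking it recently used) or None."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key][0]
```

With the automaton built above, `SizeLimitedCache(max_bytes=1024).put("pronouns", A)` would account for it as 232 bytes. As the documentation note says, the real footprint can be larger, so it is probably wise to leave some headroom in max_bytes.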

/hth