WojciechMula / pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License
memory leak #81

Closed richardhundt closed 1 year ago

richardhundt commented 6 years ago

Version: 1.1.7 Python 3.6

Hi, I'm seeing a pretty drastic memory leak using A.keys(...) only. Is this expected?

I'm doing the following:

A = ahocorasick.Automaton()
for doc in my_data:
    A.add_word(doc['my_field'], doc)

for s in my_strings:
    keys = list(A.keys(s))
    print(keys)

That's it. My data sets are pretty large, but I obviously know when I'm in the second loop, and memory keeps growing without bound.
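
One way to confirm growth like this is to call the suspect operation in rounds and record traced memory after each round. The following is a minimal sketch, not part of the original report: `traced_growth` is a hypothetical helper name, and it relies on `tracemalloc`, which only sees allocations made through Python's memory allocators, so a leak through raw `malloc` inside a C extension may not show up.

```python
import gc
import tracemalloc


def traced_growth(fn, rounds=5, calls_per_round=1000):
    """Return traced memory (bytes) measured after each round of calls to fn.

    Hypothetical helper for illustration; a steadily increasing list
    suggests that fn retains memory on every call.
    """
    tracemalloc.start()
    try:
        fn()  # warm-up call so one-time caches are not counted as growth
        sizes = []
        for _round in range(rounds):
            for _call in range(calls_per_round):
                fn()
            gc.collect()  # drop collectable garbage before measuring
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
        return sizes
    finally:
        tracemalloc.stop()
```

With the automaton from the snippet above, this could be called as `traced_growth(lambda: list(A.keys(s)))` for some string `s`; roughly flat values suggest no leak that `tracemalloc` can see.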

WojciechMula commented 6 years ago

@richardhundt Thank you for reporting this.

WojciechMula commented 6 years ago

@richardhundt Again, thank you very much for the report, and sorry for the inconvenience. Hopefully, I fixed the leak (TBH, an embarrassing bug). If you like, you may try the code directly from master. Or wait until tomorrow (well, this evening), when I'll prepare a new release.

WojciechMula commented 6 years ago

Better to wait for the new release; a build failed.

WojciechMula commented 6 years ago

OK, version 1.1.8 was released. @richardhundt please verify, if you can.

richardhundt commented 6 years ago

Works perfectly now, thank you!

EdenAzulay commented 4 years ago

Hi, I'm sorry for reviving such an old issue, but I'm currently experiencing the same problem. I'm using version 1.4.0 and seeing small, steady memory leaks (found by debugging with tracemalloc) on:

A = ahocorasick.Automaton()
MyList = [...]
for x in MyList:
    A.add_word(y, (y, z))

Is there a chance this bug has returned?

Thanks, Eden.

pombredanne commented 2 years ago

See https://github.com/WojciechMula/pyahocorasick/pull/166 where we still have a memory leak on Unicode builds.

Azzonith commented 2 years ago

Hello,

I can confirm that the memory leak still exists in 1.4.4, and it might have something to do with pickling. tracemalloc output:

[ Top 10 ]
/usr/lib/python3.8/multiprocessing/reduction.py:51: size=23.2 GiB, count=36957, average=659 KiB
/usr/lib/python3.8/linecache.py:137: size=508 KiB, count=5133, average=101 B
/usr/lib/python3.8/tracemalloc.py:65: size=60.1 KiB, count=962, average=64 B
/usr/lib/python3.8/tracemalloc.py:185: size=42.2 KiB, count=900, average=48 B
:1: size=37.7 KiB, count=444, average=87 B
:640: size=29.2 KiB, count=388, average=77 B
/usr/local/lib/python3.8/dist-packages/kafka/protocol/types.py:193: size=28.4 KiB, count=638, average=46 B
/usr/local/lib/python3.8/dist-packages/kafka/metrics/stats/sampled_stat.py:89: size=20.6 KiB, count=416, average=51 B
/usr/lib/python3.8/copy.py:76: size=19.2 KiB, count=123, average=160 B
/usr/local/lib/python3.8/dist-packages/kafka/cluster.py:281: size=18.4 KiB, count=100, average=188 B

When I roll back to 1.1.8, the problem is no longer reproduced.
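
The top entry in this output is in `multiprocessing/reduction.py`, the module that `multiprocessing` uses to pickle objects passed between processes. One way to test the pickling suspicion in isolation is to round-trip an object through `pickle` many times and compare traced memory before and after. This is a minimal sketch under that assumption: `pickle_roundtrip_growth` is a hypothetical helper, and `tracemalloc` will not report memory that a C extension allocates outside Python's allocators.

```python
import gc
import pickle
import tracemalloc


def pickle_roundtrip_growth(obj, rounds=200):
    """Return bytes of traced memory gained over repeated pickle round trips.

    Hypothetical helper for illustration; a value that scales with
    `rounds` suggests that dumping or loading obj retains memory.
    """
    tracemalloc.start()
    try:
        pickle.loads(pickle.dumps(obj))  # warm-up round trip
        gc.collect()
        before, _peak = tracemalloc.get_traced_memory()
        for _round in range(rounds):
            pickle.loads(pickle.dumps(obj))  # result is discarded right away
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
        return after - before
    finally:
        tracemalloc.stop()
```

Passing a built automaton as `obj` and running this on both 1.1.8 and 1.4.4 would show whether the growth follows the pickle round trip itself or some other part of the pipeline.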

pombredanne commented 1 year ago

@Azzonith I created https://github.com/WojciechMula/pyahocorasick/issues/183 to track your issue

pombredanne commented 1 year ago

@EdenAzulay Your issue is tracked in https://github.com/WojciechMula/pyahocorasick/issues/135

Closing this one.