WojciechMula / pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License
927 stars, 122 forks

memory leak #116

Closed. Opened by dapeng2018, closed 1 year ago

dapeng2018 commented 5 years ago

Version: pyahocorasick 1.4, Python 2.7.15

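import sys

import ahocorasick


class TEST(object):

    def __init__(self, input_filename):
        # Build the automaton from a file with one keyword per line.
        self.ac = ahocorasick.Automaton()
        n_word = 0
        with open(input_filename) as f:
            for text in f:
                n_word += 1
                word = text.strip()
                self.ac.add_word(word, (n_word, word))
        self.ac.make_automaton()

    def test_sent(self, text):
        # Only the first tab-separated field of the line is searched.
        lines = text.strip().split('\t')
        sent = lines[0]
        res_relevant = []
        for item in self.ac.iter(sent):
            # item is (end_index, (n_word, word)); keep the matched word.
            res_relevant.append(item[1][1])


if __name__ == '__main__':
    t = TEST(sys.argv[1])
    # Note: get_memory_usage() is not defined in the snippet as posted.
    before = t.get_memory_usage()
    n_lines = 0
    with open(sys.argv[2]) as f:
        for text in f:
            n_lines += 1
            t.test_sent(text.strip())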

The input data is very large, and the memory usage of the process keeps growing as more lines are searched.
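One way the reported growth could be quantified is to sample the peak resident set size of the process at regular intervals while the lines are searched. The following is a minimal sketch, assuming a Unix-like system where the standard-library `resource` module is available. `peak_rss` and `sample_memory` are hypothetical helper names for illustration; they are not part of pyahocorasick or of the script above.

```python
import resource


def peak_rss():
    """Peak resident set size of this process.

    The unit is platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def sample_memory(process_line, lines, every=100000):
    """Call process_line on each line and record (line_count, peak_rss)
    once before the loop and then after every `every` lines."""
    samples = [(0, peak_rss())]
    for n_lines, line in enumerate(lines, 1):
        process_line(line)
        if n_lines % every == 0:
            samples.append((n_lines, peak_rss()))
    return samples
```

With the script above this might be called as `sample_memory(t.test_sent, open(sys.argv[2]))`. Because `ru_maxrss` is a peak value it never decreases, so a steadily rising series suggests growing usage, while a flat series after the automaton is built suggests the search loop itself is not accumulating memory.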

pombredanne commented 2 years ago

@dapeng2018 Do you still use Python 2, or did you switch to Python 3? Do you have the issue there too with 1.4.2?

pombredanne commented 1 year ago

@dapeng2018 I am closing this for now for lack of activity.