WojciechMula / pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License

pickling error of signal SIGSEGV #97

Closed leonqli closed 2 years ago

leonqli commented 5 years ago

terminated by signal SIGSEGV (Address boundary error)

WojciechMula commented 5 years ago

I'm sorry you had problems. Could you please provide more details?

leonqli commented 5 years ago

The code is listed below. The file t.txt contains 28 million lines of text.

import pickle
import ahocorasick

# Build the automaton with one key per line of t.txt.
A = ahocorasick.Automaton()
with open("t.txt") as f:
    for line in f:
        line = line.strip("\n")
        A.add_word(line, 1)
print("loaded")

# Serialize the automaton; the crash happens inside pickle.dump.
with open("t.pkl", "wb") as f:
    print("begin to pickle")
    pickle.dump(A, f)
    print("finish")

The output:

loaded
begin to pickle
fish: Job 1, 

and t.pkl is empty.

WojciechMula commented 5 years ago

@leonqli 28 million lines is a huge amount of data. I'm afraid you hit #50, which is not resolved yet. It's almost fixed, but I need a few days to release it.

leonqli commented 5 years ago

Thanks! Please keep up the great work!

WojciechMula commented 5 years ago

@leonqli If you can, please check the latest version. The bug I mentioned was fixed.

WojciechMula commented 5 years ago

@leonqli ping? Did you have a chance to test a newer release?

leonqli commented 5 years ago

Just saw your message. I ran a test just now. Unfortunately, the result is still the same (empty t.pkl) ...

WojciechMula commented 5 years ago

Thank you for rechecking. Is it possible to obtain the data you use? I'd like to reproduce the problem.

pombredanne commented 2 years ago

@leonqli gentle ping... this has been open for a couple of years now without a reply, so I am closing it. Please reopen with a reproducible test case if you still have the issue with the latest version.
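
For anyone who lands here with the same crash, below is a minimal sketch of what a reproducible test case could look like. It is not from the thread. It assumes that synthetic keys (`word0`, `word1`, ...) are an acceptable stand-in for the original t.txt, which was never shared, so it may well not trigger the same failure. The helper names `write_synthetic_words` and `run_isolated` are hypothetical. The pickling steps run in a child interpreter so that a segfault is reported as a result instead of killing the reproducer itself.

```python
import signal
import subprocess
import sys
import tempfile
from pathlib import Path


def write_synthetic_words(path, count):
    """Write `count` distinct lines; an assumed stand-in for the reporter's t.txt."""
    with open(path, "w") as f:
        for i in range(count):
            f.write(f"word{i}\n")


def run_isolated(code, *args):
    """Run `code` in a child interpreter and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,
        text=True,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return f"exited with code {proc.returncode}"


# Same steps as the script in the report, with the file paths passed as arguments.
PICKLE_SCRIPT = """
import pickle, sys
import ahocorasick
A = ahocorasick.Automaton()
with open(sys.argv[1]) as f:
    for line in f:
        A.add_word(line.strip("\\n"), 1)
with open(sys.argv[2], "wb") as f:
    pickle.dump(A, f)
"""

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        words = Path(tmp) / "t.txt"
        # Assumed size; raise it towards 28 million to approach the report.
        write_synthetic_words(words, 100_000)
        print(run_isolated(PICKLE_SCRIPT, str(words), str(Path(tmp) / "t.pkl")))
```

A report that includes the line count used, the printed result, and the installed pyahocorasick version would give the maintainers something concrete to run.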