Closed slavaGanzin closed 5 years ago
Thanks for the report.
Does it still fail if you replace
_pickle.dump(A, open('aho', 'wb'))
with
with open('aho', 'wb') as f:
    _pickle.dump(A, f)
?
@serhiy-storchaka Why might it make any difference?
If the file was not closed, not all of the data might have been written.
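To illustrate the point about closing the file, here is a minimal sketch using only the standard `pickle` module. The payload is a plain dict standing in for the automaton, and the helper names `dump_closed` and `load` are made up for this example. It only shows the file-handling pattern; it does not reproduce the truncation error itself.

```python
import os
import pickle
import tempfile


def dump_closed(obj, path):
    # The with-block closes (and therefore flushes) the file on exit,
    # even if pickle.dump raises, so buffered bytes are not left unwritten.
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in payload (assumption): a plain dict instead of an Automaton.
data = {"he": 1, "she": 2}
path = os.path.join(tempfile.mkdtemp(), "aho")
dump_closed(data, path)
restored = load(path)
```

With `pickle.dump(A, open('aho', 'wb'))` the file object is only closed when it is garbage collected, so reading the file back before that happens may see incomplete data.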
Thanks, I missed that.
I've tried, but still get the error
@findmyway Could you please check the latest version of the module?
@WojciechMula
Excellent! It works now!
Thank you
Hey I'm still getting this bug
pip freeze | grep pyahocorasick
> pyahocorasick==1.1.7.dev1
(same problem for 1.1.6 which is on pip)
AUTO = ahocorasick.Automaton()
for key,value in list(final_dic.items())[0:65]:
    AUTO.add_word(key,value)
AUTO.make_automaton()
import _pickle
with open(mypath,'wb') as f:
    _pickle.dump(AUTO,f)
with open(mypath,'rb') as f:
    s = _pickle.load(f)
> ValueError: binary data truncated (2)
AUTO = ahocorasick.Automaton()
for key,value in list(final_dic.items())[0:63]:
    AUTO.add_word(key,value)
AUTO.make_automaton()
import _pickle
with open(mypath,'wb') as f:
    _pickle.dump(AUTO,f)
with open(mypath,'rb') as f:
    s = _pickle.load(f)
> Parsed count 559
Is there anything I can try as a temporary fix? I saw your post saying that you're stretched thin solving pickle issues, so I'm just hoping for a workaround for now. Thanks for this lib, awesome work.
Python 3.6.3
Linux 4.9.65-1-MANJARO #1 SMP PREEMPT Fri Nov 24 10:42:19 UTC 2017 x86_64 GNU/Linux
@scottwthompson I have no idea how to fix it right now. :(
@WojciechMula No problem, great work. My workaround is to just recreate the automaton from the data each time; for my purposes it's not too slow.
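A rough sketch of that workaround, assuming the words live in a dict like `final_dic` from the snippet above: pickle the plain (key, value) pairs rather than the automaton, and rebuild the automaton after loading. The helper names and the sample data are hypothetical; only `Automaton()`, `add_word()` and `make_automaton()` come from pyahocorasick.

```python
import os
import pickle
import tempfile


def save_words(words, path):
    """Pickle the plain (key, value) pairs instead of the automaton."""
    with open(path, "wb") as f:
        pickle.dump(list(words), f)


def load_words(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def build_automaton(words):
    """Rebuild the automaton from the pairs (needs pyahocorasick installed)."""
    import ahocorasick  # imported lazily so the helpers above work without it

    automaton = ahocorasick.Automaton()
    for key, value in words:
        automaton.add_word(key, value)
    automaton.make_automaton()
    return automaton


# Hypothetical stand-in for the real dictionary.
final_dic = {"he": 1, "she": 2, "hers": 3}
words_path = os.path.join(tempfile.mkdtemp(), "words.pkl")
save_words(final_dic.items(), words_path)
restored_words = load_words(words_path)
# AUTO = build_automaton(restored_words)  # uncomment where pyahocorasick is available
```

This trades load time for robustness: the automaton is rebuilt on every start, which is acceptable only when the word list is small enough for that to be fast.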
@scottwthompson Why do you import _pickle? At least on Python 2, this is how this works nicely: https://github.com/nexB/scancode-toolkit/blob/5dcb56815f0fba1e74d7a2314a0c98d0100eb295/src/licensedcode/index.py#L637
Here multiple automatons that are instance attributes of my LicenseIndex class are pickled without any problem: https://github.com/nexB/scancode-toolkit/blob/5dcb56815f0fba1e74d7a2314a0c98d0100eb295/src/licensedcode/index.py#L209
FWIW the automatons are created and used there: https://github.com/nexB/scancode-toolkit/blob/a9083191a04f62c05d588a22fa8f4839eeffc79d/src/licensedcode/match_aho.py
Version 1.1.11 fixes the problem
Hello, I don't know whether this is connected to #50, so I created a new issue:
This works ok:
And this fails constantly: