Closed ecolss closed 6 years ago
Thank you for the report. Which version of the library do you use? Windows/Linux? Python 2/Python 3?
Tested on Mac and Linux, Py3.6/2.7.
I found that if the keyword set contains only a few words, it works as expected. But if contains half a million words, it failed to return 'go'.
@ecolss I tried to reproduce the bug (https://github.com/WojciechMula/pyahocorasick/blob/master/unresolved_bugs/bug_56.py), but it works correctly for both py3 and py2.
I loaded all words from an English dictionary and added to the automaton the words and their subwords. Here is output from the script I linked:
Adding words...
742172 words added
Building an automaton...
Test
(0, 'p')
(1, 'po')
(1, 'o')
(2, 'pok')
(2, 'ok')
(2, 'k')
(3, 'poke')
(3, 'oke')
(3, 'ke')
(3, 'e')
(4, 'keg')
(4, 'eg')
(4, 'g')
(5, 'ego')
(5, 'go')
(5, 'o')
Could you please provide the list of words you use? Or a script which produce it?
In [91]: ac.clear()
In [92]: ac.add_word('poke', 'poke')
Out[92]: True
In [93]: ac.add_word('go', 'go')
Out[93]: True
In [94]: ac.add_word('pokegois', 'pokegois')
Out[94]: True
In [95]: ac.make_automaton()
In [97]: ac.find_all('pokego', fn)
(3, 'poke')
@ecolss Thank you! Now I'm tackling the problem.
Changes work well here on Windows Python 3.5 64 bit.
Build an AC automaton on top of tens of thousands of keywords, and then I did the following search:
I expect find_all to return match for 'go', but it didn't.