WojciechMula / pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License

save/load silently discards tuple data #130

Closed (pquentin closed this issue 3 years ago)

pquentin commented 4 years ago

How to reproduce

Run the following script:

import pickle
import ahocorasick

A = ahocorasick.Automaton(ahocorasick.STORE_ANY)
A.add_word("word", (1, 2))
A.make_automaton()
print(list(A.iter_long("find word")))

A.save("/tmp/a", pickle.dumps)
B = ahocorasick.load("/tmp/a", pickle.loads)
print(list(B.iter_long("find word")))

Expected result

The documentation says that save/load is supported with STORE_ANY, so I would expect to see the same list twice:

[(8, (1, 2))]
[(8, (1, 2))]

An exception would be fine too.

Actual result

The tuple is silently cropped:

[(8, (1, 2))]
[(8, 1)]

zhu commented 3 years ago

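The tuple is not cropped but unpacked: the value (1, 2) is passed as pickle.dumps(1, 2), which pickles the integer 1 with protocol 2. It will raise an exception when the tuple has more than two elements, or when the second element is not a valid pickle protocol (larger than 5, or not a number).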

see https://bugs.python.org/issue28977

PyObject_CallFunction(func, "O", arg) calls func(arg), or func(*arg) if arg is a tuple

https://github.com/WojciechMula/pyahocorasick/blob/90b3079d1465b49849b128f697bbbaefc57e5122/src/custompickle/save/automaton_save.c#L108

Maybe changing the call to PyObject_CallFunctionObjArgs(func, arg, NULL) will solve this issue.
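
To illustrate the difference without building the C extension, here is a rough pure-Python model of the two calling conventions. The helper names call_function_O, call_function_obj_args and error_name are made up for this sketch and are not part of pyahocorasick or the CPython API; the model only mimics the tuple-unpacking behaviour described above.

```python
import pickle


def call_function_O(func, arg):
    """Simplified model of PyObject_CallFunction(func, "O", arg).

    Hypothetical helper: a tuple argument is unpacked into positional args.
    """
    if isinstance(arg, tuple):
        return func(*arg)
    return func(arg)


def call_function_obj_args(func, arg):
    """Simplified model of PyObject_CallFunctionObjArgs(func, arg, NULL).

    Hypothetical helper: the argument is always passed as a single object.
    """
    return func(arg)


def error_name(arg):
    """Return the exception type name raised by the unpacking call, if any."""
    try:
        call_function_O(pickle.dumps, arg)
    except Exception as exc:
        return type(exc).__name__
    return None


value = (1, 2)

# Unpacked: this is pickle.dumps(1, 2), i.e. the integer 1 with protocol 2.
broken = pickle.loads(call_function_O(pickle.dumps, value))

# Passed as one object: the tuple survives the round trip.
fixed = pickle.loads(call_function_obj_args(pickle.dumps, value))

print(broken)  # 1
print(fixed)   # (1, 2)

# Cases where the unpacked call fails instead of silently losing data.
print(error_name((1, 2, 3)))   # too many positional arguments
print(error_name((1, 99)))     # protocol number out of range
print(error_name((1, "x")))    # protocol is not a number
```

With the second convention any picklable value, tuple or not, is handed to the serializer unchanged, which is what save/load needs for STORE_ANY.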

WojciechMula commented 3 years ago

Thank you very much @zhu for the fix. It's highly appreciated.

I have just released 1.4.2. @pquentin, please check if this solves your problem.

WojciechMula commented 3 years ago

Reopening. I misclicked. :) Sorry.

pquentin commented 3 years ago

I won't be able to test this soon, as I no longer use this library for different reasons. If @zhu fixed it, then it's fixed! Closing, thanks everyone.