Thriftpy / thriftpy2

Pure python approach of Apache Thrift.
MIT License
567 stars 90 forks source link

cannot pickle dump object from child idl file class #215

Closed truebit closed 5 months ago

truebit commented 1 year ago

add below code in the end of tests.test_hook.test_load test method:

    # load with sub_module name and it can be pickled
    list_item = ab_2.container.ListItem()
    assert list_item == pickle.loads(pickle.dumps(list_item))

fail log is:

> FAILED                                           [100%]
> tests/test_hook.py:31 (test_load)
> def test_load():
>         ab_1 = thriftpy2.load("addressbook.thrift")
>         ab_2 = thriftpy2.load("addressbook.thrift",
>                               module_name="addressbook_thrift")
>     
>         assert ab_1.__name__ == "addressbook"
>         assert ab_2.__name__ == "addressbook_thrift"
>     
>         # load without module_name can't do pickle
>         with pytest.raises(pickle.PicklingError):
>             pickle.dumps(ab_1.Person(name='Bob'))
>     
>         # load with module_name set and it can be pickled
>         person = ab_2.Person(name='Bob')
>         assert person == pickle.loads(pickle.dumps(person))
>     
>         # load with sub_module name and it can be pickled
>         list_item = ab_2.container.ListItem()
> \>       assert list_item == pickle.loads(pickle.dumps(list_item))
> E       _pickle.PicklingError: Can't pickle <class 'container.ListItem'>: attribute lookup ListItem on container failed
> 
> test_hook.py:50: PicklingError
aisk commented 1 year ago

This is not a bug, it's caused by pickle's behavior.

The error is raised from here: https://github.com/python/cpython/blob/main/Modules/_pickle.c#L3695 , as you can see, during the pickle.dumps process, pickle will try to import the module.

In your example, there is a missleading problem, the module ab_1.container is tests/container.thrift actually, and there is a test/container.py file exists beside the thrift file, and it's actually broken (missing the ListItem type), which we should fixed in the future (PR is welcomed!).

Delete the container.py, the actually error will be raised:

PicklingError: Can't pickle <class 'container.ListItem'>: import of module 'container' failed

We can figure out that pickle is complain that the module container is not exist. It's true because we're generating the module on the fly. And the generated module will not insert into the interpreter's import cache (sys.modules) if you load it without giving a name, and the module is a "wild module".

By load the module with thriftpy2.load("container.thrift", module_name = "container_thrift"), the generated module will be inserted into sys.modules, which will be used as a cache while importing. So the code will be worked as expected.

truebit commented 1 year ago

Because the container.thrift is included in the parent addressbook.thrift file, current load method for parent file would not be able to create a module name for the child thrift file. Your workaround would work, but thinking about many recursive includes in thrift files 😂

I think that would be convenient if we support auto-load of child thrift files as real module that pickle would recognize ;)

AN Long @.***>于2023年6月14日 周三22:01写道:

This is not a bug, it's caused by pickle's behavior.

The error is raised from here: https://github.com/python/cpython/blob/main/Modules/_pickle.c#L3695 , as you can see, during the pickle.dumps process, pickle will try to import the module.

In your example, there is a missleading problem, the module ab_1.container is tests/container.thrift actually, and there is a test/container.py file exists beside the thrift file, and it's actually broken (missing the ListItem type), which we should fixed in the future (PR is welcomed!).

Delete the container.py, the actually error will be raised:

PicklingError: Can't pickle <class 'container.ListItem'>: import of module 'container' failed

We can figure out that pickle is complain that the module container is not exist. It's true because we're generating the module on the fly. And the generated module will not insert into the interpreter's import cache ( sys.modules) if you load it without giving a name, and the module is a "wild module".

By load the module with thriftpy2.load("container.thrift", module_name = "container_thrift"), the generated module will be inserted into sys.modules, which will be used as a cache while importing. So the code will be worked as expected.

— Reply to this email directly, view it on GitHub https://github.com/Thriftpy/thriftpy2/issues/215#issuecomment-1591271387, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFYJWUB7OEX7YCUFGU45G3XLG73FANCNFSM6AAAAAAZGKTO7Y . You are receiving this because you authored the thread.Message ID: @.***>

-- Br,

Sean Wang Blog: fclef.wordpress.com http://fclef.wordpress.com/about

aisk commented 1 year ago

I think we should add an option in the load function that enables automatic specification of the module name. For now, you can load the child thrift files beforehand as a workaround.

truebit commented 1 year ago

I tried to fix this issue, but it seems that only executing test_hook itself would pass the test; when with the whole test suite, test_hook would fail:

diff --git a/tests/test_hook.py b/tests/test_hook.py
index e5cc7c5..91e40d0 100644
--- a/tests/test_hook.py
+++ b/tests/test_hook.py
@@ -45,6 +45,10 @@ def test_load():
     person = ab_2.Person(name='Bob')
     assert person == pickle.loads(pickle.dumps(person))

+    # load with first level sub_module name and it can be pickled
+    list_item = ab_2.container.ListItem()
+    assert list_item == pickle.loads(pickle.dumps(list_item))
+

 def test_load_module():
     ab = thriftpy2.load_module("addressbook_thrift")
diff --git a/thriftpy2/parser/__init__.py b/thriftpy2/parser/__init__.py
index cb3e4b8..18ad1e4 100644
--- a/thriftpy2/parser/__init__.py
+++ b/thriftpy2/parser/__init__.py
@@ -12,6 +12,7 @@ from __future__ import absolute_import
 import os
 import sys
 import types
+import inspect

 from .parser import parse, parse_fp, incomplete_type, _cast
 from .exc import ThriftParserError
@@ -34,11 +35,17 @@ def load(path,
     """
     real_module = bool(module_name)
     thrift = parse(path, module_name, include_dirs=include_dirs,
-                   include_dir=include_dir)
+                   include_dir=include_dir, encoding=encoding)
     if incomplete_type:
         fill_incomplete_ttype(thrift, thrift)
     if real_module:
         sys.modules[module_name] = thrift
+        for item in dir(thrift):
+            element = getattr(thrift, item)
+            if inspect.ismodule(element):
+                if item not in sys.modules:
+                    sys.modules[item] = element
+
     return thrift
helpau commented 1 year ago

I tried to fix it in #219

aisk commented 9 months ago

Sorry, I've lost the context. does #219 fixed the problem?