Closed: marco-neumann-by closed this issue 3 years ago.
Could you please edit the bug report to include the full traceback?
Also is this problem happening with the current master branch of cloudpickle?
I believe this was fixed by #409 as I cannot reproduce anymore. We still need to release though.
I still get the same error using the cloudpickle version from master
in Python 3.8.5:
The fix from #409 only seems to target Python version < 3.7.
Edited to use cloudpickle from master
This issue should be reopened.
The difference between environments and likely why @ogrisel was unable to reproduce this is because pydantic can be installed with or without Cython support. The Cython version of Pydantic is unsurprisingly significantly faster than the pure-Python version and is also the default install (at least for platforms for which wheels exist).
Here are two examples using virtualenv that should be reproducible, using the same script as @marco-neumann-by defined initially:
```python
# example.py
import cloudpickle
import pickle

import pydantic


class Bar(pydantic.BaseModel):
    a: int


pickle.loads(pickle.dumps(Bar(a=1)))            # This works well
cloudpickle.loads(cloudpickle.dumps(Bar(a=1)))  # This fails with the error below
```
Note that `--no-binary pydantic` tells pip to install pydantic without any compiled Cython extensions.

```shell
virtualenv .venv
source ./.venv/bin/activate
pip install git+https://github.com/cloudpipe/cloudpickle pydantic --no-binary pydantic
```
Here you can tell that there are no compiled Cython files:

```shell
> ls ./.venv/lib/python3.8/site-packages/pydantic/
__init__.py datetime_parse.py json.py tools.py
__pycache__ decorator.py main.py types.py
_hypothesis_plugin.py env_settings.py mypy.py typing.py
annotated_types.py error_wrappers.py networks.py utils.py
class_validators.py errors.py parse.py validators.py
color.py fields.py py.typed version.py
dataclasses.py generics.py schema.py
```
And the example passes without issue:

```shell
> python example.py
> echo $?
0
```
Now we install pydantic without the `--no-binary pydantic` flag:

```shell
deactivate
rm -rf .venv
virtualenv .venv
source ./.venv/bin/activate
pip install git+https://github.com/cloudpipe/cloudpickle pydantic
```
Now you can see that compiled C extension modules are included with pydantic:

```shell
> ls ./.venv/lib/python3.8/site-packages/pydantic/
__init__.cpython-38-darwin.so json.cpython-38-darwin.so
__init__.py json.py
__pycache__ main.cpython-38-darwin.so
_hypothesis_plugin.cpython-38-darwin.so main.py
_hypothesis_plugin.py mypy.cpython-38-darwin.so
annotated_types.cpython-38-darwin.so mypy.py
annotated_types.py networks.cpython-38-darwin.so
class_validators.cpython-38-darwin.so networks.py
class_validators.py parse.cpython-38-darwin.so
color.cpython-38-darwin.so parse.py
color.py py.typed
dataclasses.cpython-38-darwin.so schema.cpython-38-darwin.so
dataclasses.py schema.py
datetime_parse.cpython-38-darwin.so tools.cpython-38-darwin.so
datetime_parse.py tools.py
decorator.cpython-38-darwin.so types.cpython-38-darwin.so
decorator.py types.py
env_settings.cpython-38-darwin.so typing.cpython-38-darwin.so
env_settings.py typing.py
error_wrappers.cpython-38-darwin.so utils.cpython-38-darwin.so
error_wrappers.py utils.py
errors.cpython-38-darwin.so validators.cpython-38-darwin.so
errors.py validators.py
fields.cpython-38-darwin.so version.cpython-38-darwin.so
fields.py version.py
generics.py
```
And running our example again, we can see that it fails:

```shell
> python example.py
Traceback (most recent call last):
  File "example.py", line 9, in <module>
    cloudpickle.loads(cloudpickle.dumps(Bar(a=1)))  # This fails with the error below
  File "/Users/kbarron/tmp/.venv/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/Users/kbarron/tmp/.venv/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
_pickle.PicklingError: Can't pickle <cyfunction int_validator at 0x101cf62b0>: attribute lookup lambda12 on pydantic.validators failed
```
Also note that the issue does NOT appear when the model is not defined in `__main__` (I can reproduce this as well). For example:
```python
# example.py
import cloudpickle
import pickle

from models import Bar

pickle.loads(pickle.dumps(Bar(a=1)))            # This works well
cloudpickle.loads(cloudpickle.dumps(Bar(a=1)))  # This also works when Bar lives in models.py
```

```python
# models.py
import pydantic


class Bar(pydantic.BaseModel):
    a: int
```
This works fine, so a quick workaround is to always define Pydantic models in a separate file.
I'm still having this issue in cloudpickle 2.0.0. It only works with non-Cython pydantic, or with my pydantic models declared in a separate file.
@ogrisel I am also still seeing this issue in 2.0.0. The workaround in https://github.com/cloudpipe/cloudpickle/issues/408#issuecomment-933760919 works for me, but I believe this issue should be reopened.
I have this issue with pydantic and pyspark.
```
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/pandas/map_ops.py:91: in mapInPandas
    udf_column = udf(*[self[col] for col in self.columns])
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/udf.py:276: in wrapper
    return self(*args)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/udf.py:249: in __call__
    judf = self._judf
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/udf.py:215: in _judf
    self._judf_placeholder = self._create_judf(self.func)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/udf.py:224: in _create_judf
    wrapped_func = _wrap_function(sc, func, self.returnType)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/udf.py:50: in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/rdd.py:3345: in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/serializers.py:458: in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/cloudpickle/cloudpickle_fast.py:73: in dumps
    cp.dump(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <pyspark.cloudpickle.cloudpickle_fast.CloudPickler object at 0x7ff5f0410700>
obj = (<function test_graphlet_etl.<locals>.horror_to_movie at 0x7ff5d0e81480>, StructType([StructField('entity_id', StringT...ld('length', LongType(), False), StructField('gross', LongType(), False), StructField('rating', StringType(), False)]))
    def dump(self, obj):
        try:
>           return Pickler.dump(self, obj)
E           _pickle.PicklingError: Can't pickle <cyfunction str_validator at 0x7ff5b0461220>: it's not the same object as pydantic.validators.str_validator
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/cloudpickle/cloudpickle_fast.py:602: PicklingError
```
I've just been bitten by this. @ogrisel, can we reopen this issue? The workaround is not an option if you are defining your objects inside a jupyter notebook.
@brettc as a workaround, you can define custom serializers to pack and unpack pydantic objects. This might help your use case.
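The pack/unpack idea can be sketched without pydantic at all: instead of handing the model instance to the pickler (which drags the class, and with it the cyfunction validators, into the stream), ship only the class's import path plus its field data. Below is a minimal sketch using a stdlib dataclass as a stand-in for a pydantic model; with pydantic, `model.dict()` and `cls(**data)` would play the same roles. The `pack`/`unpack` names are mine, not from any library.

```python
import importlib
import pickle
from dataclasses import asdict, dataclass


@dataclass
class Bar:  # stand-in for a pydantic.BaseModel subclass
    a: int


def pack(obj):
    """Serialize an instance as (module, qualname, field data) so the class
    object itself never enters the pickle stream."""
    cls = type(obj)
    return pickle.dumps((cls.__module__, cls.__qualname__, asdict(obj)))


def unpack(payload):
    """Re-import the class on the receiving side and rebuild the instance."""
    module_name, qualname, data = pickle.loads(payload)
    cls = getattr(importlib.import_module(module_name), qualname)
    return cls(**data)


restored = unpack(pack(Bar(a=1)))
print(restored)  # Bar(a=1)
```

The trade-off is that the receiving side must be able to import the class, so this does not help for classes defined only in a notebook cell.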
@simon-mo thanks for the tip -- this looks very promising! The error occurs for me when I'm using dask, so I guess you had the same issues in ray. (BTW, ray is amazing. I chose dask for this job because ray seemed like overkill).
I'm still struggling to find a workaround for this issue. My code does not directly define any pydantic types (although they are used by dependent libraries).
Is there a version upgrade/downgrade that might be the cause? It's unclear where the actual issue is occurring. In my case it looks to be in the chain of uvicorn and kserve:
```
Traceback (most recent call last):
  File "/.asdf/installs/python/3.9.11/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/Library/Caches/pypoetry/virtualenvs/truss-FUoNelHr-py3.9/lib/python3.9/site-packages/kserve/model_server.py", line 275, in servers_task
    await asyncio.gather(*servers)
  File "/Library/Caches/pypoetry/virtualenvs/truss-FUoNelHr-py3.9/lib/python3.9/site-packages/kserve/model_server.py", line 269, in serve
    server.start()
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <cyfunction str_validator at 0x16b57c790>: it's not the same object as pydantic.validators.str_validator
```
This still happens. I have to define pydantic models in another file, otherwise I get this error. Even in a simple file where I define a pydantic param class and a Ray actor with a single method, this happens. Using the latest ray, pydantic, etc.
I agree this issue still exists, and I believe it is actually fixed in pydantic 2.5 (see issue and PR) if you run your script with Python. An issue still exists inside Jupyter/IPython: https://github.com/pydantic/pydantic/issues/8232.
If you get an error like the one below, it likely means you are using pydantic<2, and I would say this is not super likely to get fixed in pydantic (see https://docs.pydantic.dev/latest/version-policy/#pydantic-v1):

```
_pickle.PicklingError: Can't pickle <cyfunction int_validator at 0x7f5cb91e01e0>: it's not the same object as pydantic.validators.int_validator
```

In this case, the simplest workaround seems to be to define your pydantic model in a separate file, as noted in https://github.com/cloudpipe/cloudpickle/issues/408#issuecomment-933760919.
Can someone remind me of what it means if this is fixed? I think it means Spark can serialize numpy arrays?
Abstract
The following code snippet fails with cloudpickle but works with stock pickle if pydantic is cythonized (either via a platform-specific wheel or by having Cython installed when calling `setup.py`).
When using the file via `__main__`:
The error message is:
Note that the issue does NOT appear when a non-cythonized pydantic version is used.
Also note that the issue does NOT appear when the file is not `__main__`, for example:
Environment
Technical Background
In contrast to pickle, cloudpickle pickles the actual class when it resides in `__main__`; see the following note in the README. I THINK that might be the reason why this happens. What's somewhat weird is that the object in question is `pydantic.validators.int_validator`, which CAN actually be pickled.
References
This was first reported in #403.
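The by-reference vs by-value contrast behind that README note can be seen with the stdlib alone: a plain pickle of a class defined in `__main__` stores little more than the module and qualified name, which is why it never touches the validators stored in the class namespace, whereas cloudpickle serializes the whole class body. A minimal sketch of the by-reference side (the class name `Local` is mine):

```python
import pickle


class Local:
    """A class defined at the top level of the running script."""
    x = 1


payload = pickle.dumps(Local)

# Only the "address" of the class is stored: its module and qualified name
# both appear in the stream, but the class body does not.
assert Local.__module__.encode() in payload
assert b"Local" in payload

# Because only a reference was stored, unpickling in the same process hands
# back the very same class object. cloudpickle would instead rebuild a copy
# from the serialized class body, including every attribute in its namespace.
assert pickle.loads(payload) is Local
print("by-reference round-trip OK")
```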