Open TheNeuralBit opened 2 years ago
Have we tried pickling these types with CloudPickle?
Yes, I added a parameterized test that tries pickling with each library in #22679: https://github.com/apache/beam/blob/c7f64264451af12ff6c7c0ef4bc95fd7ce0f5418/sdks/python/apache_beam/typehints/schemas_test.py#L592-L605
With cloudpickle we get:
_______________________________________________________________________________________________ PickleTest_2.test_generated_class_pickle _______________________________________________________________________________________________
self = <apache_beam.typehints.schemas_test.PickleTest_2 testMethod=test_generated_class_pickle>
    def test_generated_class_pickle(self):
      schema = schema_pb2.Schema(
          id="some-uuid",
          fields=[
              schema_pb2.Field(
                  name='name',
                  type=schema_pb2.FieldType(atomic_type=schema_pb2.STRING),
              )
          ])
      user_type = named_tuple_from_schema(schema)
      self.assertEqual(
>         user_type, self.pickler.loads(self.pickler.dumps(user_type)))
apache_beam/typehints/schemas_test.py:605:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../../.pyenv/versions/beam/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py:73: in dumps
cp.dump(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <cloudpickle.cloudpickle_fast.CloudPickler object at 0x7fc1c273c880>, obj = <class 'apache_beam.typehints.schemas.BeamSchema_some_uuid'>
    def dump(self, obj):
        try:
>           return Pickler.dump(self, obj)
E           TypeError: cannot pickle 'google.protobuf.pyext._message.MessageDescriptor' object
../../../../.pyenv/versions/beam/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py:633: TypeError
Can we close this since https://github.com/apache/beam/pull/23739 was merged?
This is technically still an issue since dill can't pickle the types.
What happened?
The NamedTuple types we generate in apache_beam.typehints.schemas confound pickle libraries. We work around this in many places (e.g. GeneratedClassRowTypeConstraint #22679). We should see if we can find a way to make these types picklable, and clean up the workarounds. Making the types work with cloudpickle should be the priority.
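One possible direction, sketched with the standard library only: pickle a generated type as "rebuild me from my schema" instead of by value or by module reference. This is not Beam's implementation. The registry `_GENERATED_TYPES`, the helper `generated_type_for`, the `_schema_id` attribute, and `SchemaAwarePickler` are all hypothetical names for illustration, and `collections.namedtuple` stands in for `named_tuple_from_schema`. The only real API relied on is `pickle.Pickler.reducer_override` (Python 3.8+).

```python
import collections
import io
import pickle

# Hypothetical stand-in for a schema registry: (schema id, fields) -> generated class.
_GENERATED_TYPES = {}


def generated_type_for(schema_id, field_names):
    """Return the cached generated type for a schema, creating it on first use."""
    key = (schema_id, tuple(field_names))
    if key not in _GENERATED_TYPES:
        name = "BeamSchema_" + schema_id.replace("-", "_")
        cls = collections.namedtuple(name, field_names)
        # Assumption: the generated class remembers which schema it came from.
        cls._schema_id = schema_id
        _GENERATED_TYPES[key] = cls
    return _GENERATED_TYPES[key]


class SchemaAwarePickler(pickle.Pickler):
    """Pickles generated types as a call to generated_type_for(...)."""

    def reducer_override(self, obj):
        if isinstance(obj, type) and hasattr(obj, "_schema_id"):
            # Serialize only the schema id and field names, never the class body.
            return generated_type_for, (obj._schema_id, tuple(obj._fields))
        # Fall back to the default behavior for everything else.
        return NotImplemented


def dumps(obj):
    buf = io.BytesIO()
    SchemaAwarePickler(buf).dump(obj)
    return buf.getvalue()


user_type = generated_type_for("some-uuid", ["name"])
# A plain pickle.loads is enough on the reading side.
restored = pickle.loads(dumps(user_type))
```

Because the rebuild goes through the same cache, the round trip returns the identical class object in this process, which is what the `assertEqual` in the test above checks. Whether the same idea can be wired into cloudpickle or dill for the real generated types is an open question here.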
Issue Priority
Priority: 2
Issue Component
Component: sdk-py-core