crflynn / pbspark

protobuf pyspark conversion
MIT License

Could not serialize object #37

Status: Closed (MRabenda closed this issue 2 years ago)

MRabenda commented 2 years ago

Env: Databricks, Spark 3.3.0, pbspark 0.7.0, protobuf 4.21.5

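from pyspark.sql import SparkSession

from pbspark import from_protobuf
from pbspark import to_protobuf

# SimpleMessage is a generated protobuf class; this import path is an assumption.
from example.example_pb2 import SimpleMessage

# On Databricks `spark` already exists; getOrCreate() returns that session.
spark = SparkSession.builder.getOrCreate()

example = SimpleMessage(name="hello", quantity=5, measure=12.3)
data = [{"value": example.SerializeToString()}]
df_encoded = spark.createDataFrame(data)

# from_protobuf returns a Column, so it is wrapped in select() and aliased.
df_decoded = df_encoded.select(from_protobuf(df_encoded.value, SimpleMessage).alias("value"))
df_expanded = df_decoded.select("value.*")
df_expanded.show()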

The call to from_protobuf raises the following error:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 527, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'google._upb._message.Descriptor' object
PicklingError: Could not serialize object: TypeError: cannot pickle 'google._upb._message.Descriptor' object
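
The `google._upb._message.Descriptor` type named in the traceback belongs to the upb backend, which protobuf 4.x uses by default. As a rough diagnostic sketch (not part of pbspark), the snippet below reports which backend is active and whether a message descriptor survives pickling. `describe_protobuf_backend` is a hypothetical helper name, the built-in `Timestamp` message stands in for `SimpleMessage`, and the standard-library `pickle` is used on the assumption that it fails on the descriptor in the same way as Spark's bundled cloudpickle.

```python
import pickle


def describe_protobuf_backend():
    """Return (backend, descriptor_picklable).

    Returns (None, None) when protobuf is not installed.
    Hypothetical helper for diagnosis only; not a pbspark API.
    """
    try:
        from google.protobuf.internal import api_implementation
        from google.protobuf import timestamp_pb2
    except ImportError:
        return None, None

    # Name of the active backend: "upb", "cpp", or "python".
    backend = api_implementation.Type()

    # Try to pickle a descriptor directly, using Timestamp as a stand-in
    # for the user's own message class (an assumption for illustration).
    try:
        pickle.dumps(timestamp_pb2.Timestamp.DESCRIPTOR)
        picklable = True
    except Exception:
        picklable = False

    return backend, picklable


backend, picklable = describe_protobuf_backend()
print(f"protobuf backend: {backend}, descriptor picklable: {picklable}")
```

Running this on the driver, in the same environment that produced the traceback, shows whether the upb backend is in play before trying any workaround.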

crflynn commented 2 years ago

This is probably a duplicate of https://github.com/crflynn/pbspark/issues/26. Refer to https://github.com/crflynn/pbspark/issues/26#issuecomment-1153429602. Feel free to re-open this issue if that does not solve your problem.