crflynn / pbspark

protobuf pyspark conversion
MIT License

PicklingError: Can't pickle <class 'google.protobuf.pyext._message.CMessage'> #38

Closed: noomanee closed this issue 2 years ago

noomanee commented 2 years ago

I tried to follow your guidelines and got the error below. By the way, I am running this on Spark 2.4.4. Does pbspark work with older Spark versions?

example = SimpleMessage(name="hello", quantity=5, measure=12.3)
# Note: str() on bytes yields the "b'...'" repr as text, not the raw serialized bytes.
data = [{"value": str(example.SerializeToString())}]
df_encoded = spark.createDataFrame(data)
df_encoded.show()

df_decoded = df_encoded.select(from_protobuf(df_encoded.value, SimpleMessage).alias("value"))
df_expanded = df_decoded.select("value.*")
df_expanded.show()

Here is the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/spark-1cec60ed-2727-4e0f-8dfe-a6eeea9c5884/userFiles-988d1f26-e420-4d3f-bd94-89f483d644a9/_proto.py", line 432, in from_protobuf
    return mc.from_protobuf(data=data, message_type=message_type, options=options)
  File "/tmp/spark-1cec60ed-2727-4e0f-8dfe-a6eeea9c5884/userFiles-988d1f26-e420-4d3f-bd94-89f483d644a9/_proto.py", line 335, in from_protobuf
    return protobuf_decoder_udf(column)
  File "/opt/spark/python/pyspark/sql/udf.py", line 189, in wrapper
    return self(*args)
  File "/opt/spark/python/pyspark/sql/udf.py", line 167, in __call__
    judf = self._judf
  File "/opt/spark/python/pyspark/sql/udf.py", line 151, in _judf
    self._judf_placeholder = self._create_judf()
  File "/opt/spark/python/pyspark/sql/udf.py", line 160, in _create_judf
    wrapped_func = _wrap_function(sc, self.func, self.returnType)
  File "/opt/spark/python/pyspark/sql/udf.py", line 35, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/opt/spark/python/pyspark/rdd.py", line 2420, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/opt/spark/python/pyspark/serializers.py", line 600, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: TypeError: can't pickle getset_descriptor objects

crflynn commented 2 years ago

This is probably a duplicate of https://github.com/crflynn/pbspark/issues/26 and https://github.com/crflynn/pbspark/issues/37. Please refer to https://github.com/crflynn/pbspark/issues/26#issuecomment-1153429602. Feel free to re-open if that does not solve your issue.
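
For readers trying to interpret the traceback: it shows PySpark pickling the UDF (`ser.dumps(command)`) before any data is processed, and failing on a `getset_descriptor` object. The sketch below is not pbspark code and not the fix from the linked comment. It only illustrates, using the standard library alone, that Python's pickle rejects this kind of C-level descriptor. The `try_pickle` helper is a hypothetical name introduced for illustration, and the exact error wording varies by Python version.

```python
import pickle
import types

# FunctionType.__code__ is a C-level getset_descriptor, the same kind of
# object named in the error message from the traceback.
descriptor = types.FunctionType.__code__
assert isinstance(descriptor, types.GetSetDescriptorType)


def try_pickle(obj):
    """Hypothetical helper: return None if obj pickles, else the error text."""
    try:
        pickle.dumps(obj)
        return None
    except (TypeError, pickle.PicklingError) as exc:
        return f"{type(exc).__name__}: {exc}"


# Plain Python data serializes without complaint.
plain_error = try_pickle({"name": "hello", "quantity": 5})

# The descriptor does not; the message wording depends on the Python version.
descriptor_error = try_pickle(descriptor)

print(plain_error)  # None
print(descriptor_error)
```

If a function passed to Spark as a UDF references an object that pickle has to serialize by value and that object contains such descriptors, a failure of this kind would surface at UDF creation time, which matches where the traceback stops. Whether that is the exact cause here is an assumption; the linked issues are the authoritative reference.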