Closed muhammadbinazeem closed 4 months ago
Hi,
I am getting the following exception while serializing a pipeline that contains an xgboost model with mleap.
xgboost version: 1.7.5
mleap version: 0.20.0
jars:
1. xgboost4j-spark_2.12-1.7.6.jar
2. xgboost4j_2.12-1.7.6.jar
spark version: 3.1.1
python version: 3.8
Error:
AttributeError                            Traceback (most recent call last)
Cell In [28], line 1
----> 1 pipe_model.serializeToBundle('jar:file:./xgboost.zip', pipe_model.transform(train_df.limit(2)))

File /usr/local/lib/python3.8/dist-packages/mleap/pyspark/spark_support.py:25, in serializeToBundle(self, path, dataset)
     23 def serializeToBundle(self, path, dataset=None):
     24     serializer = SimpleSparkSerializer()
---> 25     serializer.serializeToBundle(self, path, dataset=dataset)

File /usr/local/lib/python3.8/dist-packages/mleap/pyspark/spark_support.py:42, in SimpleSparkSerializer.serializeToBundle(self, transformer, path, dataset)
     41 def serializeToBundle(self, transformer, path, dataset):
---> 42     self._java_obj.serializeToBundle(transformer._to_java(), path, dataset._jdf)

File /usr/local/lib/python3.8/dist-packages/pyspark/ml/pipeline.py:333, in PipelineModel._to_java(self)
    331 java_stages = gateway.new_array(cls, len(self.stages))
    332 for idx, stage in enumerate(self.stages):
--> 333     java_stages[idx] = stage._to_java()
    335 _java_obj =\
    336     JavaParams._new_java_obj("org.apache.spark.ml.PipelineModel", self.uid, java_stages)
    338 return _java_obj

AttributeError: 'SparkXGBClassifierModel' object has no attribute '_to_java'
pipe_model.stages
[PipelineModel_ae7575ff1b02, SparkXGBClassifierModel_a8fa8c151e67]

pipe_model.stages[0].stages
[StringIndexerModel: uid=StringIndexer_c7d4b372914f, handleInvalid=keep,
 StringIndexerModel: uid=StringIndexer_3f1fab5bf9b0, handleInvalid=keep,
 VectorAssembler_45f3b667fe6a,
 VectorIndexerModel: uid=VectorIndexer_29710769d2d5, numFeatures=4, handleInvalid=keep]
I'm not familiar with this, but xgboost has a PySpark interface. Would you like to try it?
Closing since this is not an error from xgboost.
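For anyone landing here with the same traceback: judging from the frames above, `PipelineModel._to_java` calls `stage._to_java()` on every stage, and the call fails on the `SparkXGBClassifierModel` stage because that object does not expose `_to_java`. A rough way to find such stages before calling `serializeToBundle` might look like the sketch below. The helper `stages_without_to_java` and the stand-in classes are hypothetical, written only for illustration; they are not part of pyspark, mleap, or xgboost, and the sketch only diagnoses the problem rather than fixing it.

```python
def stages_without_to_java(stages):
    """Return class names of stages that have no callable `_to_java` attribute.

    Nested pipelines (anything with a `stages` attribute) are searched
    recursively, since the nested stages are converted one by one as well.
    """
    missing = []
    for stage in stages:
        nested = getattr(stage, "stages", None)
        if nested is not None:
            missing.extend(stages_without_to_java(nested))
        elif not callable(getattr(stage, "_to_java", None)):
            missing.append(type(stage).__name__)
    return missing


# Hypothetical stand-ins so the sketch runs without a Spark session.
class JavaBackedStage:
    """Mimics a stage that can be converted to a JVM object."""

    def _to_java(self):
        return object()


class PythonOnlyStage:
    """Mimics a stage with no `_to_java`, like the one in the traceback."""


class NestedPipeline:
    """Mimics a pipeline model holding its own list of stages."""

    def __init__(self, stages):
        self.stages = stages

    def _to_java(self):
        return [stage._to_java() for stage in self.stages]


demo_stages = [
    NestedPipeline([JavaBackedStage(), JavaBackedStage()]),
    PythonOnlyStage(),
]
missing = stages_without_to_java(demo_stages)
print(missing)
```

With a real pipeline, the assumption is that you would pass `pipe_model.stages` instead of `demo_stages`; any name the helper returns is a stage that the `_to_java` loop shown in the traceback would fail on.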