dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

AttributeError: 'SparkXGBClassifierModel' object has no attribute '_to_java' #10413

Closed: muhammadbinazeem closed this issue 4 months ago

muhammadbinazeem commented 4 months ago

Hi,

I am getting the following exception while serializing a pipeline that contains an XGBoost model with MLeap:

xgboost version: 1.7.5
mleap version: 0.20.0

jars:
1 - xgboost4j-spark_2.12-1.7.6.jar
2 - xgboost4j_2.12-1.7.6.jar

spark version: 3.1.1
python version: 3.8

Error:

AttributeError                            Traceback (most recent call last)
Cell In [28], line 1
----> 1 pipe_model.serializeToBundle('jar:file:./xgboost.zip', pipe_model.transform(train_df.limit(2)))

File /usr/local/lib/python3.8/dist-packages/mleap/pyspark/spark_support.py:25, in serializeToBundle(self, path, dataset)
     23 def serializeToBundle(self, path, dataset=None):
     24     serializer = SimpleSparkSerializer()
---> 25     serializer.serializeToBundle(self, path, dataset=dataset)

File /usr/local/lib/python3.8/dist-packages/mleap/pyspark/spark_support.py:42, in SimpleSparkSerializer.serializeToBundle(self, transformer, path, dataset)
     41 def serializeToBundle(self, transformer, path, dataset):
---> 42     self._java_obj.serializeToBundle(transformer._to_java(), path, dataset._jdf)

File /usr/local/lib/python3.8/dist-packages/pyspark/ml/pipeline.py:333, in PipelineModel._to_java(self)
    331 java_stages = gateway.new_array(cls, len(self.stages))
    332 for idx, stage in enumerate(self.stages):
--> 333     java_stages[idx] = stage._to_java()
    335 _java_obj =\
    336     JavaParams._new_java_obj("org.apache.spark.ml.PipelineModel", self.uid, java_stages)
    338 return _java_obj

AttributeError: 'SparkXGBClassifierModel' object has no attribute '_to_java'

pipe_model.stages
[PipelineModel_ae7575ff1b02, SparkXGBClassifierModel_a8fa8c151e67]

pipe_model.stages[0].stages
[StringIndexerModel: uid=StringIndexer_c7d4b372914f, handleInvalid=keep,
 StringIndexerModel: uid=StringIndexer_3f1fab5bf9b0, handleInvalid=keep,
 VectorAssembler_45f3b667fe6a,
 VectorIndexerModel: uid=VectorIndexer_29710769d2d5, numFeatures=4, handleInvalid=keep]
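The traceback shows that PySpark's PipelineModel._to_java calls stage._to_java() on every stage, so MLeap's serializeToBundle can only handle pipelines whose stages are all JVM-backed. The xgboost.spark models appear to be implemented in Python rather than wrapping a JVM object, which would explain why SparkXGBClassifierModel has no _to_java. The sketch below is a hypothetical diagnostic (the helper names iter_leaf_stages and stages_without_to_java are made up for illustration, not part of PySpark, MLeap, or XGBoost) that walks a fitted pipeline, including nested pipelines like the one above, and lists the leaf stages that would trigger this error. It only uses duck typing (a `stages` list and a `_to_java` attribute), so it is an assumption-based check rather than an official API.

```python
from typing import Any, Iterator, List, Tuple


def iter_leaf_stages(model: Any, path: str = "stages") -> Iterator[Tuple[str, Any]]:
    """Yield (path, stage) for every leaf stage, descending into nested pipelines.

    Any stage that exposes a `stages` list (e.g. a nested PipelineModel) is
    treated as a container; everything else is a leaf.
    """
    for idx, child in enumerate(model.stages):
        child_path = f"{path}[{idx}]"
        if hasattr(child, "stages"):
            yield from iter_leaf_stages(child, f"{child_path}.stages")
        else:
            yield child_path, child


def stages_without_to_java(model: Any) -> List[str]:
    """List leaf stages that have no callable `_to_java` method.

    PySpark's PipelineModel._to_java calls `stage._to_java()` on each stage,
    so each entry returned here would raise the same AttributeError when the
    pipeline is passed to MLeap's serializeToBundle.
    """
    return [
        f"{stage_path}: {type(stage).__name__}"
        for stage_path, stage in iter_leaf_stages(model)
        if not callable(getattr(stage, "_to_java", None))
    ]


# Hypothetical usage on the fitted pipeline from this issue:
# print(stages_without_to_java(pipe_model))
```

Given the stage listing above, this check would be expected to flag only the SparkXGBClassifierModel stage, since the StringIndexerModel, VectorAssembler, and VectorIndexerModel stages are JVM-backed PySpark transformers.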

trivialfis commented 4 months ago

I'm not familiar with this, but XGBoost has a PySpark interface. Would you like to try it?
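One way to read this suggestion is to stay inside PySpark's own ML persistence instead of MLeap. PySpark's PipelineModel.write() is designed to fall back to a Python-side pipeline writer when not every stage is JVM-backed, and the xgboost.spark models are meant to implement PySpark's MLWritable/MLReadable. The sketch below assumes both of those hold for the installed versions; the wrapper names save_fitted_pipeline and load_fitted_pipeline are hypothetical conveniences, while write().overwrite().save(path) and PipelineModel.load(path) are standard PySpark calls.

```python
from typing import Any


def save_fitted_pipeline(pipe_model: Any, path: str) -> str:
    """Persist a fitted PipelineModel with PySpark's ML writer (not MLeap).

    Hypothetical helper. Assumes every stage, including the
    SparkXGBClassifierModel, supports PySpark's MLWritable interface.
    Returns the path for convenience.
    """
    pipe_model.write().overwrite().save(path)
    return path


def load_fitted_pipeline(path: str) -> Any:
    """Load a pipeline written by save_fitted_pipeline.

    Hypothetical helper. Requires an active SparkSession and an xgboost
    installation that can import the saved SparkXGBClassifierModel.
    """
    # Imported lazily so this module can be imported without Spark present.
    from pyspark.ml import PipelineModel

    return PipelineModel.load(path)
```

The trade-off is that the saved directory can only be loaded back into Spark; unlike an MLeap bundle, it does not give you Spark-free scoring.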

trivialfis commented 4 months ago

Closing since this is not an error from xgboost.