Closed WillSmisi closed 2 years ago
I have run into the same problem.
I have been searching online for a long time without success. Please help, or share any ideas on how to get this working.
Thanks in advance.
I have a small PySpark program that uses xgboost4j and xgboost4j-spark to train a model on a dataset held in a Spark DataFrame.
Training and saving succeed, but I cannot load the model.
Current library versions:
PySpark 2.4.5
xgboost4j 0.91
xgboost4j-spark 0.91
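# Assumed imports; the original snippet omitted them.
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import MinMaxScaler, VectorAssembler
# XGBoostClassifier comes from a Python wrapper around xgboost4j-spark; its import was not shown.

# data, numeric_features_new, FEATURES, LABEL and modelOssDir are assumed to be defined earlier.
trainingData, testData = data.randomSplit([0.7, 0.3])

# Parentheses let the chained setters span several lines.
vectorAssembler = (
    VectorAssembler()
    .setInputCols(numeric_features_new)
    .setOutputCol(FEATURES)
)
scaler = MinMaxScaler(inputCol=FEATURES, outputCol=FEATURES + '_scaler')
assemblerInputCols = FEATURES + '_scaler'

xgb_params = dict(
    eta=0.1,
    maxDepth=2,
    missing=0.0,
    objective="binary:logistic",
    numRound=5,
    numWorkers=1
)
xgb = (
    XGBoostClassifier(**xgb_params)
    .setFeaturesCol(assemblerInputCols)
    .setLabelCol(LABEL)
)
pipeline = Pipeline(stages=[
    vectorAssembler,
    scaler,
    xgb
])

print("training model")
pipline_model = pipeline.fit(trainingData)

print("saving model to S3")
pipline_model.write().overwrite().save(modelOssDir)
print("saved model to S3")

print("Loading model...")
pipline_model = PipelineModel.load(modelOssDir)  # this call raises the ImportError below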
Traceback (most recent call last):
File "xgboost.py", line 95, in <module>
pipline_model = PipelineModel.load(modelOssDir)
File "/home/admin/1610603211241401722_0/pyspark.zip/pyspark/ml/util.py", line 362, in load
File "/home/admin/1610603211241401722_0/pyspark.zip/pyspark/ml/pipeline.py", line 242, in load
File "/home/admin/1610603211241401722_0/pyspark.zip/pyspark/ml/util.py", line 304, in load
File "/home/admin/1610603211241401722_0/pyspark.zip/pyspark/ml/pipeline.py", line 299, in _from_java
File "/home/admin/1610603211241401722_0/pyspark.zip/pyspark/ml/wrapper.py", line 227, in _from_java
File "/home/admin/1610603211241401722_0/pyspark.zip/pyspark/ml/wrapper.py", line 221, in __get_class
ImportError: No module named ml.dmlc.xgboost4j.scala.spark
at com.aliyun.odps.cupid.CupidUtil.errMsg2SparkException(CupidUtil.java:50)
at com.aliyun.odps.cupid.CupidUtil.getResult(CupidUtil.java:131)
at com.aliyun.odps.cupid.requestcupid.YarnClientImplUtil.pollAMStatus(YarnClientImplUtil.java:108)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.applicationReportTransform(YarnClientImpl.java:377)
... 12 more
21/01/22 11:39:21 ERROR Client: Application diagnostics message: Failed to contact YARN for application application_1611286494541_745555769.
Exception in thread "main" org.apache.spark.SparkException: Application application_1611286494541_745555769 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1166)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1543)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Closing as PySpark is not supported at the moment.
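For readers hitting the same ImportError, the module name in the traceback appears to come from how PySpark 2.4 maps each loaded JVM stage back to a Python class: it rewrites the `org.apache.spark` prefix to `pyspark` and then imports everything before the final dot as a module. The sketch below only mimics that name mapping so the failure is easier to see; the function names are hypothetical and it is not PySpark's own code. It assumes the fitted XGBoost stage is saved under the JVM class `ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel`.

```python
def python_class_path(java_class_name):
    """Mimic the prefix rewrite PySpark applies to a loaded JVM stage's class name."""
    # Only the Spark package prefix is rewritten; any other name passes through unchanged.
    return java_class_name.replace("org.apache.spark", "pyspark")


def module_to_import(java_class_name):
    """Return the module name that would be imported for this stage."""
    parts = python_class_path(java_class_name).split(".")
    # Everything before the final component is treated as the module.
    return ".".join(parts[:-1])


# A built-in stage resolves to a real Python module.
print(module_to_import("org.apache.spark.ml.feature.MinMaxScalerModel"))
# -> pyspark.ml.feature

# The XGBoost stage has no org.apache.spark prefix, so the JVM package name
# itself is used as the Python module name, which is what the ImportError reports.
print(module_to_import("ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel"))
# -> ml.dmlc.xgboost4j.scala.spark
```

Under that reading, the save succeeds because it runs entirely on the JVM side, while the load fails at the point where a Python wrapper class has to be found for the XGBoost stage.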
#1698 (comment)
Have you resolved this problem? I have hit the same one and searched a lot, but found nothing that fixes it.
I have the same problem: "pyspark.sql.utils.IllegalArgumentException: requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.classification.RandomForestClassificationModel but found class name org.apache.spark.ml.classification.RandomForestClassifier"
That error mentions a random forest classification model. Are you sure this is related to XGBoost?
After saving the model and loading it, I get the following error:
IllegalArgumentException: u'requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.Pipeline but found class name org.apache.spark.ml.PipelineModel'
Can you please help with this? Thanks.
I tried the following option
and got the following error:
No module named ml.dmlc.xgboost4j.scala.spark
Originally posted by @anaveenan in https://github.com/dmlc/xgboost/issues/1698#issuecomment-420472146
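The two "Error loading metadata" messages quoted in this thread are what Spark ML reports when the class used to call `load()` differs from the class that wrote the directory, for example calling `Pipeline.load` on a directory written by a fitted `PipelineModel`. One way to check is to read the class name recorded at save time. This is a minimal sketch, assuming the model sits in a local directory (an S3 or OSS path would have to be read some other way) and that Spark ML's usual layout of one JSON line under `metadata/part-*` applies. The helper name `saved_class_name` and the stand-in directory are hypothetical.

```python
import glob
import json
import os
import tempfile


def saved_class_name(model_dir):
    """Return the JVM class name recorded in a Spark ML save directory."""
    # Spark ML stores one JSON object per saved stage under <dir>/metadata/part-*.
    part_files = sorted(glob.glob(os.path.join(model_dir, "metadata", "part-*")))
    if not part_files:
        raise IOError("no metadata part file under %s" % model_dir)
    with open(part_files[0]) as handle:
        return json.loads(handle.readline())["class"]


# Stand-in directory that imitates a saved PipelineModel, for illustration only.
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "metadata"))
with open(os.path.join(demo_dir, "metadata", "part-00000"), "w") as handle:
    handle.write(json.dumps({"class": "org.apache.spark.ml.PipelineModel"}) + "\n")

print(saved_class_name(demo_dir))
# -> org.apache.spark.ml.PipelineModel
```

If the recorded class is `org.apache.spark.ml.PipelineModel`, the matching loader is `PipelineModel.load`. If it is an estimator such as `RandomForestClassifier`, the directory holds an unfitted estimator and the model class cannot load it.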