dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

XGBoost4j-spark training fails with Spark versions below 3.5.0 #10920

Closed. NvTimLiu closed this issue 3 weeks ago

NvTimLiu commented 1 month ago

XGBoost4j-spark training failed with a Spark version below 3.5.0. Detailed log attached: XGBOOST4J-spark-falures.log

java.lang.NoSuchMethodError: org.apache.spark.ml.util.DatasetUtils$.getNumClasses(Lorg/apache/spark/sql/Dataset;Ljava/lang/String;I)I
  at org.apache.spark.ml.xgboost.SparkUtils$.getNumClasses(SparkUtils.scala:61)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.inferNumClasses$1(XGBoostClassifier.scala:67)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.validateObjective(XGBoostClassifier.scala:85)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.validate(XGBoostClassifier.scala:103)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train(XGBoostEstimator.scala:410)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train$(XGBoostEstimator.scala:409)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:33)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:33)
  at org.apache.spark.ml.Predictor.fit(Predictor.scala:151)
  at $anonfun$xgbClassificationModel$1(<console>:33)
  at Benchmark$.time(<console>:30)
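
The JVM descriptor in the error, `(Lorg/apache/spark/sql/Dataset;Ljava/lang/String;I)I`, describes a method taking a `Dataset`, a `String` and an `int` and returning an `int`. A `NoSuchMethodError` like this usually means the jar was compiled against a Spark API that the runtime Spark on the classpath does not provide. As a rough diagnostic sketch (not part of XGBoost or Spark; `MethodProbe` and `hasMethod` are hypothetical names introduced here), one could probe the runtime classpath via reflection for a public method with that name and parameter count:

```java
import java.lang.reflect.Method;

// Hypothetical helper, for illustration only: checks whether a class on the
// current classpath exposes a public method with a given name and arity.
final class MethodProbe {

    static boolean hasMethod(String className, String methodName, int paramCount) {
        try {
            // Load without running static initializers; we only inspect signatures.
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            Class<?> cls = Class.forName(className, false, loader);
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName) && m.getParameterCount() == paramCount) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or one of its dependencies could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // A Scala `object DatasetUtils` compiles to the JVM class `DatasetUtils$`,
        // which is the class named in the stack trace above.
        String cls = "org.apache.spark.ml.util.DatasetUtils$";
        boolean present = hasMethod(cls, "getNumClasses", 3);
        System.out.println(cls + ".getNumClasses (3 params) present: " + present);
    }
}
```

Run with the same Spark jars on the classpath as the failing job, a `false` result would be consistent with the error above. The check matches on name and parameter count only, so it is a coarse signal rather than an exact descriptor comparison.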

Environment:

1. OS: Ubuntu 20.04
2. Spark version: 3.2.1
3. XGBoost4j-spark: xgboost4j-spark-gpu_2.12-2.2.0-SNAPSHOT.jar
4. rapids-4-spark: 24.12.0-SNAPSHOT
5. Failed file: agaricus-gpu.ipynb
6. CLI: jupyter nbconvert --to notebook --stdout --execute notebook-examples/examples/XGBoost-Examples/agaricus/notebooks/scala/agaricus-gpu.ipynb

NvTimLiu commented 1 month ago

@wbo4958

hcho3 commented 1 month ago

The fix is available at https://github.com/dmlc/xgboost/pull/10917

wbo4958 commented 1 month ago

> The fix is available at #10917

Yes, true

wbo4958 commented 4 weeks ago

Hi @NvTimLiu, could you verify it using the latest snapshot jars?

NvTimLiu commented 4 weeks ago

Sure, I will verify with the latest snapshot jars.

NvTimLiu commented 3 weeks ago

Spark versions 3.2.0+ and 3.3.0+ pass the tests against the latest snapshot jars.