NVIDIA / spark-xgboost-examples

XGBoost GPU accelerated on Spark example applications
Apache License 2.0

IllegalArgumentExce #43

Open tristers-at-square opened 3 years ago

tristers-at-square commented 3 years ago

Describe the bug

When training an XGBoost classifier on GPUs, the fit call fails with the following error:

IllegalArgumentException: features does not exist

Steps/Code to reproduce bug

Calling the fit method as follows:

val xgbClassifier = new XGBoostClassifier(paramMap)
  .setLabelCol(labelName)
  .setFeaturesCols(featureCols)
xgbClassifier.fit(trainDF)
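
The message suggests that something looked up a column literally named `features`, which is the default `featuresCol` for Spark ML estimators. As a minimal diagnostic sketch (not a fix), the helper below reports which expected column names are absent from a DataFrame's schema. `ColumnCheck` and `missingColumns` are hypothetical names introduced here for illustration; they are not part of XGBoost or the RAPIDS Accelerator.

```scala
// Hypothetical diagnostic helper; not part of XGBoost or the RAPIDS Accelerator.
object ColumnCheck {
  /** Returns the names in `required` that are absent from `available`, in order, without duplicates. */
  def missingColumns(available: Seq[String], required: Seq[String]): Seq[String] = {
    val present = available.toSet
    required.filterNot(present.contains).distinct
  }
}

// Usage with the names from the snippet above (assumes trainDF is a DataFrame,
// featureCols is a Seq[String], and labelName is a String):
//   val missing = ColumnCheck.missingColumns(trainDF.columns, featureCols :+ labelName)
//   require(missing.isEmpty, s"Columns missing from trainDF: ${missing.mkString(", ")}")
//
//   // Check whether a single assembled vector column named "features" is present:
//   println(trainDF.columns.contains("features"))
```

If every name in `featureCols` is present but `features` is not, that would be consistent with the job reaching a code path that expects a single assembled vector column rather than the multi-column input set by `setFeaturesCols`. This is an assumption to verify against the JARs actually on the classpath.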

Expected behavior

I expected the model to train successfully when running on GPUs.

Environment details (please complete the following information)

Running the Spark job on GCP Dataproc with an NVIDIA Tesla T4 GPU.

The following JARs are on the classpath in /usr/lib/spark/jars/:

Using the following Dataproc initializers to install the GPU drivers and the RAPIDS Accelerator:

Using the following Spark parameter configurations:

"spark.executor.resource.gpu.amount": "1"
"spark.task.resource.gpu.amount": "1"
"spark.rapids.sql.explain": "ALL"
"spark.rapids.sql.concurrentGpuTasks": "2"
"spark.rapids.memory.pinnedPool.size": "2G"
"spark.executor.extraJavaOptions": "-Dai.rapids.cudf.prefer-pinned=true"
"spark.locality.wait": "0s"
"spark.plugins": "com.nvidia.spark.SQLPlugin"
"spark.rapids.sql.hasNans": "false"
"spark.rapids.sql.batchSizeBytes": "512M"
"spark.rapids.sql.reader.batchSizeBytes": "768M"
"spark.rapids.sql.variableFloatAgg.enabled": "true"
"spark.rapids.sql.decimalType.enabled": "true"
"spark.rapids.memory.gpu.pooling.enabled": "false"
"spark.executor.resource.gpu.discoveryScript": "/usr/lib/spark/scripts/gpu/getGpusResources.sh"

tristers-at-square commented 3 years ago

This is similar to the error described here:

https://github.com/NVIDIA/spark-xgboost-examples/issues/13

However, none of those steps seemed to fix my issue.

GaryShen2008 commented 3 years ago

@wbo4958 Hi Bobby, can you help with this issue? What conditions can cause "features does not exist"?

@tristers-at-square BTW, we have a new repo: https://github.com/NVIDIA/spark-rapids-examples. I see you're running the latest version (with rapids-4-spark 21.08.0). The steps should be unchanged. It would be better to file issues for the latest version in the new repo. Thanks.

tristers-at-square commented 3 years ago

@GaryShen2008 I see, opened the issue in the new repo. Thanks! 🙏

GaryShen2008 commented 3 years ago

@wbo4958 Hi Bobby, can you check https://github.com/NVIDIA/spark-rapids-examples/issues/21?