gorkemozkaya closed this issue 6 years ago
This is unlikely to be an XGBoost error since the code works fine in spark-shell. I'd suggest submitting the issue to Apache Toree. It could be that multiple LabeledPoint classes (one in spark-ml and one in xgboost) confuse the kernel.
Environment info
Operating System: Linux
Compiler: g++
Steps to reproduce:
val trainRDD = sc.parallelize(Seq(
  LabeledPoint(1.0, new DenseVector(Array(2.0, 3.0, 4.0))),
  LabeledPoint(0.0, new DenseVector(Array(5.0, 5.0, 5.0))),
  LabeledPoint(1.0, new DenseVector(Array(2.0, 3.0, 4.0))),
  LabeledPoint(0.0, new DenseVector(Array(5.0, 5.0, 5.0))),
  LabeledPoint(1.0, new DenseVector(Array(2.0, 3.0, 4.0))),
  LabeledPoint(0.0, new DenseVector(Array(5.0, 5.0, 5.0))),
  LabeledPoint(1.0, new DenseVector(Array(2.0, 3.0, 4.0))),
  LabeledPoint(0.0, new DenseVector(Array(5.0, 5.0, 5.0))),
  LabeledPoint(1.0, new DenseVector(Array(2.0, 3.0, 4.0))),
  LabeledPoint(0.0, new DenseVector(Array(5.0, 5.0, 5.0))),
  LabeledPoint(1.0, new DenseVector(Array(2.0, 3.0, 4.0))),
  LabeledPoint(1.0, new DenseVector(Array(2.0, 3.0, 4.0))),
  LabeledPoint(0.0, new DenseVector(Array(5.0, 5.0, 5.0)))
), 4)

val paramMap = List(
  "eta" -> 0.1f,
  "max_depth" -> 2,
  "objective" -> "binary:logistic"
).toMap

val xgboostModelRDD = XGBoost.train(trainRDD, paramMap, 1, 4, useExternalMemory = true)
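If the kernel really is mixing up two classes that are both named LabeledPoint, one thing worth trying is to import each under its own alias, so that every reference in the notebook is unambiguous. The sketch below only illustrates Scala's import renaming. The objects sparkml and xgb and their fields are hypothetical stand-ins, not the real Spark or XGBoost classes, and whether aliasing actually avoids the Toree problem is an assumption that has not been verified here.

```scala
// Hypothetical stand-ins for two libraries that each define a LabeledPoint.
// These are NOT the real Spark or XGBoost classes; the fields are made up for illustration.
object sparkml {
  final case class LabeledPoint(label: Double, features: Vector[Double])
}

object xgb {
  final case class LabeledPoint(label: Float, values: Array[Float])
}

object DisambiguateLabeledPoint {
  // Rename on import so the two classes never share a simple name in scope.
  import sparkml.{LabeledPoint => MLLabeledPoint}
  import xgb.{LabeledPoint => XGBLabeledPoint}

  // Convert one representation to the other; the aliases make the direction explicit.
  def toXgb(p: MLLabeledPoint): XGBLabeledPoint =
    XGBLabeledPoint(p.label.toFloat, p.features.map(_.toFloat).toArray)

  def main(args: Array[String]): Unit = {
    val p = MLLabeledPoint(1.0, Vector(2.0, 3.0, 4.0))
    val q = toXgb(p)
    println(s"label=${q.label}, values=${q.values.mkString(",")}")
  }
}
```

In the notebook, the same pattern would be applied to whichever LabeledPoint packages are actually on the classpath. Using the fully qualified class name at each call site is an alternative that avoids imports altogether.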