yuehanlyu closed this issue 3 years ago
@yuehanlyu I cannot reproduce your issue, so it could be an environment problem. I googled the error message; see here: https://stackoverflow.com/questions/39953245/how-to-fix-java-lang-classcastexception-cannot-assign-instance-of-scala-collect
It seems the error could be caused by jars missing on the Spark worker nodes. Please try adding the following related jars:
pmml4s-spark_2.11-0.9.7.jar
pmml4s_2.11-0.9.7.jar
commons-text-1.6.jar
spray-json_2.11-1.3.5.jar
You can get those jars from here: https://github.com/autodeployai/pypmml-spark/tree/master/pypmml_spark/jars
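For reference, here is a minimal sketch of one way to ship these jars to the worker nodes when the application creates its own SparkSession. The directory `/path/to/jars` and the app name `pmml-scoring` are placeholders I made up, not values from this thread. `spark.jars` is the standard Spark setting for a comma-separated list of jars to put on the driver and executor classpaths.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder directory (an assumption): wherever the jars were uploaded on the cluster.
val jarDir = "/path/to/jars"

// Jar file names exactly as listed in this thread.
val jars = Seq(
  "pmml4s-spark_2.11-0.9.7.jar",
  "pmml4s_2.11-0.9.7.jar",
  "commons-text-1.6.jar",
  "spray-json_2.11-1.3.5.jar"
).map(name => s"$jarDir/$name")

// spark.jars expects a comma-separated list and only takes effect if it is
// set before the session is created. The master URL is assumed to be
// supplied externally (for example by spark-submit).
val spark = SparkSession.builder()
  .appName("pmml-scoring") // hypothetical application name
  .config("spark.jars", jars.mkString(","))
  .getOrCreate()
```

In Zeppelin or spark-shell the session already exists by the time your code runs, so setting it this way will probably have no effect there. In that case the same comma-separated list would go into the interpreter's `spark.jars` property or the `--jars` option at launch.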
@scorebot, thanks for your help!
I changed the environment to Spark 2.4.0 and uploaded three jars:
pmml4s-spark_2.11-0.9.7.jar
pmml4s_2.11-0.9.7.jar
spray-json_2.11-1.3.5.jar
Then the transform function worked! P.S. Spark 2.4.4 with those jars didn't work.
Hi! I'm trying to use AWS EMR to score a DataFrame with a PMML model, but I got an error that is hard to trace. Any help would be appreciated.
Read the dataframe:
val df = spark.read.parquet("myDataframe.parquet")
df.show()
This works fine. Read the PMML model:
import org.pmml4s.spark.ScoreModel
val model = ScoreModel.fromFile("myxgboostModel.pmml")
Then run the following code:
model.transform(df).show()
This gave an error message:
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
Environment: Spark 2.4.4, Zeppelin 0.8.2.
However, I didn't encounter this error when running the same code in IntelliJ on my MacBook.