Closed ankit-khare-2015 closed 2 years ago
Any update here? I really need this to be working.
@joemcmahon I hope someone can help me understand this issue.
Apparently the code below solves the issue:
```scala
import org.apache.spark.sql.SparkSession
import com.amazon.deequ.suggestions.{ConstraintSuggestionRunner, Rules}

val spark = SparkSession.builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._

case class RawData(productName: String, totalNumber: String, status: String, valuable: String)

val rows = spark.sparkContext.parallelize(Seq(
  RawData("thingA", "13.0", "IN_TRANSIT", "true"),
  RawData("thingA", "5", "DELAYED", "false"),
  RawData("thingB", null, "DELAYED", null),
  RawData("thingC", null, "IN_TRANSIT", "false"),
  RawData("thingD", "1.0", "DELAYED", "true"),
  RawData("thingC", "7.0", "UNKNOWN", null),
  RawData("thingC", "24", "UNKNOWN", null),
  RawData("thingE", "20", "DELAYED", "false"),
  RawData("thingA", "13.0", "IN_TRANSIT", "true"),
  RawData("thingA", "5", "DELAYED", "false"),
  RawData("thingB", null, "DELAYED", null),
  RawData("thingC", null, "IN_TRANSIT", "false"),
  RawData("thingD", "1.0", "DELAYED", "true"),
  RawData("thingC", "17.0", "UNKNOWN", null),
  RawData("thingC", "22", "UNKNOWN", null),
  RawData("thingE", "23", "DELAYED", "false")
))

// Explicit conversion instead of rows.toDF()
val data = spark.createDataFrame(rows)

data.printSchema()
print(data.getClass)

val suggestionResult = ConstraintSuggestionRunner()
  .onData(data)
  .addConstraintRules(Rules.DEFAULT)
  .run()

suggestionResult.constraintSuggestions.foreach { case (column, suggestions) =>
  suggestions.foreach { suggestion =>
    println(s"Constraint suggestion for '$column':\t${suggestion.description}\n" +
      s"The corresponding scala code is ${suggestion.codeForConstraint}\n")
  }
}
```
I guess an explicit conversion was needed
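To make the contrast concrete, here is a minimal sketch of the two conversion routes (the `RawData` case class and the Deequ import path are taken from the snippets in this thread; the classloader explanation is an assumption based on the error output below, where `found` and `required` expand to the same `Dataset[Row]` type):

```scala
import com.amazon.deequ.suggestions.{ConstraintSuggestionRunner, Rules}

// Implicit route: rows.toDF() goes through spark.implicits._ and, in some
// notebook kernels, can yield a DataFrame type resolved through a different
// classloader than the one Deequ was compiled against, producing the
// "type mismatch" where found and required look identical.
// import spark.implicits._
// val data = rows.toDF()

// Explicit route: the SparkSession builds the DataFrame itself, so the
// resulting type comes from a single, consistent classloader.
val data = spark.createDataFrame(rows)

val suggestionResult = ConstraintSuggestionRunner()
  .onData(data)
  .addConstraintRules(Rules.DEFAULT)
  .run()
```

This sketch needs a running SparkSession (and the `rows` RDD from above) to execute, so it is illustrative rather than standalone.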
I was able to fix this issue by myself. We have to be a bit more proactive in providing support. In case some help is needed on the dev side, do let me know. Thanks.
Spark version _:
```scala
case class RawData(productName: String, totalNumber: String, status: String, valuable: String)

val rows = spark.sparkContext.parallelize(Seq(
  RawData("thingA", "13.0", "IN_TRANSIT", "true"),
  RawData("thingA", "5", "DELAYED", "false"),
  RawData("thingB", null, "DELAYED", null),
  RawData("thingC", null, "IN_TRANSIT", "false"),
  RawData("thingD", "1.0", "DELAYED", "true"),
  RawData("thingC", "7.0", "UNKNOWN", null),
  RawData("thingC", "24", "UNKNOWN", null),
  RawData("thingE", "20", "DELAYED", "false"),
  RawData("thingA", "13.0", "IN_TRANSIT", "true"),
  RawData("thingA", "5", "DELAYED", "false"),
  RawData("thingB", null, "DELAYED", null),
  RawData("thingC", null, "IN_TRANSIT", "false"),
  RawData("thingD", "1.0", "DELAYED", "true"),
  RawData("thingC", "17.0", "UNKNOWN", null),
  RawData("thingC", "22", "UNKNOWN", null),
  RawData("thingE", "23", "DELAYED", "false")
))

val data = spark.createDataFrame(rows)
print(data.getClass)

val suggestionResult = ConstraintSuggestionRunner()
  .onData(data)
  .addConstraintRules(Rules.DEFAULT)
  .run()
```
Error:
```
val suggestionResult = ConstraintSuggestionRunner().onData(sqlDF)
  .addConstraintRules(Rules.DEFAULT)
  .run()

Name: Unknown Error
Message: <console>:56: error: type mismatch;
 found   : org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 required: org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
       val suggestionResult = ConstraintSuggestionRunner().onData(data)
                                                                  ^
```