Hydrospheredata / mist

Serverless proxy for Spark cluster
http://hydrosphere.io/mist/
Apache License 2.0

Unable To Run Scala code with Spark 2 #333

Closed. SushantVarshney closed this issue 7 years ago

SushantVarshney commented 7 years ago

I want to run demo Scala code on Spark 2.1.1, so I added the Mist Maven dependency to my pom as described in your docs:

<dependency>
    <groupId>io.hydrosphere</groupId>
    <artifactId>mist-lib-spark2_2.11</artifactId>
    <version>0.13.0</version>
</dependency>

Below is the code I am trying to run:

import io.hydrosphere.mist.api.MistJob
import org.apache.spark.sql.SparkSession
import io.hydrosphere.mist.api.SQLSupport
import io.hydrosphere.mist.api.HiveSupport

object MistApp extends MistJob with SQLSupport with HiveSupport {

  def execute(csvFilePath: String, outputPath: String): Map[String, Any] = {
    val sparkSession = SparkSession.builder().appName("DemoCSVRead").getOrCreate()
    val dataFrame = sparkSession.read.option("header", true).csv(csvFilePath)
    Map("result" -> dataFrame.take(10))
  }
}

But after submitting the job through Mist, I get the error below:

java.lang.ClassNotFoundException: org.apache.spark.sql.Row
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:677)

Please help me solve this issue.

blvp commented 7 years ago

Which Mist version are you using (Mist itself, not the library)? Could you please provide the router.conf for this job?

There is one more thing. Mist jobs cannot return a DataFrame, or any data type other than those in this list:

  1. Int
  2. java.lang.Double
  3. String
  4. Seq[_]
  5. java.util.ArrayList[_]
  6. Array[_]
  7. Map[String, _]
  8. Boolean
  9. Option[_]

This kind of feature will be supported in #322.
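
As a rough illustration of the list above, a job result could be checked before it is returned. This is only a sketch: `SupportedResultTypes` and `isSupported` are hypothetical names, not part of the Mist API, and the recursion into nested containers is an assumption about how the restriction applies to elements.

```scala
// Hypothetical helper, NOT part of the Mist API.
// Assumption: a container is only transmittable if everything inside it is.
object SupportedResultTypes {

  /** Returns true if `value` is built only from the types listed in the thread. */
  def isSupported(value: Any): Boolean = value match {
    case _: Int | _: java.lang.Double | _: String | _: Boolean => true
    case opt: Option[_] => opt.forall(isSupported)
    case map: Map[_, _] =>
      // Only String keys appear in the list (Map[String, _]).
      map.forall { case (k, v) => k.isInstanceOf[String] && isSupported(v) }
    case seq: Seq[_] => seq.forall(isSupported)
    case arr: Array[_] => arr.forall(isSupported)
    case list: java.util.ArrayList[_] => list.toArray().forall(isSupported)
    case _ => false // e.g. a Spark Row or a DataFrame
  }
}
```

Calling `SupportedResultTypes.isSupported(result)` on the map a job is about to return gives an early hint that some element, such as a Row, falls outside the list.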

SushantVarshney commented 7 years ago

I am using the mist-0.13.3-2.1.1.tar.gz distribution, and in router.conf I map my jar like this:

mist-demo-app = {
  path = "/home/hadoop/MistDemo.jar"
  className = "com.yash.mist.demo.MistApp$"
  namespace = "foo"
}

dos65 commented 7 years ago

@SushantVarshney In your example the result type is Map[String, Array[Row]]. The current version of Mist can't transmit that kind of object; the supported types are the ones listed by @blvp. You can fix the problem by converting Array[Row] to Array[String], for example:

val data: Array[String] = dataFrame.take(10).map(row => row.mkString(","))
Map("result" -> data)
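
If the column names matter to the caller, one possible variant is to turn each row into a Map[String, String] instead of a comma-joined string. The sketch below is plain Scala so it runs without Spark; `RowConversionSketch` and `rowToMap` are hypothetical names, and whether Mist accepts an Array of Maps as a nested result is an assumption that the thread does not confirm.

```scala
// Hypothetical helper, NOT part of Mist or Spark.
object RowConversionSketch {

  /** Pairs each column name with its value rendered as a String.
    * Null values (e.g. empty CSV cells) become empty strings. */
  def rowToMap(columns: Seq[String], values: Seq[Any]): Map[String, String] =
    columns.zip(values).map { case (name, value) =>
      name -> Option(value).map(_.toString).getOrElse("")
    }.toMap
}
```

Inside the job this could be applied as `dataFrame.take(10).map(row => RowConversionSketch.rowToMap(dataFrame.columns.toSeq, row.toSeq))`, using Spark's `Dataset.columns` and `Row.toSeq`.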

SushantVarshney commented 7 years ago

@dos65 Thanks for your help. It solved the error. @blvp When are you going to roll out #322? It would be very beneficial for me.

dos65 commented 7 years ago

@SushantVarshney Glad it helped. We can't give a definite release date, but I hope it will be soon =)