Hydrospheredata / mist

Serverless proxy for Spark cluster
http://hydrosphere.io/mist/
Apache License 2.0
326 stars, 68 forks

Reading a CSV file using java. #477

Closed by earandap 6 years ago

earandap commented 6 years ago

Hi, I am new to Mist. I am trying to write a JMistFn based on the following Spark Java example. Is that possible? Thanks in advance.

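    SparkSession spark = SparkSession
            .builder()
            .appName("Testing")
            .getOrCreate();

    // Read the CSV file given as the first argument, using the header row and inferring column types
    Dataset<Row> df = spark.read()
            .option("sep", ",")
            .option("header", true)
            .option("inferSchema", true)
            .csv(args[0]).toDF();

    // In the Java API, take() is declared to return Object, so a cast is needed
    Row[] take = (Row[]) df.take(10);

dos65 commented 6 years ago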

Hi, yes, it's definitely possible. If the problem is obtaining a SparkSession, there is a small workaround for Java:

    @Override
    public JHandle<List<Integer>> handle() {
        return withArgs(intArg("num")).onSparkContext((num, sc) -> {
            SparkSession spark = org.apache.spark.SparkSessionUtils.getOrCreate(sc.sc(), false);
            ...
        });
    }

We have a Gitter room; feel free to ask questions there.