amplab / SparkNet

Distributed Neural Networks for Spark
MIT License

explode UDF returning json array #148

Closed guruvonline closed 4 years ago

guruvonline commented 4 years ago

Hi, how can I have a UDF return multiple rows? I am trying to return a string from a UDF as a JSON array and then use explode/fromjson to get rows, but I am getting an error.

I have a UDF which returns a string (a JSON array) like

[ {"id":"1", "name":"name1"}, {"id":"2", "name":"name2"} ]

I want to explode it into 2 rows and then save them into a table with id and name as columns.

I tried defining the schema like

// Schema of each item in the array
var itemDataType = new StructType(new[]
{
    new StructField("name", new StringType()),
    new StructField("id", new StringType())
});
var rootType = new ArrayType(itemDataType); // array of items

var field = new StructField(name: "MyJson", dataType: rootType, isNullable: false);
var schema = new StructType(new[] { field });

After explode, it creates a schema like

root
 |-- col: struct (nullable = true)
 |    |-- name: string
 |    |-- id: string

I am running on a local cluster and writing to CSV. I was expecting that after explode the DataFrame would have 2 columns, name and id, so that I could write all rows to CSV. When I run it, the DataFrame schema is not name, id (the fields stay nested under a single struct column), and the CSV write fails with the message "csv doesn't support struct<>".
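
For reference, one possible way to get flat columns is to select the struct's fields after the explode, since explode on its own yields a single struct column. The sketch below is not a confirmed fix. It assumes .NET for Apache Spark (the Microsoft.Spark package) and a Spark version whose from_json accepts a DDL-style schema string. The class name, the alias "item", the SQL literal standing in for the UDF output, and the output path are all hypothetical.

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class ExplodeJsonSketch
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder()
            .AppName("explode-json-sketch") // hypothetical app name
            .GetOrCreate();

        // Stand-in for the UDF output: one row whose MyJson column holds the JSON array string.
        DataFrame input = spark.Sql(
            "SELECT '[{\"id\":\"1\",\"name\":\"name1\"},{\"id\":\"2\",\"name\":\"name2\"}]' AS MyJson");

        // Parse the string into array<struct<id,name>>, then explode into one struct per row.
        // Assumption: the schema is passed as a DDL string.
        DataFrame exploded = input.Select(
            Explode(FromJson(Col("MyJson"), "array<struct<id:string,name:string>>"))
                .Alias("item"));

        // "item.*" promotes the struct's fields to top-level columns,
        // so the CSV writer no longer sees a struct column.
        DataFrame flat = exploded.Select("item.*");

        flat.PrintSchema();
        flat.Write()
            .Mode("overwrite")
            .Option("header", true)
            .Csv("output/items"); // hypothetical output path
    }
}
```

Selecting the fields by name, for example Col("item.id") and Col("item.name"), should work as well and makes the column order explicit.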

guruvonline commented 4 years ago

Sorry wrong repo