dotnet / spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
https://dot.net/spark
MIT License

[FEATURE REQUEST]: Support StructArray as a return type in ArrowFunctions.VectorUdf #826

Open imback82 opened 3 years ago

imback82 commented 3 years ago

We need to support StructArray as a return type in ArrowFunctions.VectorUdf.

To repro:

Func<Microsoft.Spark.Sql.Column, Microsoft.Spark.Sql.Column, Microsoft.Spark.Sql.Column> udf
  = ArrowFunctions.VectorUdf((Func<BinaryArray, Int32Array, StructArray>) someUDFWrapper);

You will get:

Unhandled Exception: System.ArgumentException: Apache.Arrow.StructArray is not supported.
   at Microsoft.Spark.Utils.UdfUtils.GetReturnType(Type type)
   at Microsoft.Spark.Sql.Functions.CreateUdf[TResult](String name, Delegate execute, PythonEvalType evalType)
   at Microsoft.Spark.Sql.ArrowFunctions.VectorUdf[T1,T2,TResult](Func`3 udf)

dbeavon commented 3 years ago

Is this a short-term goal (i.e., is it included in "1:1 API compatibility for Dataframes")?

imback82 commented 3 years ago

Since we have a workaround using a non-Arrow UDF, this is not a short-term goal, but let me talk to the team.

cc @suhsteve @rapoth
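
For readers looking for the non-Arrow workaround mentioned above, the following is a minimal sketch, not the team's confirmed approach. It assumes the row-at-a-time `Functions.Udf` overload that returns a `GenericRow` and takes an explicit `StructType` is available in the Microsoft.Spark version in use. The column names, schema, and input query are hypothetical, and `string`/`int` inputs stand in for the `BinaryArray`/`Int32Array` of the repro.

```csharp
using System;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using static Microsoft.Spark.Sql.Functions;

class Program
{
    static void Main()
    {
        SparkSession spark = SparkSession
            .Builder()
            .AppName("struct-udf-sketch") // hypothetical app name
            .GetOrCreate();

        // Hypothetical single-row input with a string column and an int column.
        DataFrame df = spark.Sql("SELECT 'abc' AS name, 3 AS n");

        // The struct schema must be supplied explicitly for a GenericRow result.
        var returnType = new StructType(new[]
        {
            new StructField("name", new StringType()),
            new StructField("doubled", new IntegerType())
        });

        // Row-at-a-time (non-Arrow) UDF: called once per row and
        // returns one struct value per call.
        Func<Column, Column, Column> toStruct = Udf<string, int>(
            (name, n) => new GenericRow(new object[] { name, n * 2 }),
            returnType);

        df.Select(toStruct(df["name"], df["n"]).Alias("result")).Show();

        spark.Stop();
    }
}
```

Unlike `ArrowFunctions.VectorUdf`, this form processes one row per call instead of a whole Arrow batch, so it is likely to be slower on large inputs.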

quarkonium3 commented 3 years ago

I was trying out ArrowFunctions and discovered I'd need this in my application.