GoEddie / spark-connect-dotnet

How to use functions from ML? #13

Open mrericrichter opened 4 months ago

mrericrichter commented 4 months ago

Could you please provide an example (or even an implementation) of how to use functions from the ML package, e.g. the Microsoft.Spark.ML.Feature.Bucketizer class?

GoEddie commented 4 months ago

I was looking at how PySpark implements the ML functions. It seems that they use numpy and do some of the work on the client, but I'm not 100% sure. Will carry on trying to figure it out!

mrericrichter commented 4 months ago

Typically, these functions run on worker nodes. However, it seems that Spark Connect currently supports SQL functions only. All supported functions contain a flag in their documentation that states 'Supports Spark Connect', e.g.

[image: function documentation showing the 'Supports Spark Connect' flag]

Functions from MLlib do not contain this flag, e.g.

[image: MLlib function documentation without the flag]

This seems to be a major limitation of Spark Connect as of today. The documentation says that more functions will be added in future versions of Spark.
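
To illustrate what a workaround built only from plain comparisons would have to reproduce, here is a minimal sketch of the bucketing rule that Bucketizer applies: with split points x and y, a bucket holds values in [x, y), except the last bucket, which also includes its upper bound. This is plain Python with no Spark involved; the function name `bucketize` and the sample splits are hypothetical, and the error handling is a simplification, not a copy of Spark's behaviour.

```python
import bisect
import math


def bucketize(value, splits):
    """Return the index of the bucket that `value` falls into.

    Hypothetical helper, not part of any Spark API.
    Buckets are [splits[i], splits[i + 1]); the last bucket also
    includes its upper bound.
    """
    if len(splits) < 2 or any(a >= b for a, b in zip(splits, splits[1:])):
        raise ValueError("splits must be strictly increasing, with at least two points")
    if math.isnan(value) or value < splits[0] or value > splits[-1]:
        raise ValueError(f"value {value} is outside the split range")
    if value == splits[-1]:
        # The upper bound of the last bucket is inclusive.
        return len(splits) - 2
    # bisect_right counts the split points <= value; subtract one for the index.
    return bisect.bisect_right(splits, value) - 1


# Assumed example splits: three buckets covering the whole real line.
splits = [float("-inf"), 0.0, 10.0, float("inf")]
print([bucketize(v, splits) for v in [-5.0, 0.0, 3.5, 10.0, 42.0]])
# prints [0, 1, 1, 2, 2]
```

Because the rule reduces to a chain of comparisons, it could in principle be expressed with ordinary SQL-style conditional expressions on a column, which is the kind of function Spark Connect does support today. Whether that is practical for a given pipeline is a separate question.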

GoEddie commented 4 months ago

I'll keep this open and add them in when they are available.