cartermp opened 5 years ago
Thanks for tagging me :-) Definitely, having a mapping from a C# API that already looks pipelined and somewhat stateless to an F# one would be nice, although I imagine most of this would simply be mapping each extension method to a partially applied function: moving the first argument to the last position and renaming the function.
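To make the "extension method to partially applied function" mapping concrete, here is a minimal sketch that uses LINQ's `Enumerable` methods as a stand-in for a fluent C# API (an assumption for illustration, not Spark's API). The `Pipeline` module and its function names are hypothetical. The receiver, which is the extension method's first argument, becomes the last parameter, so `|>` can supply it.

```fsharp
open System
open System.Linq

// Hypothetical wrapper module: each function forwards to a LINQ extension
// method, but takes the receiver (the method's first argument) LAST.
module Pipeline =
    let where (predicate: 'T -> bool) (source: seq<'T>) : seq<'T> =
        Enumerable.Where(source, Func<'T, bool>(predicate))

    let select (projection: 'T -> 'U) (source: seq<'T>) : seq<'U> =
        Enumerable.Select(source, Func<'T, 'U>(projection))

    let take (count: int) (source: seq<'T>) : seq<'T> =
        Enumerable.Take(source, count)

// C#-style: receiver first, chained method calls.
let fluent =
    [ 1 .. 10 ].Where(fun x -> x % 2 = 0).Select(fun x -> x * x).Take(3)
    |> List.ofSeq

// F#-style: receiver last, composed with |>.
let piped =
    [ 1 .. 10 ]
    |> Pipeline.where (fun x -> x % 2 = 0)
    |> Pipeline.select (fun x -> x * x)
    |> Pipeline.take 3
    |> List.ofSeq
// piped = [4; 16; 36]
```

Each wrapper is a one-liner, which is why the maintenance cost of such a layer mostly scales with the size of the underlying API.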
Weighing up the cost/benefit of that, I'm not convinced it's worth embarking on immediately (compared to something like ML.NET, which IMHO needs much more work to be considered F#-friendly).
Other ideas/points might include:
- let-bound functions as Spark UDFs, ideally with the ability to capture one akin to how MBrace manages it.
- spark { } and dataFrame computation expressions with custom keywords for operations? Perhaps even a udf { } one?

The Value Adds and Data Exploration ones could really start to show some of the benefits of working with F# and Spark: things like compile-time safety over data sets from samples with IntelliSense, use of FSI and the REPL, etc. could be big wins on the .NET side.
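As a rough illustration of the computation-expression idea in the list above, the sketch below shows a builder with custom `where` and `select` keywords. It runs over a plain list of records standing in for a Spark DataFrame (an assumption); `Row`, `DataFrameBuilder`, and `dataFrame` are hypothetical names. A real version would translate these keywords into Spark DataFrame operations instead of `List` functions.

```fsharp
// Hypothetical stand-in for a data frame row.
type Row = { Name: string; Age: int }

// Hypothetical builder: custom operations turn "where"/"select" into keywords.
type DataFrameBuilder() =
    member _.For(source: 'T list, body: 'T -> 'U list) : 'U list =
        List.collect body source

    member _.Yield(value: 'T) : 'T list = [ value ]

    // Keeps the bound variable (e.g. p) in scope for later operations.
    [<CustomOperation("where", MaintainsVariableSpace = true)>]
    member _.Where(source: 'T list, [<ProjectionParameter>] predicate: 'T -> bool) : 'T list =
        List.filter predicate source

    [<CustomOperation("select")>]
    member _.Select(source: 'T list, [<ProjectionParameter>] projection: 'T -> 'U) : 'U list =
        List.map projection source

let dataFrame = DataFrameBuilder()

let people =
    [ { Name = "Ann"; Age = 34 }
      { Name = "Bob"; Age = 17 }
      { Name = "Cy"; Age = 52 } ]

// Reads like a query, but every keyword is an ordinary builder method.
let adultNames =
    dataFrame {
        for p in people do
        where (p.Age >= 18)
        select p.Name
    }
```

The same mechanism is what F#'s built-in query { } expressions use, so a Spark-specific builder would mainly be a matter of choosing which operations to expose as keywords.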
@dsyme, @7sharp9 The most distinctive value proposition would be F# metaprogramming (staging), which could allow us to implement Flare-like functionality for F# (FlareData/TensorFlare are implemented in Scala LMS). Code specialization (roughly, collapsing abstractions) could provide orders-of-magnitude performance improvements.
Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data
"We present Flare, an accelerator module for Spark that delivers order of magnitude speedups on scale-up architectures for a large class of applications. Inspired by query compilation techniques from main-memory database systems, Flare incorporates a code generation strategy designed to match the unique aspects of Spark and the characteristics of scale-up architectures, in particular processing data directly from optimized file formats and combining SQL-style relational processing with external frameworks such as TensorFlow."
https://www.usenix.org/conference/osdi18/presentation/essertel
https://github.com/Microsoft/visualfsharp/pull/3662#issuecomment-333332298
A relevant language suggestion on improved staging of quotations is here: https://github.com/fsharp/fslang-suggestions/issues/584
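To give a feel for what staging with today's F# quotations looks like, here is the classic staged power function. It is a toy example of code specialization, not Flare's or LMS's technique: the recursion over `n` runs while the code is being generated, so the emitted quotation contains only the multiplications. The names `power`, `cube`, and `cubeOf2` are illustrative; evaluation uses `LeafExpressionConverter.EvaluateQuotation` from FSharp.Core.

```fsharp
open Microsoft.FSharp.Quotations
open Microsoft.FSharp.Linq.RuntimeHelpers

// Staged power: the "if n = 0" test and the recursion happen at generation
// time; only x * (x * ... * 1) ends up in the generated code.
let rec power (n: int) (x: Expr<int>) : Expr<int> =
    if n = 0 then <@ 1 @>
    else <@ %x * %(power (n - 1) x) @>

// Specialized for n = 3: a quotation of fun x -> x * (x * (x * 1)).
let cube : Expr<int -> int> = <@ fun x -> %(power 3 <@ x @>) @>

// A closed specialized expression, 2 * (2 * (2 * 1)), evaluated at runtime.
let cubeOf2 : Expr<int> = power 3 <@ 2 @>
let eight = LeafExpressionConverter.EvaluateQuotation cubeOf2.Raw :?> int
```

The abstraction (a general, recursive `power`) disappears from the generated code; applying the same idea to query plans and data-access code is what yields the large speedups described for Flare.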
I think it's a wonderful idea, and given that Spark .NET now exists, I can see it getting a higher priority than it did in the past.
This user experience item describes idiomatic APIs for C# and F#: https://github.com/dotnet/spark/blob/master/ROADMAP.md#user-experience-1
I think this would be a good issue to discuss what idiomatic looks like for F# in the context of Spark.
Here's the (basic) sample from the .NET homepage:
Although this certainly isn't bad, a more idiomatic API could look something like this:
The above is just a starting point for a conversation. It would assume a module of combinators for data frames (and potentially other collection-like structures). Although this wouldn't be difficult to implement or maintain (the effort would be proportional to maintaining the one-liners in the C# LINQ-style implementation), I wonder what else could be done to make it feel more natural for F#, and what gives us the best bang for our buck here.
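A module of data-frame combinators like the one described above could be as thin as the following sketch. `PeopleFrame` is a hypothetical stand-in for a C#-style frame with fluent instance methods (not Spark's `DataFrame`), and `Frame.filter`, `Frame.limit`, and `Frame.count` are hypothetical names. Each combinator is a one-liner that forwards to a method and takes the frame last.

```fsharp
open System

type Person = { Name: string; Age: int }

// Hypothetical C#-style frame: fluent instance methods, receiver first.
type PeopleFrame(rows: Person list) =
    member _.Rows = rows
    member _.Filter(predicate: Func<Person, bool>) =
        PeopleFrame(List.filter (fun r -> predicate.Invoke r) rows)
    member _.Limit(n: int) = PeopleFrame(List.truncate n rows)
    member _.Count() = List.length rows

// F# combinator module: one-liners forwarding to the methods above,
// with the frame as the last parameter so they compose with |>.
module Frame =
    let filter (predicate: Person -> bool) (frame: PeopleFrame) =
        frame.Filter(Func<Person, bool>(predicate))
    let limit (n: int) (frame: PeopleFrame) = frame.Limit(n)
    let count (frame: PeopleFrame) = frame.Count()

let people =
    PeopleFrame(
        [ { Name = "Ann"; Age = 34 }
          { Name = "Bob"; Age = 17 }
          { Name = "Cy"; Age = 52 } ])

// Same query in both styles.
let fluentCount = people.Filter(fun p -> p.Age >= 18).Limit(10).Count()

let pipedCount =
    people
    |> Frame.filter (fun p -> p.Age >= 18)
    |> Frame.limit 10
    |> Frame.count
```

Because each combinator only forwards a call, keeping the module in sync with the C# API is mostly mechanical, which is the cost/benefit trade-off the paragraph above raises.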
In other words, I'd love to solicit feedback on the kinds of things that matter most to F# developers interested in using Spark, so that it's possible to stack these up relative to their implementation and maintenance costs.
Also including @isaacabraham, as he tends to be a lot more creative than I am when it comes to these things 😄