MrPowers / bebe

Filling in the Spark function gaps across APIs

Functions that are in Spark SQL and not in Scala API to be implemented #16

Open MrPowers opened 3 years ago

MrPowers commented 3 years ago

We can use this issue to create a list of all the functions that are in Spark SQL, but not in the Scala API for whatever reason.

Here's the list that @nvander1 sent me so we can get started. He already implemented approx_percentile, so we're on our way!

alfonsorr commented 3 years ago

A few things I'm not sure about: some of these functions are in the Spark API, but maybe not in the org.apache.spark.sql.functions object.

For example

MrPowers commented 3 years ago

@alfonsorr - good questions.

Feel free to update the list and just add something like "wont add" to the functions that shouldn't get implemented. This list was generated by a script @nvander1 wrote to compare the SQL functions and the Scala functions, so there are probably some that snuck in there that we don't need.
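
For anyone who wants to regenerate the list, a comparison along these lines can be sketched as follows. This is not @nvander1's actual script, just a minimal illustration assuming a local Spark 3.x session; the object name `FunctionGap` is made up. It compares lowercased names only, so it will report noise (operators, camelCase vs snake_case names), which is why entries still need a manual check.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: list SQL function names that have no same-named
// method on the org.apache.spark.sql.functions object.
object FunctionGap {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("function-gap")
      .getOrCreate()

    // Names of the functions registered in the session's SQL catalog.
    val sqlNames: Set[String] =
      spark.catalog.listFunctions().collect().map(_.name.toLowerCase).toSet

    // Public method names on the Scala functions object, found via reflection.
    val scalaNames: Set[String] =
      org.apache.spark.sql.functions.getClass.getMethods
        .map(_.getName.toLowerCase)
        .toSet

    // Candidates: present in SQL, absent by name from the Scala object.
    (sqlNames -- scalaNames).toSeq.sorted.foreach(println)

    spark.stop()
  }
}
```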

Yeah, we definitely want to define substring(str: Column, pos: Column, len: Column): Column. I don't like the functions that take regular Scala types as arguments.
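
As a rough sketch of what an all-`Column` signature could look like (not necessarily how bebe ends up implementing it), the version below delegates to the existing `Column.substr(startPos: Column, len: Column)` method. The wrapper object name `ColumnSubstring` is hypothetical.

```scala
import org.apache.spark.sql.Column

// Hypothetical wrapper: every argument is a Column, so the position and
// length can come from other columns instead of Scala Int literals.
object ColumnSubstring {
  def substring(str: Column, pos: Column, len: Column): Column =
    str.substr(pos, len) // delegates to the public Column.substr overload
}
```

With this in scope, something like `df.select(ColumnSubstring.substring(col("name"), col("start"), col("length")))` takes the start position and length per row, which the `substring(str: Column, pos: Int, len: Int)` overload in `org.apache.spark.sql.functions` cannot do.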

Feel free to go ahead and add the "wont add" annotation to any functions you think we don't need.

alfonsorr commented 3 years ago

I've checked all the methods and marked the ones that are already implemented or not useful in the Spark / Python API.