MrPowers / bebe

Filling in the Spark function gaps across APIs

suggestion to split the project #17

Closed alfonsorr closed 3 years ago

alfonsorr commented 3 years ago

The idea is to have this project split into three different subprojects.

The reason to isolate the functions is to have a dependency only for people who want the columns not yet implemented in Spark, following the basic Spark API, and to make it easier to create the Python interface of #11. It would also be a great addition if we could track which Spark version each piece of functionality comes from. For example, RegExpExtractAll is new in Spark 3.1.x: we can exclude it in previous versions, or try to backport it so that Spark 3.0.x or 2.4 can use it, if a copy-paste is all that's required.
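One way to gate functionality by Spark version, sketched below under the assumption that a runtime check is acceptable (the helper name and approach are illustrative, not bebe's actual mechanism), is a small version comparison against `spark.version`:

```scala
// Illustrative sketch: decide whether to expose a function (e.g. one backed
// by RegExpExtractAll, new in Spark 3.1.x) based on the running Spark version.
// In a real build this could instead be done at compile time via sbt source
// directories per Spark version.
def sparkVersionAtLeast(current: String, required: String): Boolean = {
  def parts(v: String): Array[Int] = v.split("\\.").take(3).map(_.toInt)
  val (c, r) = (parts(current), parts(required))
  // Compare component by component, padding the shorter version with zeros.
  c.zipAll(r, 0, 0)
    .find { case (a, b) => a != b }
    .forall { case (a, b) => a >= b }
}
```

A caller could then skip registering (or raise a clear error for) functions the running Spark version does not support.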

The typed column project will contain the core functionality for using the typed columns. This will allow us to cross-build this project for Spark 2.4 (Scala 2.11) and test it against 3.0.x.
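To make the typed-column idea concrete, here is a minimal self-contained sketch (the names and encoding are purely illustrative assumptions, not doric's or bebe's actual API): the column carries its element type as a type parameter, so ill-typed expressions fail at compile time rather than at Spark runtime.

```scala
// Illustrative phantom-type sketch of a typed column. A real implementation
// would wrap org.apache.spark.sql.Column; here we just track an expression
// string so the example stays dependency-free.
case class TCol[T](expr: String) {
  // Addition is only available when T is numeric, enforced by the compiler.
  def +(other: TCol[T])(implicit num: Numeric[T]): TCol[T] =
    TCol[T](s"($expr + ${other.expr})")
}

val age   = TCol[Int]("age")
val bonus = TCol[Int]("bonus")
val total = age + bonus
// val bad = age + TCol[String]("name")  // would not compile: types differ
```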

And last, the typed functions project will merge the two previous projects to present the previously provided functions, but typed.

This will mostly require sbt rework, and we can see in the future whether it would be better to split it into different repositories 🤷.
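The sbt rework could look roughly like the following `build.sbt` fragment (module and directory names are illustrative assumptions, not final): three subprojects, with the typed functions module depending on the other two, and cross-building where Spark 2.4 support requires Scala 2.11.

```scala
// Illustrative build.sbt sketch of the proposed three-subproject split.
lazy val functions = (project in file("functions"))
  .settings(crossScalaVersions := Seq("2.11.12", "2.12.15"))

lazy val typedColumns = (project in file("typed-columns"))
  .settings(crossScalaVersions := Seq("2.11.12", "2.12.15"))

// The typed functions module merges the two previous projects.
lazy val typedFunctions = (project in file("typed-functions"))
  .dependsOn(functions, typedColumns)
```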

MrPowers commented 3 years ago

@alfonsorr - Thanks for opening this issue. I think we should split this into two repos.

Open to names for the new project. Want to jump on a Zoom/Hangout call sometime to meet / do some brainstorming?

alfonsorr commented 3 years ago

Sure, I can contact you on LinkedIn 👨‍💻

MrPowers commented 3 years ago

The typesafe column stuff has been split out to the doric repo.

Looking forward to @alfonsorr and his team giving Scala users a better way to write Spark code!