Open koperagen opened 1 week ago
Generated sources will be updated after merging this PR. Please inspect the changes in here.
Unfold itself, by definition, already replaces a column with a new column
What definition?
Originally i wanted a add replace by
function with AddDsl. Then i remembered that toDataFrame DSL is pretty similar to AddDsl and we already have unfold
function, it just lacks an overload with a DSL. Such an overload must be a multiplex operation, right?
unfold by
is interesting, but it will have exactly one operation probably forever. Sounds good, right, but its semantics is no different from replace.
I'd rather have one entry point (replace) for situations when you want to, like, replace a column with a different one.
What definition? Originally i wanted a add
replace by
function with AddDsl. Then i remembered that toDataFrame DSL is pretty similar to AddDsl and we already haveunfold
function, it just lacks an overload with a DSL. Such an overload must be a multiplex operation, right?unfold by
is interesting, but it will have exactly one operation probably forever. Sounds good, right, but its semantics is no different from replace. I'd rather have one entry point (replace) for situations when you want to, like, replace a column with a different one.
I meant the definition of "unfolding", you're unfolding a column with its contents, so its type is bound to change, aka, the column is replaced.
I can see where you're coming from, but it still may be hard for users to have two different ways to unfold, namely .replace {}.unfold {}
and .unfold()
and both work a little bit different and have different arguments.
So I'd either:
df.replace {}.byUnfolding {}
(or a name like that) the only version of unfold and deprecate the old onedf.unfold {}
more powerfulwdyt?
I'd say df.unfold
should stay, because use case for simply unfolding a column with objects stays. Worth to add df.unfold(maxDepth = , roots = )
overload too, missed it.
So I'd either: or make just df.unfold {} more powerful
Please write this API with needed overloads and a few examples of its usage then
I'd say
df.unfold
should stay, because use case for simply unfolding a column with objects stays. Worth to adddf.unfold(maxDepth = , roots = )
overload too, missed it.So I'd either: or make just df.unfold {} more powerful
Please write this API with needed overloads and a few examples of its usage then
I made a little sample of unfold {}.by {}
with the same options as your version. Both df.unfold {}
and df.unfold {}.by {}
can be used. See the commit here for more details:
https://github.com/Kotlin/dataframe/commit/20cab7a12968babcb7f9d8ce8a24e3ca0917eda1
UnfoldingDataFrame looks good, we can use it. Supporting it on the plugin side will require some changes, but it's ok. I'm only worried with return type being different than DataFrame people will be tempted to call df.unfold { col }.by()
. It's first time intermediate object in a multiplex operation is also a DataFrame
. Or the opposite problem: df.unfold { col }.
will print all DataFrame API, with by
being somewhere in completion list no different than let's say filter
But if we go this route, i'd also add
df.replace {}.by(CreateDataFrameDsl)
(only this, without vararg props: KProperty<*> and maxDepth: Int = 0
overloads)
@koperagen Yes, I couldn't find another way to have a notation with 2 selectors and the second one being optional while keeping the DataFrame DSL style :/ but indeed, it's something new. A bit like .recursively()
.
Actually, since by
is defined on UnfoldingDataFrame
it should appear quite high on the list:
We could also put it inside the class to make it even more discoverable.
Alternatively, we could change the return-type of unfold
to something non-dataframe-ish and let people call
df.unfold { a }.byReplacing()
or something.
Imagine that XD df.replace { a }.byUnfolding()
and df.unfold { a }.byReplacing()
and it would do the same.
replace {}.by {}
also looks interesting :)
It covers two interesting use cases:
add
for thatOn the compiler plugin side i will continue to support other overloads later in different PR