Closed Jolanrensen closed 5 days ago
I'm not on the JetBrains team, so take this with a grain of salt, but I don't agree with your assumption that a
and b
should give the same result. Every row in a
is specifically defined to be a Location
(and can reference its schema), b
never states the intended type.
The docs for dataFrameOf
state: Returns DataFrame with given column names and values
. It seems consistent with those docs that b
would define the second column as values (of Gps), and not a ColumnGroup
. Meanwhile toDataFrame
's docs mention perform deep object graph traversal and convert nested objects into ColumnGroups and FrameColumns
, so not surprising to see independent nested cols for latitude
and longitude
.
Even if it had the capability to do so, wouldn't expect the program to search through all of your available DataSchema
s to see which is the best fit... you have to specify the schema somewhere. And now that I say that, it would actually be potentially dangerous to have the program just assume that every pair of String + Gps is automatically a Location, without you specifying that.
Just my 2 cents, I'm still learning more about the dataframe API every day.
I'm not saying it should recognize it as a Location
, but like a { latitude:0.0, longitude:0.0 }
. But I see your point to be able to read it as "value". It's a valid point!
It's interesting, a
is actually represented by 3 columns -- name, gps/latitude and gps/longitude. .toDataFrame()
breaks down Location into its two properties and then it breaks down the Gps type into its two properties and creates nested columns for each. That's actually pretty cool. dataFrameOf()
is simpler, it just creates a single value per column, so in this case that value's type is Gps. But it doesn't perform the extra step of creating nested columns from Gps.
Exactly! That's why I thought it would (or should) behave the same. Unless you specifically want it to be a Value Column
Currently .toDataFrame
allows to configure objects traversal, including depth, but dataFrameOf
doesn't. If dataFrameOf
unfolds all objects marked with @DataShema
into ColumnGroups
by default, there should be an option to fallback to ValueColumns
, because there is no universal way to pack column groups back into objects (although it can be supported for data classes).
So, I agree with suggestion to unpack @DataSchema
objects by default, but let's add some DSL for dataFrameOf
configuration to make it flexible:
@DataSchema
data class A(val value: Int)
@DataSchema
data class B(val c: C)
@DataSchema
data class C(val value: Int)
val b: AnyFrame = dataFrameOf("a","b")(
A(1), B(C(2)),
A(3), null,
) {
depth = 1 // default depth of object traversal: unfold only top-level objects
unfold<C>() // force unfolding of `C`
preserve<A>() // prevent `A` from unfolding
}
// ⌌-------------------------------------------------⌍
// | | a:A | b:{c:{value: Int}} |
// |--|------------|---------------------------------|
// | 0| A(value=1) | { c: {value:2}} |
// | 1| A(value=3) | { c: {value: null}} |
// ⌎-------------------------------------------------⌏
Default depth
will be null
that means that all @DataSchema
objects should be unfolded and all other objects should be preserved.
depth = 0
means that no objects should be unfolded and @DataSchema
annotation should be ignored.
.toDataFrame
and unfold
should have the same behaviour and configuration DSLs
Just my +1 for configuration option.
Unless you specifically want it to be a Value Column
But how can I achieve this if it just unfolds everything by default?
Use case: I have a data class representing a set of parameters for a measurement. So first I want to group results by parameters, but later after some processing I would like to unfold parameter set to distinct columns of parameters. So it make sense to me to annotate my parameter set with DataSchema.
+1 @nikitinas I can always call unfold
on a column, but I can't fold it back if it is unfolded automagically for me.
Now we have DataRowSchema
which allows inheriting classes to be unfolded into dataframes with dataFrameOf
I think we can close this solution.
toDataFrame()
can convert a list of objects to the depth the user defines, which is not possible in dataFrameOf()()
so we cannot assume that's what the user wants.
given
One would expect
a
andb
to give the same result. However, ina
gps
is recognized as aGroup
and in the second as aValue
.