Open cgrand opened 7 years ago
Hi @kovasb,
I would welcome your input on what value we could offer for data(frame|set).
Datasets have a mapPartitions method, so a transducers-based approach is possible, but that only scratches the surface.
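A minimal sketch of that transducer idea, in plain Clojure with no Spark dependency. The name partition-fn is hypothetical. It only shows the iterator-to-iterator shape that mapPartitions works with; wiring it into Spark (wrapping it in a MapPartitionsFunction, supplying an Encoder, and making the closure serializable) is assumed and not shown.

```clojure
(defn partition-fn
  "Hypothetical helper: turns a transducer into a function from
  java.util.Iterator to java.util.Iterator, i.e. the shape of a
  per-partition function."
  [xform]
  (fn [^java.util.Iterator it]
    ;; `sequence` applies the transducer lazily over the partition's
    ;; elements; the resulting seq is Iterable, so we can hand back
    ;; an Iterator.
    (.iterator ^Iterable (sequence xform (iterator-seq it)))))

;; Usage on an ordinary in-memory iterator:
(iterator-seq
  ((partition-fn (comp (filter odd?) (map inc)))
   (.iterator [1 2 3 4 5])))
;; => (2 4 6)
```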
Eliminating references to Row (as we did with Tuple2) would be cool.
Records and spec are ways to get schemas, but I don't yet see how to put everything together.
We would really appreciate any help with the design.
Thanks.
So I have a few ideas:
Make it easy to create a DataFrame from Clojure data.
Use specs within DataFrame operations.
Slowly sinking in. spec wasn't a thing last time I thought about DF. It totally makes sense.
Walking through s/every is buggy (cf. http://dev.clojure.org/jira/browse/CLJ-2035), but otherwise a PoC mapping works well:
=> (s/conform ::datatype (s/form (s/* (s/tuple string? int?))))
#object[org.apache.spark.sql.types.ArrayType 0x78cc0c02 "ArrayType(StructType(StructField(0,StringType,true), StructField(1,LongType,true)),true)"]
See https://gist.github.com/cgrand/dd1c71feb6c4a05194f9bae8ed8b1998 for impl
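To give a rough idea of what such a mapping involves, here is a simplified, data-only sketch. It is not the gist's implementation: it walks the output of s/form and returns plain maps instead of Spark DataType objects. The names predicate->type and form->type are hypothetical, only s/tuple and s/* are handled, and it assumes the clojure.spec.alpha namespace (spec's namespace in current Clojure releases).

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical table from fully-qualified predicate symbols (as they
;; appear in s/form output) to simple type tags.
(def predicate->type
  {'clojure.core/string?  :string
   'clojure.core/int?     :long
   'clojure.core/double?  :double
   'clojure.core/boolean? :boolean})

(defn form->type
  "Walks a spec form and returns a data description of a schema.
  Tuples become structs with positional field names; s/* becomes an array."
  [form]
  (cond
    (symbol? form)
    (or (predicate->type form)
        (throw (ex-info "Unsupported predicate" {:form form})))

    (seq? form)
    (let [[op & args] form]
      (case op
        clojure.spec.alpha/tuple
        {:struct (vec (map-indexed
                        (fn [i f] {:name (str i) :type (form->type f)})
                        args))}

        clojure.spec.alpha/*
        {:array (form->type (first args))}

        (throw (ex-info "Unsupported spec form" {:form form}))))

    :else
    (throw (ex-info "Unsupported spec form" {:form form}))))

(form->type (s/form (s/* (s/tuple string? int?))))
;; => {:array {:struct [{:name "0", :type :string} {:name "1", :type :long}]}}
```

The result mirrors the shape of the ArrayType(StructType(...)) value printed above; a real implementation would build the Spark types instead of maps.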
I love this project.
Any updates on DataFrame support?
I imagine there's a lot of fun to be had translating clojure.spec specs <=> DataFrame schemas...