Open clojurians-org opened 7 years ago
Isn't the :key option in keg/by-key what you are looking for? (keg/by-key some-data :key (juxt :a :b))
It's about the HDFS directory layout used when saving files to disk so that they map onto a Hive table, not about the data itself. For example:
/user/hive/warehouse/model/db/tableA/a=1/b=3/part-0000
/user/hive/warehouse/model/db/tableA/a=2/b=3/part-0000
....
create external table tableA ( .... ) partitioned by (a string, b string) stored as textfile ;
I wonder whether there are any plans for a dynamic-partition write function, as it is a very common use case.
In Cascalog, I use the templatefields and sink-template keywords to control dynamic partitioning.
Currently, I convert the RDD to a DataFrame and write it with partitionBy to get this behavior.
The RDD API does have saveAsHadoopFile and the MultipleTextOutputFormat class, but that class needs to be extended (subclassed) first, so it's very inconvenient. http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job
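
To make the requested layout concrete, here is a minimal sketch in plain Clojure of how the a=1/b=3 style directories described above could be derived from each record. It touches neither Spark nor powderkeg. The names partition-dir and group-by-partition are hypothetical helpers invented for illustration, not part of any library, and records are assumed to be maps.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: builds a Hive-style partition directory
;; such as <base>/a=1/b=3 from an ordered list of partition keys.
(defn partition-dir
  [base partition-keys record]
  (str/join "/"
            (cons base
                  (for [k partition-keys]
                    (str (name k) "=" (get record k))))))

;; Hypothetical helper: groups records by the directory they would
;; be written to. A real writer would open one part file per group.
(defn group-by-partition
  [base partition-keys records]
  (group-by #(partition-dir base partition-keys %) records))

(def example-dir
  (partition-dir "/user/hive/warehouse/model/db/tableA"
                 [:a :b]
                 {:a 1 :b 3 :c "x"}))
```

The grouping key is the same value that (juxt :a :b) would produce; the difference is that here it is rendered into the output path, which is what the Hive external table expects.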