Open · VarunWachaspati opened this issue 4 years ago
Hi @VarunWachaspati, did you consider converting your dataframe to a key/value pair RDD and saving that RDD instead? https://github.com/RedisLabs/spark-redis/blob/master/doc/rdd.md#strings-1 It will store the RDD as Redis strings.
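For example, a minimal sketch of that approach, assuming a DataFrame with string columns `unique_row_identifier` and `metric` (column names taken from the use case below) and a local Redis:

```scala
import org.apache.spark.sql.SparkSession
import com.redislabs.provider.redis._  // brings sc.toRedisKV into scope

val spark = SparkSession.builder()
  .appName("df-to-redis-strings")
  .config("spark.redis.host", "localhost")  // assumed local Redis instance
  .config("spark.redis.port", "6379")
  .getOrCreate()
val sc = spark.sparkContext

// Example DataFrame with the two columns of interest.
val df = spark.createDataFrame(Seq(
  ("row-1", "0.92"),
  ("row-2", "0.17")
)).toDF("unique_row_identifier", "metric")

// Convert each row to a (key, value) pair and save with the RDD string API.
val kv = df.rdd.map(r =>
  (r.getAs[String]("unique_row_identifier"), r.getAs[String]("metric")))
sc.toRedisKV(kv)  // each pair is stored as a plain Redis string
```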
Yes, I have been converting my DataFrames to RDDs and writing those, as the RDD-based APIs are flexible enough for now. I was wondering whether a DataFrame-based API for the same would be helpful, since it would be very straightforward and intuitive to use.
Yep, we might want to introduce it, but for now it's a low priority since one can use the alternative API to achieve the same.
Currently there are two ways to write each row of a DataFrame to Redis - the Hash persistence model (the default, shown below) and the Binary persistence model.
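A minimal example of the default Hash write, using the documented spark-redis options (the table name `metrics` is just a placeholder):

```scala
// Way 1: default Hash persistence model - each row becomes a Redis hash
// whose fields are the DataFrame columns, keyed under the "metrics:" prefix.
df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "metrics")
  .save()
```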
In most of my Spark workloads there are usually only two columns of interest: a `unique_row_identifier` and a computed `metric/label`. Storing a Redis string with key `unique_row_identifier` and stringified value `metric/label` is beneficial because of the easy query pattern for the consumer and the lower memory consumption on Redis (strings are lighter than hashes). So if there was an API similar to the following -
Serializing the key/value to strings would be the responsibility of the library's consumer. We can throw an appropriate exception if non-string types are passed as keys/values for this model.
In any case, for non-string types we already have the Binary persistence model:
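```scala
// Way 2: Binary persistence model - the whole row is serialized and stored
// under a single key, so non-string column types survive a round trip.
df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "metrics")   // placeholder table name, as above
  .option("model", "binary")    // the default is "hash"
  .save()
```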
Let me know your thoughts on whether this is a valid and feasible new API request.