RedisLabs / spark-redis

A connector for Spark that allows reading and writing to/from Redis cluster
BSD 3-Clause "New" or "Revised" License

Writing Byte Array to Hash #304

Open MartinsGabrielC opened 3 years ago

MartinsGabrielC commented 3 years ago

I'm currently writing a DataFrame/RDD consisting of <Key: String, Data: Array[Byte]>. In our scenario, a hash represents an entity; some of its data is loaded by a Spark job and some by an API. The API also needs to read the data loaded by the Spark job. We store the data as a byte array to reduce its size; the data itself is compressed and encoded.

We are using Jedis to read the data, and we also ran some tests with a direct load, which worked fine. In the screenshot below, the first result comes from the direct load and returns the correct value; the second comes from the DataFrame load. [image]
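The thread does not say why the two results differ. One common cause, assumed here purely for illustration, is that a binary value gets converted to a String somewhere on the write path. The sketch below uses only the JDK (no Spark or Redis) to show that GZIP-compressed bytes do not survive a UTF-8 String round trip; the object and method names are hypothetical.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.GZIPOutputStream

object ByteRoundTrip {
  // Compress a string with GZIP. The output starts with the bytes 0x1f 0x8b,
  // and 0x8b on its own is not a valid UTF-8 sequence.
  def gzip(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    out.write(text.getBytes(UTF_8))
    out.close()
    buffer.toByteArray
  }

  // Simulates a write path that treats the value as a String (an assumption,
  // not confirmed as what the DataFrame writer does in this issue).
  def throughString(bytes: Array[Byte]): Array[Byte] =
    new String(bytes, UTF_8).getBytes(UTF_8)

  def main(args: Array[String]): Unit = {
    val original = gzip("some entity payload")
    val mangled = throughString(original)
    println(s"original: ${original.length} bytes, after round trip: ${mangled.length} bytes")
    println(s"identical: ${java.util.Arrays.equals(original, mangled)}")
  }
}
```

Invalid byte sequences are replaced during decoding, so the bytes read back no longer match what was compressed. Keeping the value as Array[Byte] end to end avoids this.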

From #205 I saw that byte array support for Lists was implemented. Is it possible to have the same for Hashes? Or is there a workaround for it?

Thanks in advance.

fe2s commented 3 years ago

Hi @MartinsGabrielC, I think it should be possible to add DataFrame support for byte array fields. I will take a look.

fe2s commented 3 years ago

Hi @MartinsGabrielC, adding DataFrame support is not a trivial change. Instead, I prototyped a function to write an RDD. Would you be able to check it out in the branch https://github.com/RedisLabs/spark-redis/tree/issue-304-toRedisByteHASHes ? The function is called toRedisByteHASHes. All arguments are represented as Array[Byte]. If you have a String in your case, you can convert it into a byte array with .getBytes.
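A minimal sketch of preparing records in that all-bytes shape, runnable without Spark or Redis. The record layout (hash key, then a map of field name to value) and the helper `toByteHash` are assumptions for illustration; the exact signature of toRedisByteHASHes should be checked on the branch.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object ByteHashRecords {
  // Assumed record shape: hash key plus field -> value, everything as Array[Byte].
  type ByteHash = (Array[Byte], Map[Array[Byte], Array[Byte]])

  // Hypothetical helper: converts the String key and String field names with
  // .getBytes and leaves the already-binary payloads untouched.
  def toByteHash(key: String, fields: Map[String, Array[Byte]]): ByteHash =
    (key.getBytes(UTF_8), fields.map { case (name, value) => name.getBytes(UTF_8) -> value })

  def main(args: Array[String]): Unit = {
    val payload: Array[Byte] = Array(0x1f, 0x8b, 0x08).map(_.toByte)
    val (key, fields) = toByteHash("entity:42", Map("data" -> payload))
    println(new String(key, UTF_8))
    fields.foreach { case (name, value) =>
      println(s"${new String(name, UTF_8)} -> ${value.length} bytes")
    }
    // With a SparkContext `sc` and the spark-redis implicits in scope, the records
    // would then be written roughly like this (signature assumed, not verified):
    //   sc.toRedisByteHASHes(sc.parallelize(Seq(toByteHash("entity:42", Map("data" -> payload)))))
  }
}
```

Passing an explicit charset to .getBytes keeps the key bytes identical across machines with different default encodings, which matters when another client such as Jedis has to look up the same key.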

MartinsGabrielC commented 3 years ago

Hi @fe2s,

Thanks for the reply. I built it locally, made the appropriate changes, and now it's working 😄

MartinsGabrielC commented 3 years ago

Hi @fe2s,

Thanks again for the implementation. Would you know when it will be available in a release?

Best regards.

fe2s commented 3 years ago

@MartinsGabrielC, sorry for the delay. I've created a PR (https://github.com/RedisLabs/spark-redis/pull/306). It should be released in a couple of days.

MartinsGabrielC commented 2 years ago

Hey guys, thanks for the release. I checked, and it was only added to the Scala 2.12 version. Is it possible to add it to the Scala 2.11 version as well?