RedisLabs / spark-redis

A connector for Spark that allows reading and writing to/from Redis cluster
BSD 3-Clause "New" or "Revised" License

Upload Byte Array or non UTF-8 encoding strings to Key Value pairs #320

Open tbreamgh opened 2 years ago

tbreamgh commented 2 years ago

Hi,

I have an application that writes from Spark but reads the content back from Python. I can control the encoding of the strings when reading and writing the content in Python; however, I cannot see a way to upload either a raw byte array or a string in a different encoding without the connector changing the format.

I am uploading content in ISO_8859_1, and the UTF-8 encoding changes the string representation. Is there a workaround, a way to change the encoding, or a way to upload the byte array directly?
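
For illustration (my sketch, not part of the original report), this is the kind of round trip that changes the bytes: a single ISO-8859-1 byte decoded to a String and re-encoded as UTF-8 comes out as two different bytes.

  import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

  // 0xE9 is "é" in ISO-8859-1; once the value is held as a String and
  // re-encoded as UTF-8 it becomes the two bytes 0xC3 0xA9, so what ends
  // up in Redis no longer matches the bytes the Python side expects.
  val original: Array[Byte] = Array(0xE9.toByte)
  val asString  = new String(original, ISO_8859_1)  // "é"
  val reEncoded = asString.getBytes(UTF_8)          // Array(0xC3, 0xA9)
  println(original.sameElements(reEncoded))         // false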

Or is there a way to add support similar to #304, but for the toRedisKV function?
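
For context, the existing write path is string-only. A usage sketch (assuming the standard spark-redis implicits) of what callers have to do today:

  import com.redislabs.provider.redis._
  import org.apache.spark.SparkContext

  // toRedisKV accepts only RDD[(String, String)], so binary values have to
  // be decoded to Strings first and are re-encoded (as UTF-8) on write,
  // which is where non-UTF-8 content gets altered.
  def writeStrings(sc: SparkContext): Unit = {
    val kvs = sc.parallelize(Seq("key1" -> "value1", "key2" -> "value2"))
    sc.toRedisKV(kvs) // ttl defaults to 0 (no expiry)
  }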

jeremysong commented 11 months ago

We have a similar use case. In ours, we only need to upload an RDD[(Array[Byte], Array[Byte])] to Redis.

One solution is to add a new method to redisFunctions:

  // Proposed addition to redisFunctions: mirrors toRedisKV, but takes raw
  // byte arrays and delegates to a new setByteKVs helper (sketched below).
  def toRedisByteKV(kvs: RDD[(Array[Byte], Array[Byte])], ttl: Int = 0)
                   (implicit
                    redisConfig: RedisConfig = RedisConfig.fromSparkConf(sc.getConf),
                    readWriteConfig: ReadWriteConfig = ReadWriteConfig.fromSparkConf(sc.getConf)) {
    kvs.foreachPartition(partition => setByteKVs(partition, ttl, redisConfig, readWriteConfig))
  }
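
The setByteKVs helper would then be the byte-array counterpart of the existing string-based writer. A minimal sketch under that assumption (hypothetical code, not the connector's actual implementation; it pushes everything through a single pipelined connection, whereas a real version would route each key to the cluster node that owns it):

  import redis.clients.jedis.Jedis
  import com.redislabs.provider.redis.{ReadWriteConfig, RedisConfig}

  // Hypothetical helper: writes raw byte-array key/value pairs through a
  // pipelined connection so no String decoding/encoding ever happens.
  // readWriteConfig is kept only for signature parity with the string writer.
  def setByteKVs(arr: Iterator[(Array[Byte], Array[Byte])],
                 ttl: Int,
                 redisConfig: RedisConfig,
                 readWriteConfig: ReadWriteConfig): Unit = {
    // Simplification: single connection to the initial endpoint; the real
    // string-based writer groups keys by the node that owns their slot.
    val conn: Jedis = redisConfig.initialHost.connect()
    val pipeline = conn.pipelined()
    arr.foreach { case (k, v) =>
      if (ttl <= 0) pipeline.set(k, v) // binary SET, bytes stored verbatim
      else pipeline.setex(k, ttl, v)   // binary SETEX with expiry in seconds
    }
    pipeline.sync()
    conn.close()
  }

With that in place, a caller could write sc.toRedisByteKV(byteRdd) the same way toRedisKV is used today.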