RedisLabs / spark-redis

A connector for Spark that allows reading from and writing to a Redis cluster
BSD 3-Clause "New" or "Revised" License

How to load data faster from Redis to Spark? #303

[Open] f771216203 opened this issue 3 years ago

f771216203 commented 3 years ago

I load a txt file with 23,645,053 rows into PySpark and save it to Redis, but loading it back and calling show() takes about three minutes, and searching for a substring in the values is also very slow. Do you have any suggestions for speeding up the load and the search?

Here is my code:

```python
df = SparkSession(sc).read.csv('/media/yian/666/spark_data/data_end_35.txt', sep='\t')
df = df['_c2', '_c6']
df = df.withColumnRenamed("_c2", "key").withColumnRenamed("_c6", "value")
df.write.format("org.apache.spark.sql.redis") \
    .option("table", "test") \
    .option("key.column", "key") \
    .mode('append') \
    .save()
```
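On the write side, spark-redis documents a `max.pipeline.size` option (commands batched per Redis pipeline, default 100), and repartitioning before the write lets more executors write in parallel. A minimal sketch of a tuned write, assuming the same `df` as above; the values 32 and 1000 are illustrative guesses, not measured settings:

```python
# Sketch: tuned spark-redis write path. max.pipeline.size is a
# documented spark-redis write option; the partition count (32) and
# pipeline size (1000) are illustrative values to tune per cluster.
df.repartition(32) \
    .write.format("org.apache.spark.sql.redis") \
    .option("table", "test") \
    .option("key.column", "key") \
    .option("max.pipeline.size", 1000) \
    .mode('append') \
    .save()
```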

```python
df = spark.read.format("org.apache.spark.sql.redis") \
    .option("table", "test") \
    .option("key.column", "key") \
    .load()
df.show()
test = df.filter(df.value.contains('中華民國'))
test.count()
```
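On the read side, spark-redis documents `partitions.number` (the number of parallel SCAN-based read partitions, default 3) and `scan.count` (the COUNT hint passed to each SCAN call, default 100). Also note that `contains()` cannot be pushed down to Redis, so every query re-scans the whole table in Spark unless the DataFrame is cached. A minimal sketch, assuming the same table as above; the values 16 and 10000 are illustrative, not tuned:

```python
# Sketch: tuned spark-redis read path. partitions.number and
# scan.count are documented spark-redis read options; the values
# 16 and 10000 are illustrative guesses to tune per cluster.
df = spark.read.format("org.apache.spark.sql.redis") \
    .option("table", "test") \
    .option("key.column", "key") \
    .option("partitions.number", 16) \
    .option("scan.count", 10000) \
    .load()

# The substring filter runs in Spark, not in Redis, so cache the
# table once and reuse it across repeated queries instead of
# re-reading the whole table from Redis each time.
df.cache()
test = df.filter(df.value.contains('中華民國'))
test.count()
```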