Open f771216203 opened 3 years ago
I load a txt file with 23,645,053 rows into PySpark and save it to Redis, but it takes about three minutes to load and show, and searching within the values is also very slow. Do you have any suggestions for faster loading and searching?
Here is my code:
```python
df = SparkSession(sc).read.csv('/media/yian/666/spark_data/data_end_35.txt', sep='\t')
df = df['_c2', '_c6']
df = df.withColumnRenamed("_c2", "key").withColumnRenamed("_c6", "value")
df.write.format("org.apache.spark.sql.redis") \
    .option("table", "test") \
    .option("key.column", "key") \
    .mode('append') \
    .save()
```
```python
df = spark.read.format("org.apache.spark.sql.redis") \
    .option("table", "test") \
    .option("key.column", "key") \
    .load()
df.show()
test = df.filter(df.value.contains('中華民國'))
test.count()
```
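For reference, the filter above is a substring match over every value, which forces a full scan of the table on each query. A minimal plain-Python sketch of the same operation (sample data and names are illustrative only, not from the actual dataset):

```python
# Illustrative rows mimicking the key/value schema written to Redis.
rows = [
    {"key": "1", "value": "中華民國台北市"},
    {"key": "2", "value": "Tokyo"},
    {"key": "3", "value": "高雄 中華民國"},
]

# Equivalent of df.filter(df.value.contains('中華民國')).count():
# every row's value must be inspected, so cost grows linearly with row count.
matches = [r for r in rows if "中華民國" in r["value"]]
count = len(matches)
print(count)
```

On 23 million rows this linear scan is repeated for every query, which is consistent with the slow search times described above.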