RedisLabs / spark-redis

A connector for Spark that allows reading and writing to/from Redis cluster
BSD 3-Clause "New" or "Revised" License

Pyspark error loading data into Redis - java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps; #295

Open jeames00 opened 3 years ago

jeames00 commented 3 years ago

I'm trying to follow the spark-redis tutorial for Python but keep encountering an error when writing the dataframe:

>>> data.write.format("org.apache.spark.sql.redis").option("table", "people").option("key.column", "en_curid").save()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/sql/readwriter.py", line 828, in save
    self._jwrite.save()
  File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/sql/utils.py", line 128, in deco
    return f(*a, **kw)
  File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o50.save.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;

Pyspark is launched with the command:

pyspark --conf "spark.redis.host=localhost" --conf "spark.redis.port=6379" --jars target/spark-redis_2.11-2.6.0-SNAPSHOT-jar-with-dependencies.jar
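For reference, the failure can also be reproduced outside the interactive shell with a standalone script. This is just a sketch mirroring the session above; the jar path, host, and port are taken from the launch command and would need adjusting for a different setup:

# repro.py - minimal standalone reproduction of the failing write
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-redis-repro")
    # same settings as the pyspark launch command above
    .config("spark.redis.host", "localhost")
    .config("spark.redis.port", "6379")
    .config("spark.jars", "target/spark-redis_2.11-2.6.0-SNAPSHOT-jar-with-dependencies.jar")
    .getOrCreate()
)

# load the tutorial dataset and keep the three columns used below
full_df = spark.read.csv("pantheon.tsv", sep="\t", quote="", header=True, inferSchema=True)
data = full_df.select("en_curid", "countryCode", "occupation")

# the NoSuchMethodError is raised from this save()
(data.write
    .format("org.apache.spark.sql.redis")
    .option("table", "people")
    .option("key.column", "en_curid")
    .save())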

$ spark-submit --version
21/02/21 23:29:20 WARN Utils: Your hostname, pop-os resolves to a loopback address: 127.0.1.1; using 192.168.1.119 instead (on interface ens18)
21/02/21 23:29:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.2
      /_/

Using Scala version 2.12.10, Java HotSpot(TM) Server VM, 1.8.0_281

I have a Redis server running in Docker on localhost:6379; no problems connecting to it from redis-cli.
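The same connectivity check can be done from Python; this sketch assumes the redis-py client is installed, which is not part of the setup described above:

import redis  # redis-py client, assumed installed (pip install redis)

# prints True if the dockerized Redis on localhost:6379 is reachable
r = redis.Redis(host="localhost", port=6379)
print(r.ping())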

Pop!_OS 20.10
Python version 3.8.6
Pyspark version 3.0.2
Pipenv version 11.9.0
Java version "1.8.0_281"
Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
Java HotSpot(TM) Server VM (build 25.281-b09, mixed mode)

Full error below:

>>> full_df = spark.read.csv("pantheon.tsv", sep="\t", quote="", header=True, inferSchema=True)
21/02/21 22:21:14 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
>>> data = full_df.select("en_curid", "countryCode", "occupation")
>>> data.write.format("org.apache.spark.sql.redis").option("table", "people").option("key.column", "en_curid").save()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/sql/readwriter.py", line 828, in save
    self._jwrite.save()
  File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/sql/utils.py", line 128, in deco
    return f(*a, **kw)
  File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o50.save.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
    at com.redislabs.provider.redis.RedisConfig.clusterEnabled(RedisConfig.scala:198)
    at com.redislabs.provider.redis.RedisConfig.getNodes(RedisConfig.scala:320)
    at com.redislabs.provider.redis.RedisConfig.getHosts(RedisConfig.scala:236)
    at com.redislabs.provider.redis.RedisConfig.<init>(RedisConfig.scala:135)
    at org.apache.spark.sql.redis.RedisSourceRelation.<init>(RedisSourceRelation.scala:34)
    at org.apache.spark.sql.redis.DefaultSource.createRelation(DefaultSource.scala:21)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:126)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:962)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:962)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:414)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:398)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)