
Reading from Apache Ignite with JDBC driver gives SQLException: Fetch size must be greater than zero #11599

Open · Felix-neko opened 1 month ago

Felix-neko commented 1 month ago

Hi folks! And thank you for your project.

I have a problem using Apache Ignite with PySpark over JDBC: I'm trying to read some data from an Ignite table like this:

spark.read.format("jdbc") \
    .option("driver", "org.apache.ignite.IgniteJdbcThinDriver") \
    .option("url", "jdbc:ignite:thin://172.19.0.1:10800;schema=fs_dev") \
    .option("dbtable", "country") \
    .load() \
    .show()

But it gives me an error:

java.sql.SQLException: Fetch size must be greater than zero.
  at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.setFetchSize(JdbcThinStatement.java:620)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:302)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:123)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)

Is there a way to fix or work around this?

P.S. I'm using Python 3.7 and PySpark 2.4.8
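
Update: judging by the stack trace, the failure happens when Spark's JDBCRDD calls Statement.setFetchSize with Spark's default fetch size of 0, which Ignite's thin driver rejects. Since the Spark JDBC source exposes a fetchsize option, passing any positive value should avoid that check. Here is a minimal sketch of the workaround I have in mind (untested so far; the value 1000 is an arbitrary placeholder, and the connection details are from my setup above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ignite-jdbc-read").getOrCreate()

# Same read as above, but with an explicit positive fetch size.
# Spark's "fetchsize" option defaults to 0, and Ignite's
# JdbcThinStatement.setFetchSize throws on values <= 0.
df = (
    spark.read.format("jdbc")
    .option("driver", "org.apache.ignite.IgniteJdbcThinDriver")
    .option("url", "jdbc:ignite:thin://172.19.0.1:10800;schema=fs_dev")
    .option("dbtable", "country")
    .option("fetchsize", "1000")  # any value > 0; 1000 is arbitrary
    .load()
)
df.show()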