ClickHouse / clickhouse-java

ClickHouse Java Clients & JDBC Driver
https://clickhouse.com
Apache License 2.0

Spark JDBC cannot save MAP type #1452

Open tbbream opened 11 months ago

tbbream commented 11 months ago

Describe the bug

When writing from Spark with the ClickHouse JDBC driver (com.clickhouse.jdbc.ClickHouseDriver), a DataFrame that contains a map column cannot be saved, even though the target table has a Map column. The same operation works from other engines and code bases.

The following error gets raised:

Caused by: java.lang.IllegalArgumentException: Can't get JDBC type for map<string,string>

Steps to reproduce

Start ClickHouse in a container, then run the PySpark script included below.
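The report does not show how the test table was created. A minimal sketch of that setup step, using only the standard library, could send the DDL to ClickHouse's HTTP interface on port 8123. The table definition (a String key plus a Map(String, String) column, MergeTree engine), the helper names, and the assumption that the default user needs no password are all hypothetical, not taken from the issue.

```python
import urllib.request

# Hypothetical DDL: the issue only states that the target table has a Map column.
CREATE_TABLE_SQL = (
    "CREATE TABLE IF NOT EXISTS test "
    "(key String, map Map(String, String)) "
    "ENGINE = MergeTree ORDER BY key"
)

# Host name matches the JDBC URL used in the script (assumed reachable, no password).
DEFAULT_URL = "http://clickhouse-local:8123/"


def build_request(sql: str, base_url: str = DEFAULT_URL) -> urllib.request.Request:
    # ClickHouse's HTTP interface accepts the query text as the POST body.
    return urllib.request.Request(base_url, data=sql.encode("utf-8"), method="POST")


def run(sql: str, base_url: str = DEFAULT_URL) -> str:
    # Sends the statement and returns the server's response body as text.
    with urllib.request.urlopen(build_request(sql, base_url)) as resp:
        return resp.read().decode("utf-8")
```

Calling run(CREATE_TABLE_SQL) once before the PySpark script should leave a table whose map column lines up with the DataFrame's map<string,string> column, so any remaining failure comes from the Spark side rather than from a missing table.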

Expected behaviour

A DataFrame with a Map column can be saved from Spark through the JDBC driver.

Code example

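from pyspark.sql import SparkSession
from pyspark.sql.types import MapType, StringType, StructField, StructType

jars = ["com.clickhouse:clickhouse-jdbc:0.4.5"]

# Note: the shuffle setting key is "spark.sql.shuffle.partitions" (misspelled keys are silently ignored)
spark = SparkSession.builder.appName("map-test").config("spark.streaming.stopGracefullyOnShutdown", True).config("spark.jars.packages", ",".join(jars)).config("spark.sql.shuffle.partitions", 4).master("local[*]").getOrCreate()

# One row whose "map" column is map<string,string>; explicit schema instead of inferring from dicts
schema = StructType([StructField("key", StringType()), StructField("map", MapType(StringType(), StringType()))])
df = spark.createDataFrame([("key", {"map": "test"})], schema)

# Raises: IllegalArgumentException: Can't get JDBC type for map<string,string>
df.write.format("jdbc").mode("append").option("driver", "com.clickhouse.jdbc.ClickHouseDriver").option("url", "jdbc:clickhouse://clickhouse-local:8123").option("dbtable", "test").option("batchsize", 1).option("isolationLevel", "NONE").save()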


dolfinus commented 10 months ago

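This happens because Spark has no JDBC dialect for ClickHouse, so it does not know how to convert this type into one the ClickHouse JDBC driver accepts. Spark first asks the dialect registered for the JDBC URL for a column type and otherwise falls back to its common type mapping, which has no entry for MapType, hence the exception: https://github.com/apache/spark/blob/b41ea9162f4c8fbc4d04d28d6ab5cc0342b88cb0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L139-L167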
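The lookup in the linked JdbcUtils code can be modelled in plain Python to show why the error appears and where a fix would plug in. This is a simplified sketch, not Spark's implementation: types are represented by their catalog strings, the fallback table lists only a few illustrative entries, and clickhouse_dialect is a hypothetical dialect, not an existing one.

```python
from typing import Callable, Optional

# Simplified model, not Spark's code: catalog type string -> SQL column type.
# A few entries in the spirit of Spark's common JDBC mapping; note there is no map entry.
COMMON_TYPES = {
    "int": "INTEGER",
    "bigint": "BIGINT",
    "double": "DOUBLE PRECISION",
    "string": "TEXT",
    "boolean": "BIT(1)",
}

Dialect = Callable[[str], Optional[str]]


def get_jdbc_type(catalog_string: str, dialect: Dialect) -> str:
    # The dialect is consulted first; the common mapping is only a fallback.
    mapped = dialect(catalog_string) or COMMON_TYPES.get(catalog_string)
    if mapped is None:
        raise ValueError(f"Can't get JDBC type for {catalog_string}")
    return mapped


def no_dialect(catalog_string: str) -> Optional[str]:
    # Models the situation in the issue: no ClickHouse dialect, so nothing is mapped here.
    return None


def clickhouse_dialect(catalog_string: str) -> Optional[str]:
    # Hypothetical ClickHouse dialect: handles string-to-string maps, defers everything else.
    if catalog_string == "map<string,string>":
        return "Map(String, String)"
    return None
```

With no_dialect, get_jdbc_type("map<string,string>", no_dialect) raises the same message as in the report, while clickhouse_dialect resolves it to Map(String, String) and still falls back for ordinary types such as bigint. In Spark itself the hook is a Scala or Java JdbcDialect subclass registered with JdbcDialects.registerDialect, whose getJDBCType returns an Option[JdbcType]; the driver would still have to accept a Java Map value for the column.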