ClickHouse / spark-clickhouse-connector

Spark ClickHouse Connector built on DataSourceV2 API
https://clickhouse.com/docs/en/integrations/apache-spark
Apache License 2.0

Spark 3.5: Support micro timestamp #311

Closed Veiasai closed 6 months ago

Veiasai commented 6 months ago

Fix #310

pan3793 commented 6 months ago

Thanks for making this change. Do you have a chance to add a unit test (UT)?

Veiasai commented 6 months ago

#310

Veiasai commented 6 months ago

@pan3793 let me take a look.

By the way, how do I build the jar locally?
https://housepower.github.io/spark-clickhouse-connector/developers/01_build_and_test/ says it should appear in build/, but I can't find it there.

pan3793 commented 6 months ago

Oh, I forgot to update the docs when switching the default Spark version from 3.4 to 3.5. The jar should be at spark-3.5/clickhouse-spark-runtime/build/libs/

➜  Projects cd spark-clickhouse-connector
(scc) ➜  spark-clickhouse-connector git:(master) ./gradlew clean build -x test
Starting a Gradle Daemon (subsequent builds will be faster)

> Task :clickhouse-core:compileScala
[Warn] : two feature warnings; re-run with -feature for details
one warning found

> Task :clickhouse-spark-3.5_2.12:compileScala
[Warn] /Users/chengpan/Projects/spark-clickhouse-connector/spark-3.5/clickhouse-spark/src/main/scala/org/apache/spark/sql/clickhouse/ExprUtils.scala:159:21: non-variable type argument Any in type pattern org.apache.spark.sql.connector.expressions.LiteralValue[Any] is unchecked since it is eliminated by erasure
one warning found

BUILD SUCCESSFUL in 29s
33 actionable tasks: 32 executed, 1 up-to-date
(scc) ➜  spark-clickhouse-connector git:(master) ll spark-3.5/clickhouse-spark-runtime/build/libs/
total 2688
-rw-r--r--  1 chengpan  staff   261B May 10 13:59 clickhouse-spark-runtime-3.5_2.12-0.8.0-SNAPSHOT-empty.jar
-rw-r--r--  1 chengpan  staff   261B May 10 13:59 clickhouse-spark-runtime-3.5_2.12-0.8.0-SNAPSHOT-javadoc.jar
-rw-r--r--  1 chengpan  staff   261B May 10 13:59 clickhouse-spark-runtime-3.5_2.12-0.8.0-SNAPSHOT-sources.jar
-rw-r--r--  1 chengpan  staff   1.3M May 10 13:59 clickhouse-spark-runtime-3.5_2.12-0.8.0-SNAPSHOT.jar
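
The listing above can also be checked from Python. The following is a minimal sketch, not part of the project: the helper name `find_runtime_jars` and its parameters are hypothetical, and it assumes only the directory layout and jar naming visible in the build output above.

```python
from pathlib import Path

# Suffixes of the auxiliary jars seen in the listing above; only the main
# runtime jar is useful on the Spark classpath.
AUX_SUFFIXES = ("-empty.jar", "-javadoc.jar", "-sources.jar")


def find_runtime_jars(repo_root, spark_version="3.5"):
    """Return the runtime jars under
    <repo_root>/spark-<spark_version>/clickhouse-spark-runtime/build/libs/.

    Hypothetical helper: the layout is taken from the build output in this
    thread and may differ in other versions of the repository.
    """
    libs = (
        Path(repo_root)
        / f"spark-{spark_version}"
        / "clickhouse-spark-runtime"
        / "build"
        / "libs"
    )
    return sorted(
        p
        for p in libs.glob("clickhouse-spark-runtime-*.jar")
        if not p.name.endswith(AUX_SUFFIXES)
    )
```

Calling `find_runtime_jars(".")` from the repository root after `./gradlew clean build -x test` should return a single path if the build produced the jar shown above, and an empty list otherwise.
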

pan3793 commented 6 months ago

You can skip the changes for spark-3.4 and spark-3.3 and just focus on clickhouse-core and spark-3.5. There is a semi-automatic script to backport the patch to older Spark versions.

Veiasai commented 6 months ago

There is no test suite that I can easily extend (and I am not familiar with Java/Scala). I'll probably do a system test with the generated jar.

pan3793 commented 6 months ago

Okay~

pan3793 commented 6 months ago

You can construct a simple case and leave it in the comments. I will find time to add it to the UT later.
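
The thread does not show the case itself. As one hypothetical shape for it, the sketch below builds timestamps with a non-zero microsecond part and reports any value that changes after a round trip. The function names are invented for illustration, and the assumption that the fix concerns microsecond-precision timestamps comes only from the PR title.

```python
from datetime import datetime


def sample_timestamps():
    """Timestamps whose microsecond part would be damaged by truncation
    to milliseconds or seconds (hypothetical test data)."""
    return [
        datetime(2024, 1, 1, 0, 0, 0, 1),
        datetime(2024, 1, 1, 12, 30, 45, 123456),
        datetime(2024, 1, 1, 23, 59, 59, 999999),
    ]


def precision_mismatches(expected, actual):
    """Return (expected, actual) pairs that differ after a round trip."""
    if len(expected) != len(actual):
        raise ValueError("row counts differ")
    return [(e, a) for e, a in zip(expected, actual) if e != a]
```

In a real check, `actual` would be the values read back through the connector; an empty result from `precision_mismatches` means every microsecond value survived.
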

Veiasai commented 6 months ago

Previous

.config(
    "spark.jars.packages",
    "com.github.housepower:clickhouse-spark-runtime-3.4_2.12:0.7.3,com.clickhouse:clickhouse-jdbc:0.4.6"
)

image

With this fix

.config(
    "spark.jars",
    "/home/ubuntu/spark-clickhouse-connector/spark-3.5/clickhouse-spark-runtime/build/libs/clickhouse-spark-runtime-3.5_2.12-0.8.0-SNAPSHOT.jar"
)

image

pan3793 commented 6 months ago

Can you try both json and binary for spark.clickhouse.read.format?

Veiasai commented 6 months ago

.config("spark.clickhouse.read.format", "json")

in PySpark?

Tested, same result.

image

Yeah, the setting takes effect, since setting it to "x" fails:

xenon.clickhouse.exception.CHClientException:  [-1] Unsupported read format: x

pan3793 commented 6 months ago

The default value is "json". If "binary" works too, that's good.

Veiasai commented 6 months ago

Yeah, I tried both.
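
Checking both formats can be made repeatable with a small loop. The sketch below is an illustration rather than project code: `run_query` is a hypothetical callable supplied by the caller that takes a dict of Spark settings, runs the read, and returns the rows; only the key `spark.clickhouse.read.format` and the values "json" and "binary" come from this thread.

```python
READ_FORMATS = ("json", "binary")


def read_format_conf(fmt):
    """Spark setting that selects the connector's read format."""
    return {"spark.clickhouse.read.format": fmt}


def results_by_format(run_query, formats=READ_FORMATS):
    """Run the caller-supplied query once per read format.

    run_query is hypothetical: it receives a dict of Spark settings and
    returns the rows read with those settings.
    """
    return {fmt: run_query(read_format_conf(fmt)) for fmt in formats}


def formats_agree(results):
    """True if every read format returned the same rows."""
    values = list(results.values())
    return all(v == values[0] for v in values[1:])
```

If `formats_agree(results_by_format(run_query))` is true, both formats returned identical rows for the query.
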

pan3793 commented 6 months ago

Thanks, merged to master

Veiasai commented 6 months ago

When will this be released?