Open harryshi10 opened 2 weeks ago
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
@pan3793, do you think we also need to provide write.settings?
code change lgtm, would be great if you could provide a test case
> @pan3793, do you think we also need to provide write.settings?
yes, it could be implemented in another PR.
Additionally, SPARK-36680 (Spark 4.0) provides a more intuitive SQL syntax for this case:
SELECT * FROM $t1 WITH (`split-size` = 5)
Sorry, I'm still a rookie at Scala, but I will try to write a UT for this new feature.
@harryshi10 could you please sign the CLA?
> @harryshi10 could you please sign the CLA?
done
> code change lgtm, would be great if you could provide a test case
Sorry, I can’t provide a unit test, but here’s a test case I ran locally with PySpark.
env - ClickHouse = 24.10.2.80, Spark = 3.5.0
A SummingMergeTree with two records sharing the same key shows duplicates when queried without FINAL, but returns aggregated results when queried with FINAL.
In Spark, setting final=0 or final=1 in spark.clickhouse.read.settings controls whether the results are aggregated or not, with final=0 showing non-aggregated results and final=1 providing aggregated results.
I also tested that adding final=0 or 1 in spark.clickhouse.read.settings has no side effect on other engines, such as MergeTree.
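For anyone who wants to repeat the local check, here is a minimal sketch of how the conf entry might be built in Python. The key spark.clickhouse.read.settings and the values final=0 / final=1 come from this thread. The helper name read_settings_conf, the comma-separated format for more than one setting, and the catalog and table names in the commented usage are assumptions for illustration, not something this PR confirms.

```python
# Sketch only: build the Spark conf entry used in the local test described above.
# The conf key and the final=0 / final=1 values come from this thread.
# The helper name and the comma-separated format for multiple settings
# are assumptions made for illustration.

READ_SETTINGS_KEY = "spark.clickhouse.read.settings"


def read_settings_conf(**settings):
    """Return a one-entry conf dict, e.g. {READ_SETTINGS_KEY: "final=1"}."""
    value = ",".join(f"{name}={val}" for name, val in settings.items())
    return {READ_SETTINGS_KEY: value}


if __name__ == "__main__":
    # With a running ClickHouse and the connector on the classpath, the entry
    # could be applied roughly like this (catalog and table names are hypothetical):
    #   from pyspark.sql import SparkSession
    #   builder = SparkSession.builder
    #   for key, value in read_settings_conf(final=1).items():
    #       builder = builder.config(key, value)
    #   spark = builder.getOrCreate()
    #   spark.sql("SELECT * FROM clickhouse.default.summing_demo").show()
    print(read_settings_conf(final=1))
```

Running the same query once with final=0 and once with final=1 against a SummingMergeTree table should reproduce the non-aggregated and aggregated results described above.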
Summary
Allow reads with custom ClickHouse settings via spark.clickhouse.read.settings.
close #272