Open dolfinus opened 9 months ago
There is no specific benchmark, since Spark and ClickHouse usually run in large clusters.
There are some generic performance tuning tips in https://github.com/housepower/spark-clickhouse-connector/issues/265#issuecomment-1929474900
I've already set repartitionByPartition=false to avoid repartitioning on the connector side. In the Spark UI, all executors (40 in my case) received the same number of rows, so there was no data skew. Both the JDBC and Housepower connectors received the same dataframe, with the same distribution and number of partitions.
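For anyone reproducing this setup, here is a minimal sketch of how the option could be passed to a job. The thread only gives the short name repartitionByPartition; the fully qualified key spark.clickhouse.write.repartitionByPartition is an assumption and should be checked against the connector's configuration reference. The helper to_submit_args and the script name job.py are hypothetical and exist only for illustration.

```python
import shlex

# Assumption: the thread names the option only as "repartitionByPartition";
# the "spark.clickhouse.write." prefix is assumed here, not taken from the thread.
CONNECTOR_CONF = {
    "spark.clickhouse.write.repartitionByPartition": "false",
}


def to_submit_args(conf):
    """Render a dict of Spark options as spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "job.py" is a placeholder for the actual Spark application.
    print(shlex.join(["spark-submit"] + to_submit_args(CONNECTOR_CONF) + ["job.py"]))
```

Keeping the options in one dict makes it easier to run the JDBC and connector variants with exactly the same settings apart from the one being compared.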
Hi.
Do you have any benchmarks for reading and writing data with the Housepower Spark connector versus others, such as the official JDBC driver?
It is stated that "Spark ClickHouse Connector is a high performance connector", but for me it is actually slower than JDBC. For example, writing 32 GB of data (3 columns, 2 billion rows):
Packages I've used:
Config: