Open ekatabavkar opened 11 months ago
Hi @ekatabavkar, sorry for the late reply.

> There are no errors in the clickhouse server logs.

If we can confirm that the query has no issue, then it's probably just related to the network. You may want to check ClickHouse/clickhouse-docs#1178 and see if it helps.
> When does this error occur, and how can we prevent duplicate data?
See https://kb.altinity.com/altinity-kb-schema-design/insert_deduplication/.
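For context, replicated tables deduplicate inserts by hashing each inserted block and remembering recent hashes (controlled by the `replicated_deduplication_window` merge-tree setting), so a retried insert that resends an identical block is silently skipped. A minimal sketch of the behavior, using a hypothetical table `events` (table name and Keeper path are illustrative):

```sql
-- Hypothetical ReplicatedMergeTree table; the ZooKeeper/Keeper path and
-- replica macros are placeholders for your cluster's configuration.
CREATE TABLE events
(
    ts DateTime,
    id UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (ts, id);

-- First insert writes a block; its hash is recorded in Keeper.
INSERT INTO events VALUES ('2024-01-01 00:00:00', 1);

-- Retrying the exact same insert produces the same block hash,
-- so the server skips it instead of writing a duplicate part.
INSERT INTO events VALUES ('2024-01-01 00:00:00', 1);
```

This is exactly the retry scenario in the report: when a client (here, a Spark task) resends the same failed batch, deduplication should absorb it, provided the feature is active for the insert path being used.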
> ...thereby resulting in duplicate data. When does this error occur, and how can we prevent duplicate data?
@ekatabavkar You should disable `async_insert` to avoid problems with deduplication. See https://clickhouse.com/docs/en/optimize/asynchronous-inserts:
> **Automatic deduplication is disabled by default when using asynchronous inserts.** Manual batching (see section above) has the advantage that it supports the built-in automatic deduplication of table data if (exactly) the same insert statement is sent multiple times to ClickHouse Cloud, for example, because of an automatic retry in client software because of some temporary network connection issues.
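If `async_insert` was enabled on the server or in a settings profile, it can be turned off at the session or statement level. A sketch, assuming a placeholder table `events` (whether this applies depends on your server configuration; when inserting through the JDBC driver from Spark, server settings can typically also be passed via connection properties, though the exact mechanism varies by driver version):

```sql
-- Disable asynchronous inserts for the current session...
SET async_insert = 0;

-- ...or for a single statement, so retried inserts are again
-- covered by replicated-table deduplication:
INSERT INTO events SETTINGS async_insert = 0 VALUES ('2024-01-01 00:00:00', 1);
```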
We are trying to load data into a ReplicatedMergeTree table in ClickHouse via Spark from a Google Dataproc cluster. However, we are getting an intermittent error while inserting data using the JDBC driver (0.46, shaded).

ClickHouse version: 23.4.6.25
Spark version: 2.4.8

Error shown below (sensitive data is hidden):

There are no errors in the clickhouse server logs. When this error occurs, Spark retries the failed task, which eventually succeeds, thereby resulting in duplicate data. When does this error occur, and how can we prevent duplicate data?