BatchUpdateException during inserts with jdbc driver

stasDomb commented 1 year ago

Hello. I have an issue with connection to Clickhouse with the driver (version 0.4.6). The problems with Inserts. There is an exception:

java.sql.BatchUpdateException: Unexpected end of file from server, server ClickHouseNode [uri=http://clickhouse.dwh.staging.com:8123/default]@-971229636 at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:107) at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.executeAny(InputBasedPreparedStatement.java:154) at com.clickhouse.jdbc.internal.AbstractPreparedStatement.executeLargeBatch(AbstractPreparedStatement.java:85)

I've tried change some timeouts on ClickHouse side, I've tested with several versions of driver. Still have this kind of problems. Inserts 10k-100k ch version - 23.3.8 driver - 0.4.6 It's not a blocker for me, because with retries the data is inserted but have at least 10 exceptions per hour. Could you please give me some advice?

zhicwu commented 1 year ago

Hi @stasDomb, thanks for reporting the issue. Is there any clue on server side? Are you using proxy between the client and server?

stasDomb commented 1 year ago

Hi @stasDomb, thanks for reporting the issue. Is there any clue on server side? Are you using proxy between the client and server?

Hello, thank you for the answer. I don't see anything interesting on Server side, no errors etc. We are using ClickHouse operator and load balancer there

stasDomb commented 1 year ago

Hello, after updating to the version 5.0.0 the previous error was vanished, but a new one was appeared: java.sql.BatchUpdateException: The target server failed to respond, server ClickHouseNode [uri=http://hostname:8123/default]@1702260979 at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:107) at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.executeAny(InputBasedPreparedStatement.java:154) at com.clickhouse.jdbc.internal.AbstractPreparedStatement.executeLargeBatch(AbstractPreparedStatement.java:85) at com.clickhouse.jdbc.internal.ClickHouseStatementImpl.executeBatch(ClickHouseStatementImpl.java:752)

eremeykin commented 11 months ago

Hello! Do you have any updates on this issue? I faced with the similar exception:

Caused by: java.sql.BatchUpdateException: The target server failed to respond, server ClickHouseNode [uri=http://clickhouse:8123/test]@-628742641
    at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:107)
    at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeAny(SqlBasedPreparedStatement.java:219)
    at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeQuery(SqlBasedPreparedStatement.java:246)
    at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
    at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
    at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:722)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:648)
    ... 207 more

client: com.clickhouse:clickhouse-jdbc:0.5.0 + HikariCP 5.0.1 + Spring Boot 3.1.4 server: 23.1.7.30

I tried setting connectionTestQuery & validationTimeout (as it was mentioned here https://github.com/ClickHouse/clickhouse-java/issues/290), but it did not work for me. Apparently, the exception originates from the HTTP client.

I also took a look at HTTP tracing & TCP dumps. There is an "end of stream" record in the HTTP trace right after the request with SQL:

2023-12-07 13:28:57,788 [qtp1835442760-5333] DEBUG org.apache.hc.client5.http.wire - http-outgoing-503 >> <<<SQL body here>>>
2023-12-07 13:28:57,788 [qtp1835442760-5333] DEBUG org.apache.hc.client5.http.wire - http-outgoing-503 >> "0[\r][\n]"
2023-12-07 13:28:57,788 [qtp1835442760-5333] DEBUG org.apache.hc.client5.http.wire - http-outgoing-503 >> "[\r][\n]"
2023-12-07 13:28:57,789 [qtp1835442760-5333] DEBUG org.apache.hc.client5.http.wire - http-outgoing-503 << "end of stream"
2023-12-07 13:28:57,789 [qtp1835442760-5333] DEBUG o.a.h.c.h.i.i.DefaultManagedHttpClientConnection - http-outgoing-503 Close connection
2023-12-07 13:28:57,789 [qtp1835442760-5333] DEBUG o.a.h.c.h.i.c.InternalHttpClient - ep-0000002676 endpoint closed
2023-12-07 13:28:57,789 [qtp1835442760-5333] DEBUG o.a.h.c.h.i.c.InternalHttpClient - ep-0000002676 discarding endpoint
2023-12-07 13:28:57,789 [qtp1835442760-5333] DEBUG o.a.h.c.h.i.i.PoolingHttpClientConnectionManager - ep-0000002676 releasing endpoint

I didn't find any records on the ClickHouse server side corresponding to the request Id or any other clues.

The TCP dumps look like the server initiates the connection closing (after 3sec timeout), but the client still tries using it (I am not sure here)

I tried sending the same request that the driver/HTTP client sends from the same machine, but using curl instead. And everything worked fine, I got 200 Ok with the requested data. But it does not work when I am using jdbc driver from the application.

Finally, I found this PR https://github.com/ClickHouse/clickhouse-java/pull/760 which seems to address the relevant issue. This PR introduces the new setting validateAfterInactivityMillis, but I can't find the corresponding one in the latest version of the driver (0.5.0). The mentioned PR was made for the older version which has ru.yandex.* packages naming. @zhicwu Could you help me, please? Is there any setting in 0.5.0 version that works the same way? Or any other solution for this issue?

stasDomb commented 11 months ago

Hello! Do you have any updates on this issue? I faced with the similar exception:
Caused by: java.sql.BatchUpdateException: The target server failed to respond, server ClickHouseNode [uri=http://clickhouse:8123/test]@-628742641
  at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:107)
  at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeAny(SqlBasedPreparedStatement.java:219)
  at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeQuery(SqlBasedPreparedStatement.java:246)
  at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
  at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
  at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:722)
  at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:648)
  ... 207 more
client: com.clickhouse:clickhouse-jdbc:0.5.0 + HikariCP 5.0.1 + Spring Boot 3.1.4 server: 23.1.7.30

I tried setting connectionTestQuery & validationTimeout (as it was mentioned here #290), but it did not work for me. Apparently, the exception originates from the HTTP client.

I also took a look at HTTP tracing & TCP dumps. There is an "end of stream" record in the HTTP trace right after the request with SQL:
2023-12-07 13:28:57,788 [qtp1835442760-5333] DEBUG org.apache.hc.client5.http.wire - http-outgoing-503 >> <<<SQL body here>>>
2023-12-07 13:28:57,788 [qtp1835442760-5333] DEBUG org.apache.hc.client5.http.wire - http-outgoing-503 >> "0[\r][\n]"
2023-12-07 13:28:57,788 [qtp1835442760-5333] DEBUG org.apache.hc.client5.http.wire - http-outgoing-503 >> "[\r][\n]"
2023-12-07 13:28:57,789 [qtp1835442760-5333] DEBUG org.apache.hc.client5.http.wire - http-outgoing-503 << "end of stream"
2023-12-07 13:28:57,789 [qtp1835442760-5333] DEBUG o.a.h.c.h.i.i.DefaultManagedHttpClientConnection - http-outgoing-503 Close connection
2023-12-07 13:28:57,789 [qtp1835442760-5333] DEBUG o.a.h.c.h.i.c.InternalHttpClient - ep-0000002676 endpoint closed
2023-12-07 13:28:57,789 [qtp1835442760-5333] DEBUG o.a.h.c.h.i.c.InternalHttpClient - ep-0000002676 discarding endpoint
2023-12-07 13:28:57,789 [qtp1835442760-5333] DEBUG o.a.h.c.h.i.i.PoolingHttpClientConnectionManager - ep-0000002676 releasing endpoint
I didn't find any records on the ClickHouse server side corresponding to the request Id or any other clues.

The TCP dumps look like the server initiates the connection closing (after 3sec timeout), but the client still tries using it (I am not sure here)

I tried sending the same request that the driver/HTTP client sends from the same machine, but using curl instead. And everything worked fine, I got 200 Ok with the requested data. But it does not work when I am using jdbc driver from the application.

Finally, I found this PR #760 which seems to address the relevant issue. This PR introduces the new setting validateAfterInactivityMillis, but I can't find the corresponding one in the latest version of the driver (0.5.0). The mentioned PR was made for the older version which has ru.yandex.* packages naming. @zhicwu Could you help me, please? Is there any setting in 0.5.0 version that works the same way? Or any other solution for this issue?

Unfortunately, I still have this kind of problem

janson653 commented 10 months ago

Hello guys, I meet the same issue, any update?

bairamovazat commented 9 months ago

Hi!

I updated the library version to 0.6.0 and everything worked. Maybe it will help someone else too.

akenO8 commented 8 months ago

My use case is Trino(435) connecting to Clickhouse(21.11.10.1). I upgraded to 0.6.0 but still had this problem:

Caused by: java.sql.BatchUpdateException: The target server failed to respond, server ClickHouseNode [uri=http://haproxy:8123/default, options={socket_timeout=120000,externalDatabase=false,server_time_zone=Asia/Shanghai,server_version=21.11.10.1,use_binary_string=true}]@1936783272
    at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:107)
    at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.executeAny(InputBasedPreparedStatement.java:154)
    at com.clickhouse.jdbc.internal.AbstractPreparedStatement.executeLargeBatch(AbstractPreparedStatement.java:85)
    at com.clickhouse.jdbc.internal.ClickHouseStatementImpl.executeBatch(ClickHouseStatementImpl.java:752)
    at io.opentelemetry.instrumentation.jdbc.internal.OpenTelemetryStatement.wrapCall(OpenTelemetryStatement.java:294)
    at io.opentelemetry.instrumentation.jdbc.internal.OpenTelemetryStatement.executeBatch(OpenTelemetryStatement.java:106)
    at io.trino.plugin.jdbc.JdbcPageSink.appendPage(JdbcPageSink.java:153)
    ... 13 more

den-crane commented 8 months ago

The target server failed to respond, server ClickHouseNode

Client keepalive.timeout=3 / Server keepalive.timeout=3

1 Client successfully executes some query. 2 Client http connection is idle for 2.999999 seconds. 3 Client checks "Am i ildle less than 3 seconds?" - I am. 2.999999<3.0 4 Server closes http because 3 seconds pass. 5 Client sends a new request to Clickhouse server. 6 Server responses with a disconnect.

Server keepalive.timeout should be significantly bigger than Client keepalive.timeout. Webbrowsers have keepalive = 30, webservers have keepalive = 60

akenO8 commented 7 months ago

Server keep_alive_timeout = 3 However, adjusting the keep_alive_timeout parameter for a production cluster is a difficult task. Is there any good idea?

JeasGo commented 6 months ago

Hi , I meet the same issue, I have updated to version 0.6.0 but it doesn't work, please tell me how to solve it

vera-dobryanskaya commented 5 months ago

I also see the com.clickhouse.client.ClickHouseException: HTTP request failed: Broken pipe, server ClickHouseNode. with CH v.23.8.2.7 and java drivers v 0.5.0, and 0.6.0p4.

We use HTTP async writes, with async_insert and wait_for_async_insert set to 1. keep_alive in config.xml is default/3s and there is nothing fancy in how the mutation is set. Neither there is anything interesting in clickhouse-server.log or clickhouse-server.err.log

This error seems to be more likely to happen when the writes/load is spiking. The problem is that it happens about once or twice a day and I'm not sure how to catch the issue at the correct time to setup the packet capture. Leaving packet capture for 24 hours seems not feasible.

For the records, I tried variety of driver configuration options, perhaps except writing a custom socket.

Please, advise on how to troubleshoot or resolve the issue.

chernser commented 4 months ago

@vera-dobryanskaya @JeasGo Thank you for reporting it! I suspect that issue in how streams and stale connections are handled inside CH client. I will investigate the issue. If you have any code examples or log you may share - it would help.

chernser commented 4 months ago

@vera-dobryanskaya are do you use Java Client or JDBC driver?

0xkavouri commented 1 month ago

Hello @chernser , any update on this issue? Facing the same thing with latest clickhouse-jdbc version. Im using spark and it seems to happen when there are more active connections trying to write in parallel to clickhouse

chernser commented 1 month ago

Good day, @0xkavouri! Currently we do not have enough information about the issue. Do you work with ClickHouse Cloud or do you have on-premises installation? What is the CPU/memory usage on Spark side depending on active connections? What is the client configuration? How many max connection?

Thanks!

0xkavouri commented 1 month ago

@chernser Hey thank you for the quick reply. Let me try to gather as much info as possible

on prem clickhouse, v23.3.8.22.altinitystable
cpu/memory usage on spark executors is minimal (if thats what you were asking for)
cpu/memory on clickhouse hosts is minimal, however insert query latency is high. Seems related to how often these errors occur?
i notice that the higher the batch_size when doing inserts, as well as higher number of active connections, leads to more frequent connection errors
i also have not encountered this error when writing smaller datasets (<1GB) using only one active connection, so to me it seems to be related to the insert latency spikes that happen with a larger dataset
max connections is 12 with this particular spark job.

Full stack trace:

java.sql.BatchUpdateException: Error writing request body to server, server ClickHouseNode [uri=http://09.clickhouse:8123/default]@-952645051 at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:109) at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.executeAny(InputBasedPreparedStatement.java:142) at com.clickhouse.jdbc.internal.AbstractPreparedStatement.executeLargeBatch(AbstractPreparedStatement.java:85) at com.clickhouse.jdbc.internal.ClickHouseStatementImpl.executeBatch(ClickHouseStatementImpl.java:568) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:708) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:868) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:867) at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1011) at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1011) at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:136) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

0xkavouri commented 1 month ago

@chernser i believe i figured out the cause in my case. Looks like it's related to zookeeper connection issues when trying to replicate inserts.

SteZe85 commented 1 day ago

Any update on this issue? We're also receiving the error when reading data from Clickhouse through a Java Client application

BatchUpdateException: The target server failed to respond

ClickHouse / clickhouse-java

BatchUpdateException during inserts with jdbc driver #1444