lystor opened this issue 1 year ago
Hi @lystor, AFAIK, transaction support in ClickHouse is still experimental, which means it is not yet ready for production use. That said, you are encouraged to use the feature to address integrity issues and to provide feedback. I haven't encountered this feature causing data corruption or loss, but I recommend searching the ClickHouse main repository for known issues before applying it to your PROD environment.
It appears that there are multiple ClickHouse servers running behind the DNS, and the following events occurred:
1) Kafka Connect asked the JDBC driver to start a transaction, so a session was created on server 1
2) Kafka Connect passed a query like `insert into ch_table values(?,?,?)` to the JDBC driver
3) The JDBC driver attempted to execute the query `select * from ch_table where 0` on server 2 (using the same session ID) to determine parameter types (for serializing Java objects in the RowBinary data format) — see the sketch after this list
4) Server 2 reported a "session not found" error, because the session was created on server 1 and did not exist on server 2
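For illustration, here is roughly what the probe in step 3 looks like through plain JDBC (a minimal sketch; the host, credentials, and `ch_table` are placeholders): a query that matches no rows still returns full column metadata, which is enough to serialize parameters as RowBinary.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class ProbeTypes {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; any ClickHouse server with ch_table would do.
        String url = "jdbc:ch://localhost:8123/default";
        try (Connection conn = DriverManager.getConnection(url, "default", "");
             Statement stmt = conn.createStatement();
             // "where 0" matches no rows, so only column metadata comes back
             ResultSet rs = stmt.executeQuery("select * from ch_table where 0")) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                System.out.println(md.getColumnName(i) + ": " + md.getColumnTypeName(i));
            }
        }
    }
}
```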
As a workaround, you may want to enable sticky sessions on your load balancer by routing on the session_id parameter in the request URL. Enabling implicit_transaction may help too if the JDBC driver is used in auto-commit mode; however, that is not the case in Kafka Connect, as auto-commit is disabled.
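As an illustration of the sticky-session idea (a sketch only, with made-up backend names and addresses, not a tested configuration), HAProxy's `balance url_param` hashes on a URL parameter, so requests carrying the same session_id land on the same server:

```
backend clickhouse_http
    mode http
    # hash on the session_id URL parameter; requests without it fall back to round-robin
    balance url_param session_id
    server ch1 clickhouse-1.censored.local:8123 check
    server ch2 clickhouse-2.censored.local:8123 check
```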
Adding a ClickHouse dialect to Kafka Connect could be beneficial. However, it's not advisable to enhance the driver to support one transaction per query, regardless of whether auto-commit is enabled.
Hi @zhicwu, thank you for your detailed answer.
It appears that there are multiple ClickHouse servers running behind the DNS
Nope. A single ClickHouse Server and a single ClickHouse Keeper are used without any load balancers.
It seems the problem is caused by the JDBC driver.
Full configuration below.
config.xml:
<clickhouse>
    <allow_experimental_transactions>1</allow_experimental_transactions>
    <backups>
        <allowed_disk>backups</allowed_disk>
        <allowed_path>/backups/</allowed_path>
    </backups>
    <grpc remove="remove"></grpc>
    <listen_host remove="remove">::</listen_host>
    <logger replace="replace">
        <console>1</console>
        <level>warning</level>
    </logger>
    <max_partition_size_to_drop>0</max_partition_size_to_drop>
    <max_table_size_to_drop>0</max_table_size_to_drop>
    <mysql_port remove="remove"></mysql_port>
    <postgresql_port remove="remove"></postgresql_port>
    <prometheus>
        <asynchronous_metrics>true</asynchronous_metrics>
        <endpoint>/metrics</endpoint>
        <events>true</events>
        <metrics>true</metrics>
        <port>9187</port>
        <status_info>true</status_info>
    </prometheus>
    <storage_configuration>
        <disks>
            <backups>
                <type>local</type>
                <path>/backups/</path>
            </backups>
        </disks>
    </storage_configuration>
    <zookeeper>
        <node>
            <host>clickhouse-2.censored.local</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>
keeper_config.xml:
<clickhouse>
    <listen_host>0.0.0.0</listen_host>
    <logger replace="replace">
        <console>1</console>
        <level>warning</level>
    </logger>
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>1</server_id>
        <raft_configuration replace="replace">
            <server>
                <id>1</id>
                <hostname>clickhouse-2.censored.local</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>
users.xml:
<clickhouse>
    <profiles>
        <default>
            <allow_introspection_functions>1</allow_introspection_functions>
        </default>
    </profiles>
    <users>
        <default>
            <access_management>1</access_management>
            <named_collection_control>1</named_collection_control>
            <show_named_collections>1</show_named_collections>
            <show_named_collections_secrets>1</show_named_collections_secrets>
            <networks replace="replace">
                <ip>127.0.0.1</ip>
            </networks>
        </default>
    </users>
</clickhouse>
initdb.sql:
CREATE USER IF NOT EXISTS censored IDENTIFIED BY 'censored';
GRANT SELECT ON *.* TO censored;
GRANT INSERT ON *.* TO censored;
docker-compose.yml:
version: "3.8"
services:
  keeper:
    command:
      - /usr/bin/clickhouse-keeper
      - --config-file=/etc/clickhouse-keeper/keeper_config.xml
    image: clickhouse/clickhouse-keeper:23.3.2.37
    logging:
      options:
        max-size: 10m
        max-file: "2"
    network_mode: host
    ports:
      - "9181:9181"
      - "9234:9234"
    restart: unless-stopped
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65000
        hard: 65000
    user: clickhouse
    volumes:
      - ./keeper/keeper_config.xml:/etc/clickhouse-keeper/keeper_config.d/censored.xml:ro
      - /data/clickhouse/keeper:/var/lib/clickhouse
  server:
    cap_add:
      - IPC_LOCK
      - NET_ADMIN
      - SYS_NICE
    image: clickhouse/clickhouse-server:23.3.2.37
    logging:
      options:
        max-size: 10m
        max-file: "2"
    network_mode: host
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9187:9187"
    restart: unless-stopped
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65000
        hard: 65000
    user: clickhouse
    volumes:
      - ./server/config.xml:/etc/clickhouse-server/config.d/censored.xml:ro
      - ./server/initdb.sql:/docker-entrypoint-initdb.d/initdb.sql:ro
      - ./server/users.xml:/etc/clickhouse-server/users.d/censored.xml:ro
      - /data/clickhouse/server:/var/lib/clickhouse
      - backups:/backups
volumes:
  backups:
    driver_opts:
      type: "nfs"
      o: "addr=backup-1.censored.local,intr,nfsvers=4.1,noatime,nodiratime,soft"
      device: "backup-1.censored.local:/data/clickhouse"
kafka-connect Dockerfile:
FROM confluentinc/cp-kafka-connect-base:7.4.0
ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components"
RUN set -ex \
    && confluent-hub install --no-prompt confluentinc/kafka-connect-avro-converter:7.4.0 \
    && confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.7.0 \
    && cd /usr/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib \
    && curl -fsLO https://github.com/ClickHouse/clickhouse-java/releases/download/v0.4.6/clickhouse-jdbc-0.4.6-shaded.jar
kafka-connect StatefulSet:
...
containers:
  - env:
      - name: CONNECT_BOOTSTRAP_SERVERS
        value: kafka-1-dev-ps.censored.com:9092
      - name: CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR
        value: "1"
      - name: CONNECT_CONFIG_STORAGE_TOPIC
        value: gone_deploy-xeen_kafka-connect-config
      - name: CONNECT_GROUP_ID
        value: deploy-xeen
      - name: CONNECT_KEY_CONVERTER
        value: io.confluent.connect.avro.AvroConverter
      - name: CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL
        value: http://kafka-1-dev-ps.censored.com:8081
      - name: CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR
        value: "1"
      - name: CONNECT_OFFSET_STORAGE_TOPIC
        value: gone_deploy-xeen_kafka-connect-offset
      - name: CONNECT_REST_ADVERTISED_HOST_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: status.podIP
      - name: CONNECT_REST_PORT
        value: "8083"
      - name: CONNECT_STATUS_STORAGE_REPLICATION_FACTOR
        value: "1"
      - name: CONNECT_STATUS_STORAGE_TOPIC
        value: gone_deploy-xeen_kafka-connect-status
      - name: CONNECT_VALUE_CONVERTER
        value: io.confluent.connect.avro.AvroConverter
      - name: CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL
        value: http://kafka-1-dev-ps.censored.com:8081
      - name: KAFKA_HEAP_OPTS
        value: -Xms512M -Xmx512M
      - name: KAFKA_JMX_OPTS
        value: -javaagent:/usr/share/java/cp-base-new/jmx_prometheus_javaagent-0.18.0.jar=9187:/etc/kafka-connect-exporter/config.yml
    image: artifactory-1.censored.com/kafka-connect:xeen-4504
    name: kafka-connect
...
Thank you, @lystor, for providing additional details about the environment.
The transaction depends on the session to function properly. Therefore, if an insert operation takes longer than the default session timeout, the transaction will be rolled back on the server, and the client will receive a "session not found" error. Have you attempted to increase the values of transaction_timeout or session_timeout (in seconds) in the JDBC connection string? Alternatively, you can consider increasing the value of default_session_timeout in the server configuration.
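For example (the values are purely illustrative; the host is taken from the configuration in this thread), the timeouts can be passed in the JDBC connection string:

```
jdbc:ch://clickhouse-2.censored.local:8123/default?transactionSupport=true&session_timeout=300&transaction_timeout=600
```

or the server-side default can be raised in config.xml:

```xml
<clickhouse>
    <!-- illustrative value in seconds; the default is 60 -->
    <default_session_timeout>300</default_session_timeout>
</clickhouse>
```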
I will improve the error message to make it less confusing for others.
Therefore, if an insert operation takes longer than the default session timeout

We don't have such long transactions.
The bug has a 100% reproduction rate on a new, idle server without any workload or active users, even on an insert of a single row.
STR: the delay between events in Kafka is longer than max_session_timeout.
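For reference, the session expiry alone can be reproduced outside Kafka Connect via the HTTP interface (a sketch assuming a local server with the default default_session_timeout of 60 seconds; session_check=1 tells the server to fail rather than silently open a new session):

```bash
# create a session by running any query with an explicit session_id
curl 'http://localhost:8123/?session_id=repro&query=SELECT%201'

# wait longer than the session timeout
sleep 90

# the session has expired in the meantime, so this fails
curl 'http://localhost:8123/?session_id=repro&session_check=1&query=SELECT%201'
# -> DB::Exception: Session not found (SESSION_NOT_FOUND)
```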
The server's configuration is provided below.
the delay between events in Kafka is longer than max_session_timeout
I see. Actually, the transaction begins when jdbcConnection.setAutoCommit(false) is called. After either a commit() or a rollback() operation, the current transaction is closed, and a new transaction is initiated.
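A minimal sketch of that lifecycle, and of how the session timeout bites in between (hypothetical host, credentials, and table; the sleep stands in for the delay between Kafka events):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class TxTimeout {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:ch://localhost:8123/default?transactionSupport=true";
        try (Connection conn = DriverManager.getConnection(url, "default", "")) {
            conn.setAutoCommit(false); // transaction (and server-side session) begins here
            Thread.sleep(90_000L);     // idle longer than default_session_timeout (60 s)
            try (PreparedStatement ps =
                    conn.prepareStatement("insert into ch_table values(?,?,?)")) {
                ps.setInt(1, 1);
                ps.setString(2, "x");
                ps.setInt(3, 0);
                ps.executeUpdate();    // fails: DB::Exception: Session not found
            }
            conn.commit();             // on success this would close the transaction,
                                       // and the next one would begin implicitly
        }
    }
}
```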
Perhaps it would be preferable to delay initiating the transaction until the first query is executed. In the meantime, as a workaround, you can increase the values of transaction_timeout or session_timeout in the connection string.
@zhicwu, thank you for the workaround.
One last question: are you going to make fixes in the JDBC driver to remove this annoying and useless DB::Exception: Session not found from the logs?
Or is this the expected behavior, and users have to accept and live with these errors in the logs on both the client and the ClickHouse server side?
The current behavior is as expected, although it can be confusing. Nevertheless, there is room for improvement in terms of usability. It is uncertain whether session and transaction will be decoupled in the future. However, for the Java library, I am contemplating two modifications:

1) improving the error message to make it less confusing, as mentioned above;
2) adding a new option (e.g. transactionSupport=lazy) that allows deferring the creation of the transaction/session until the first query is executed. This option will help prevent unintended long transactions when reusing the same connection with auto-commit disabled.

Let's keep this issue open until the necessary changes have been implemented.
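If the lazy option is implemented as sketched, opting in would presumably be a connection-string change only (hypothetical; the option does not exist at the time of writing):

```
jdbc:ch://clickhouse-2.censored.local:8123/default?transactionSupport=lazy
```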
Describe the bug
Hello
Kafka Connect with the ClickHouse JDBC driver is used for integration between Kafka and ClickHouse. We have a problem with duplicated records in ClickHouse and decided to enable ClickHouse transactions.
But when transactionSupport=true, Kafka Connect throws a SESSION_NOT_FOUND error and reconnects to ClickHouse each time the delay between inserts is longer than default_session_timeout.
There are no errors when transactionSupport=false.
Is this the expected behavior?
Is it safe to use ClickHouse transactions on production?
Thank you
Steps to reproduce
Kafka Connect JdbcSinkConfig:
Expected behaviour
No errors.
Error log
ClickHouse server:
Kafka Connect:
Configuration
Environment
ClickHouse server