alexanderursu99 opened 3 years ago
Updated the title, since I tested this using a PostgreSQL sink connector and got the same result; I now believe this is a general issue with JDBC sinks.
Logs from the PostgreSQL sink connector, configured with the same settings as the ClickHouse sink connector and using the same Kubernetes runtime.
20:30:59.006 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 2.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:31:59.008 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:32:59.010 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:33:59.011 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:34:59.014 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:35:59.015 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:36:59.016 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:37:59.017 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:38:59.019 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:39:56.968 [pool-5-thread-1] ERROR org.apache.pulsar.io.jdbc.JdbcAbstractSink - Got exception
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:340) ~[postgresql-42.2.12.jar:42.2.12]
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448) ~[postgresql-42.2.12.jar:42.2.12]
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369) ~[postgresql-42.2.12.jar:42.2.12]
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:159) ~[postgresql-42.2.12.jar:42.2.12]
    at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:148) ~[postgresql-42.2.12.jar:42.2.12]
    at org.apache.pulsar.io.jdbc.JdbcAbstractSink.flush(JdbcAbstractSink.java:203) ~[pulsar-io-jdbc-core-2.6.1.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_252]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_252]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: java.io.EOFException
    at org.postgresql.core.PGStream.receiveChar(PGStream.java:372) ~[postgresql-42.2.12.jar:42.2.12]
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2044) ~[postgresql-42.2.12.jar:42.2.12]
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:313) ~[postgresql-42.2.12.jar:42.2.12]
    ... 12 more
20:39:59.020 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:40:59.022 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:41:59.023 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:42:59.024 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:43:59.025 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:44:59.027 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:45:59.029 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
20:46:59.030 [pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [market_data/deribit/skew_iv_v2] [postgres-sink] [45408] Prefetched messages: 0 --- Consume throughput received: 0.50 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
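The stack trace ends in JdbcAbstractSink.flush, which suggests the sink keeps reusing a connection whose underlying socket died when the database restarted: every later execute then fails on the same dead connection while the consumer keeps fetching. As a point of comparison, here is a minimal sketch of a JDBC write path that validates and rebuilds its connection before writing. This is not the actual pulsar-io-jdbc code; the class and method names are made up for illustration:

```java
// Hypothetical sketch (not the pulsar-io-jdbc implementation): one way a
// JDBC flush path can survive a database restart instead of reusing the
// dead socket that produced the PSQLException above.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ReconnectingWriter {
    private final String jdbcUrl; // e.g. "jdbc:postgresql://host:5432/db"
    private Connection connection;

    public ReconnectingWriter(String jdbcUrl) throws SQLException {
        this.jdbcUrl = jdbcUrl;
        this.connection = DriverManager.getConnection(jdbcUrl);
    }

    public void write(String insertSql, Object... params) throws SQLException {
        // isValid() pings the backend; after a restart it returns false,
        // so we rebuild the connection rather than reuse the dead one.
        if (connection == null || !connection.isValid(5)) {
            closeQuietly();
            connection = DriverManager.getConnection(jdbcUrl);
        }
        try (PreparedStatement stmt = connection.prepareStatement(insertSql)) {
            for (int i = 0; i < params.length; i++) {
                stmt.setObject(i + 1, params[i]);
            }
            stmt.execute();
        } catch (SQLException e) {
            // Drop the (possibly broken) connection so the next call
            // reconnects; rethrow so the caller can fail the batch and
            // let Pulsar redeliver the unacked messages.
            closeQuietly();
            throw e;
        }
    }

    private void closeQuietly() {
        try { if (connection != null) connection.close(); } catch (SQLException ignore) {}
        connection = null;
    }
}
```

Connection.isValid() issues a lightweight round-trip to the backend, so a flush path structured like this would start succeeding again as soon as the database comes back, and rethrowing lets the framework redeliver the unacked batch rather than stalling silently.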
I now believe that this issue is related to using the EFFECTIVELY_ONCE mode for the sink. The issue doesn't seem to happen when using ATLEAST_ONCE.
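For anyone trying to reproduce the difference, this is roughly how a sink can be created with one guarantee or the other programmatically. A sketch only, assuming the pulsar-client-admin API (2.6.x); the tenant/namespace/sink name are read off the logs above, and the admin URL and archive path are placeholders:

```java
import java.util.Collections;

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.functions.FunctionConfig;
import org.apache.pulsar.common.io.SinkConfig;

public class CreateJdbcSink {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // adjust to your broker's admin URL
                .build()) {
            SinkConfig config = SinkConfig.builder()
                    .tenant("market_data")
                    .namespace("deribit")
                    .name("postgres-sink")
                    .inputs(Collections.singletonList(
                            "persistent://market_data/deribit/skew_iv_v2"))
                    .archive("connectors/pulsar-io-jdbc-2.6.1.nar") // path is made up
                    // EFFECTIVELY_ONCE is where the sink appears to stall after
                    // a DB restart; ATLEAST_ONCE reportedly recovers.
                    .processingGuarantees(FunctionConfig.ProcessingGuarantees.EFFECTIVELY_ONCE)
                    .build();
            admin.sinks().createSink(config, config.getArchive());
        }
    }
}
```

Running the same scenario twice, once per guarantee, should isolate whether the hang really tracks EFFECTIVELY_ONCE.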
Thanks @Alxander64, sorry for the late response. Have you tried the new Pulsar version 2.7.0, or 2.6.3? If the problem is still there, we need to fix it ASAP.
I have recently updated to 2.6.3, but since then I've only been running sinks on a more stable database. I first noticed this issue when sinking to ClickHouse, which I didn't have a great production setup for.
For a simple test, I had my Pulsar cluster in k8s and brought up a single Postgres replica with a Helm chart. I had a sink running, configured as described above, and then I simply deleted the pod running Postgres and waited for it to respawn. If new rows don't eventually populate in the table being sinked to, then the problem persists.
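To make that check less manual, a small polling probe can watch the row count across the restart. A minimal sketch, assuming a PostgreSQL database named market_data, credentials postgres/postgres, and a hypothetical target table skew_iv_v2; adjust all of these to your setup:

```java
// Polls the sink's target table; if the row count stops growing after the
// Postgres pod respawns, the sink is stuck. Requires the PostgreSQL JDBC
// driver on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SinkLivenessProbe {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/market_data"; // adjust
        long last = -1;
        while (true) {
            try (Connection conn = DriverManager.getConnection(url, "postgres", "postgres");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT count(*) FROM skew_iv_v2")) {
                rs.next();
                long now = rs.getLong(1);
                System.out.printf("rows=%d (%+d)%n", now, last < 0 ? 0 : now - last);
                last = now;
            } catch (Exception e) {
                // Expected while the pod is down or respawning.
                System.out.println("db unreachable: " + e.getMessage());
            }
            Thread.sleep(10_000);
        }
    }
}
```

If the count keeps growing after the pod comes back, the sink recovered; if it flatlines while the consumption logs keep ticking, the bug is reproduced.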
Describe the bug When running a ClickHouse JDBC sink and encountering an error from the database (e.g. a timeout), the sink seems to continue consuming, but does not actually insert or ack any further messages.
To Reproduce Steps to reproduce the behavior:

1. Run a JDBC sink (ClickHouse or PostgreSQL) writing to a database.
2. Restart or otherwise interrupt the database so the sink's connection is dropped.
3. Wait for the database to come back up.
4. Observe that the sink resumes logging its regular consumption updates but no longer inserts rows or acks messages.
Expected behavior The sink should recover and be able to continue inserting and acking messages.
Logs
In the logs you can see that the sink logs its regular updates, shows the error from the connection being refused by ClickHouse (for now, this happens whenever ClickHouse restarts), and then logs the regular updates again, just as before.
Screenshots
In this screenshot you can see a point where the backlog was accumulating; this was one instance of this error affecting the sink. The backlog then comes back down after I manually restarted the sink from the CLI, which got it running properly again. Later, another instance of this error occurred and the backlog began to accumulate again.
Additional context Mentioned in the steps to reproduce:
Ideas
My working theory is that there's something logically wrong with the JDBC sinks, where they somehow stop working properly after encountering an error from the database. Alternatively, something may be wrong more specifically with the ClickHouse JDBC driver being used, and it doesn't handle errors correctly.
I have not tested this with any other databases, but I imagine a quick test with either PostgreSQL or MySQL would reveal whether or not this is a general issue with the JDBC sinks.
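One way to separate the two theories without Pulsar in the loop is a bare JDBC check: if a fresh connection works after the database restart while the old one keeps throwing, the driver recovers fine and the suspect is the sink holding on to a dead connection. A minimal, driver-agnostic sketch; pass any JDBC URL (PostgreSQL, MySQL, or ClickHouse) as the first argument, assuming the matching driver jar is on the classpath and credentials are embedded in the URL:

```java
// Checks whether a *new* connection recovers after a database restart even
// though the old one keeps failing. Run it, restart the database when
// prompted, then press Enter.
import java.sql.Connection;
import java.sql.DriverManager;

public class DriverRecoveryTest {
    public static void main(String[] args) throws Exception {
        String url = args[0]; // e.g. jdbc:postgresql://host:5432/db?user=...&password=...
        Connection stale = DriverManager.getConnection(url);
        System.out.println("Restart the database now, then press Enter...");
        System.in.read();
        try {
            stale.createStatement().execute("SELECT 1");
            System.out.println("old connection survived");
        } catch (Exception e) {
            System.out.println("old connection dead (expected): " + e.getMessage());
        }
        try (Connection fresh = DriverManager.getConnection(url)) {
            fresh.createStatement().execute("SELECT 1");
            System.out.println("fresh connection works -> driver recovers; suspect the sink");
        }
    }
}
```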