bersler / OpenLogReplicator

Open Source Oracle database CDC
https://www.bersler.com
GNU General Public License v3.0
291 stars 133 forks source link

OLR loses connection with Debezium #184

Open alexisread opened 1 month ago

alexisread commented 1 month ago

A brief description of the bug. OLR connects to Debezium successfully, and the current SCN is exchanged. On a table insert, OLR loses the connection with Debezium. Debezium error: deboracle | [2024-10-21 12:37:55,539] INFO OpenLogReplicator ready, streaming from SCN 464082622521. (io.debezium.connector.oracle.olr.client.OlrNetworkClient) deboracle | [2024-10-21 12:37:56,567] INFO 172.20.0.4 - - [21/Oct/2024:12:37:56 +0000] "GET /connectors HTTP/1.1" 200 34 "-" "curl/7.81.0" 3 (org.apache.kafka.connect.runtime.rest.RestServer) deboracle | [2024-10-21 12:37:57,768] INFO OpenLogReplicator streaming client started successfully. (io.debezium.connector.oracle.olr.client.OlrNetworkClient) deboracle | [2024-10-21 12:37:58,649] INFO WorkerSourceTask{id=Q1COTRX_OWNER_REFDB_INSTRUMENT-0} Committing offsets for 31 acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask) deboracle | [2024-10-21 12:37:58,651] WARN Cannot flush latest offset SCN as no checkpoint event was received. (io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource) deboracle | [2024-10-21 12:38:01,697] INFO 172.20.0.4 - - [21/Oct/2024:12:38:01 +0000] "GET /connectors HTTP/1.1" 200 34 "-" "curl/7.81.0" 2 (org.apache.kafka.connect.runtime.rest.RestServer) deboracle | [2024-10-21 12:38:06,880] INFO 172.20.0.4 - - [21/Oct/2024:12:38:06 +0000] "GET /connectors HTTP/1.1" 200 34 "-" "curl/7.81.0" 3 (org.apache.kafka.connect.runtime.rest.RestServer) deboracle | [2024-10-21 12:38:12,000] INFO 172.20.0.4 - - [21/Oct/2024:12:38:11 +0000] "GET /connectors HTTP/1.1" 200 34 "-" "curl/7.81.0" 3 (org.apache.kafka.connect.runtime.rest.RestServer) deboracle | [2024-10-21 12:38:13,743] INFO Client disconnected. (io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource) deboracle | [2024-10-21 12:38:13,745] ERROR Failed: Connection lost (io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource) deboracle | io.debezium.connector.oracle.olr.client.OlrNetworkClientException: Connection lost deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.fillBuffer(OlrNetworkClient.java:268) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.read(OlrNetworkClient.java:241) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEvent(OlrNetworkClient.java:162) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEventWithStartScnSkip(OlrNetworkClient.java:145) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readEvent(OlrNetworkClient.java:126) deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:120) deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:52) deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:313) deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:203) deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:143) deboracle | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) deboracle | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) deboracle | at java.base/java.lang.Thread.run(Thread.java:829) deboracle | [2024-10-21 12:38:13,748] ERROR Producer failure (io.debezium.pipeline.ErrorHandler) deboracle | io.debezium.connector.oracle.olr.client.OlrNetworkClientException: Connection lost deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.fillBuffer(OlrNetworkClient.java:268) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.read(OlrNetworkClient.java:241) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEvent(OlrNetworkClient.java:162) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEventWithStartScnSkip(OlrNetworkClient.java:145) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readEvent(OlrNetworkClient.java:126) deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:120) deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:52) deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:313) deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:203) deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:143) deboracle | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) deboracle | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) deboracle | at java.base/java.lang.Thread.run(Thread.java:829) deboracle | [2024-10-21 12:38:13,749] INFO Streaming metrics dump: io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSourceMetrics@73d37d04 (io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource) deboracle | [2024-10-21 12:38:13,749] INFO Offsets: OracleOffsetContext [scn=464082622521, commit_scn=[], lcr_position=null] (io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource) deboracle | [2024-10-21 12:38:13,749] INFO Finished streaming (io.debezium.pipeline.ChangeEventSourceCoordinator) deboracle | [2024-10-21 12:38:13,749] INFO Connected metrics set to 'false' (io.debezium.pipeline.ChangeEventSourceCoordinator) deboracle | [2024-10-21 12:38:14,429] ERROR WorkerSourceTask{id=Q1COTRX_OWNER_REFDB_INSTRUMENT-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask) deboracle | org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped. deboracle | at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:67) deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:149) deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:52) deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:313) deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:203) deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:143) deboracle | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) deboracle | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) deboracle | at java.base/java.lang.Thread.run(Thread.java:829) deboracle | Caused by: io.debezium.connector.oracle.olr.client.OlrNetworkClientException: Connection lost deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.fillBuffer(OlrNetworkClient.java:268) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.read(OlrNetworkClient.java:241) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEvent(OlrNetworkClient.java:162) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEventWithStartScnSkip(OlrNetworkClient.java:145) deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readEvent(OlrNetworkClient.java:126) deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:120) deboracle | ... 9 more deboracle | [2024-10-21 12:38:14,430] INFO Stopping down connector (io.debezium.connector.common.BaseSourceTask) deboracle | [2024-10-21 12:38:14,432] INFO SignalProcessor stopped (io.debezium.pipeline.signal.SignalProcessor) deboracle | [2024-10-21 12:38:14,433] INFO Debezium ServiceRegistry stopped. (io.debezium.service.DefaultServiceRegistry) deboracle | [2024-10-21 12:38:14,537] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection) deboracle | [2024-10-21 12:38:14,642] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection) deboracle | [2024-10-21 12:38:14,643] INFO [Producer clientId=Q1COTRX-schemahistory] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer) deboracle | [2024-10-21 12:38:14,649] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics) deboracle | [2024-10-21 12:38:14,649] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics) deboracle | [2024-10-21 12:38:14,649] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics) deboracle | [2024-10-21 12:38:14,649] INFO App info kafka.producer for Q1COTRX-schemahistory unregistered (org.apache.kafka.common.utils.AppInfoParser) deboracle | [2024-10-21 12:38:14,650] INFO [Producer clientId=connector-producer-Q1COTRX_OWNER_REFDB_INSTRUMENT-0] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer) deboracle | [2024-10-21 12:38:14,656] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics) deboracle | [2024-10-21 12:38:14,656] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics) deboracle | [2024-10-21 12:38:14,656] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics) deboracle | [2024-10-21 12:38:14,656] INFO App info kafka.producer for connector-producer-Q1COTRX_OWNER_REFDB_INSTRUMENT-0 unregistered (org.apache.kafka.common.utils.AppInfoParser) deboracle | [2024-10-21 12:38:17,166] INFO 172.20.0.4 - - [21/Oct/2024:12:38:17 +0000] "GET /connectors HTTP/1.1" 200 34 "-" "curl/7.81.0" 2 (org.apache.kafka.connect.runtime.rest.RestServer)

OLR output: [ec2-user@ip-10-72-3-42 OpenLogReplicator]$ ./OpenLogReplicator 2024-10-21 12:35:15 INFO 00000 OpenLogReplicator v1.7.0 (C) 2018-2024 by Adam Leszczynski (aleszczynski@bersler.com), see LICENSE file for licensing information 2024-10-21 12:35:15 INFO 00000 arch: x86_64, build-arch: , system: Linux, release: 4.18.0-513.18.1.el8_9.x86_64, build: Debug, compiled: 2024-10-15 12:01, modules: Kafka OCI Protobuf 2024-10-21 12:35:15 INFO 00000 adding source: S1 2024-10-21 12:35:15 INFO 00000 adding target: T1 2024-10-21 12:35:15 INFO 00000 writer is starting with Writer: Network:0.0.0.0:5000 2024-10-21 12:35:15 WARN 10003 file: checkpoint/DB1-chkpt.json - get metadata returned: No such file or directory 2024-10-21 12:37:56 INFO 00000 client requested to start from scn: 464082622521 2024-10-21 12:37:56 INFO 00000 connecting to Oracle instance of DB1 to //0.0.0.0:1521/dpcotest.ec2.internal 2024-10-21 12:37:56 INFO 00000 version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production, context: dpcotest, resetlogs: 0, activation: 0, con_id: 0, con_name: dpcotest 2024-10-21 12:37:56 INFO 00000 loading character mapping for AL32UTF8 2024-10-21 12:37:56 INFO 00000 loading character mapping for AL16UTF16 2024-10-21 12:37:56 ERROR 10003 file: /opt/app/oracle/fast_recovery_area/DPCOTEST/archivelog/DPCOTEST/onlinelog/o1_mf_1m0vkkrnp.log - get metadata returned: No such file or directory 2024-10-21 12:37:56 INFO 00000 online redo log: /opt/app/oracle/oradata/DPCOTEST/onlinelog/o1_mf_1m0vkkrkt.log 2024-10-21 12:37:56 ERROR 10003 file: /opt/app/oracle/fast_recovery_area/DPCOTEST/archivelog/DPCOTEST/onlinelog/o1_mf_2m0vkkro7.log - get metadata returned: No such file or directory 2024-10-21 12:37:56 INFO 00000 online redo log: /opt/app/oracle/oradata/DPCOTEST/onlinelog/o1_mf_2m0vkkrlc.log 2024-10-21 12:37:56 ERROR 10003 file: /opt/app/oracle/fast_recovery_area/DPCOTEST/archivelog/DPCOTEST/onlinelog/o1_mf_3m0vkkrom.log - get metadata returned: No such file or directory 2024-10-21 12:37:56 INFO 00000 online redo log: /opt/app/oracle/oradata/DPCOTEST/onlinelog/o1_mf_3m0vkkrln.log 2024-10-21 12:37:56 INFO 00000 timezone: +00:00, db-timezone: +00:00, log-timezone: +00:00, host-timezone: +00:00 2024-10-21 12:37:56 INFO 00000 Oracle Replicator for DB1 in online mode is starting (flags: 4096) from scn: 464082622521 2024-10-21 12:37:56 INFO 00000 current resetlogs is: 1165353144 2024-10-21 12:37:56 INFO 00000 first data SCN: 464082622521 2024-10-21 12:37:56 INFO 00000 starting sequence not found - starting with new batch with seq: 5178 2024-10-21 12:37:56 INFO 00000 reading dictionaries for scn: 464082622521 2024-10-21 12:37:58 INFO 00000 - found: SYS.CCOL$ (dataobj: 29, obj: 32, columns: 11, lobs: 0, lob-idx: 0), part of cluster 2024-10-21 12:37:58 INFO 00000 - found: SYS.CDEF$ (dataobj: 29, obj: 31, columns: 21, lobs: 0, lob-idx: 0), part of cluster 2024-10-21 12:37:58 INFO 00000 - found: SYS.COL$ (dataobj: 2, obj: 21, columns: 37, lobs: 0, lob-idx: 0), part of cluster 2024-10-21 12:37:58 INFO 00000 - found: SYS.DEFERRED_STG$ (dataobj: 81, obj: 81, columns: 25, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: SYS.ECOL$ (dataobj: 152, obj: 152, columns: 4, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: SYS.LOB$ (dataobj: 2, obj: 109, columns: 18, lobs: 0, lob-idx: 0), part of cluster 2024-10-21 12:37:58 INFO 00000 - found: SYS.LOBCOMPPART$ (dataobj: 850, obj: 850, columns: 24, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: SYS.LOBFRAG$ (dataobj: 847, obj: 847, columns: 16, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: SYS.OBJ$ (dataobj: 18, obj: 18, columns: 37, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: SYS.TAB$ (dataobj: 2, obj: 4, columns: 45, lobs: 0, lob-idx: 0), part of cluster 2024-10-21 12:37:58 INFO 00000 - found: SYS.TABPART$ (dataobj: 814, obj: 814, columns: 26, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: SYS.TABCOMPART$ (dataobj: 836, obj: 836, columns: 35, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: SYS.TABSUBPART$ (dataobj: 826, obj: 826, columns: 26, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: SYS.TS$ (dataobj: 6, obj: 16, columns: 32, lobs: 0, lob-idx: 0), part of cluster 2024-10-21 12:37:58 INFO 00000 - found: SYS.USER$ (dataobj: 10, obj: 22, columns: 30, lobs: 0, lob-idx: 0), part of cluster 2024-10-21 12:37:58 INFO 00000 - found: XDB.XDB$TTSET (dataobj: 21073, obj: 21073, columns: 4, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: XDB.X$NM46MP5MDR0M04NE0KWN0SK0K1LN (dataobj: 21986, obj: 21986, columns: 2, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: XDB.X$PT46MP5MDR0M04NE0KWN0SK0K1LN (dataobj: 21988, obj: 21988, columns: 2, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: XDB.X$QN46MP5MDR0M04NE0KWN0SK0K1LN (dataobj: 21987, obj: 21987, columns: 4, lobs: 0, lob-idx: 0) 2024-10-21 12:37:58 INFO 00000 - found: OWNER_ARM.TRANSACTION_REPORT (dataobj: 0, obj: 74738, columns: 137, lobs: 6, lob-idx: 756), partitioned(table: 252, lob: 756) 2024-10-21 12:37:58 INFO 00000 last confirmed scn: 464082622521, starting sequence: 5178, offset: 0 2024-10-21 12:37:58 INFO 00000 resume writer 2024-10-21 12:37:58 INFO 00000 streaming to client 2024-10-21 12:37:58 INFO 00000 found redo log version: 19.0.0, activation: 2125574454, resetlogs: 1165353144, page: 512, sequence: 5178, SID: DPCOTEST, endian: LITTLE 2024-10-21 12:37:58 INFO 00000 processing redo log: group: 3 scn: 464082608971 to 0 seq: 5178 path: /opt/app/oracle/oradata/DPCOTEST/onlinelog/o1_mf_3m0vkkrln.log offset: 1024 2024-10-21 12:37:58 INFO 00000 new activation detected: 2125574454

Is the bug present on the latest master branch. Yes

Describe steps required to reproduce the bug. Tested with OLR running on the Oracle instance, and as a docker image with sshfs access to the redo logs via mounts at: /home/main/oracle/oradata /home/main/oracle/fast_recovery_area

service: olr: build: context: ./open_log_replicator dockerfile: DockerfileImage container_name: olr privileged: true volumes:

For the system where OLR is running on the Oracle box, the OS is RHEL8, and OLR is built using the attached build_redhat.sh script Config for both is attached as OpenLogReplicator.json and connector-product.properties Checkpoints are attached, redo logs are too big to upload.

For bugs related to Redo Log parse error where reproduction is not possible. I don't believe this to be a redo log parse error, as OLR does not give any error. This is mainly a query about an OLR-Debezium link error. olr_log.zip