bersler / OpenLogReplicator

Open Source Oracle database CDC
https://www.bersler.com
GNU General Public License v3.0
280 stars 131 forks source link

abnormal behavior in batch mode #116

Closed psalms945 closed 1 month ago

psalms945 commented 6 months ago

A brief description of the bug.

In my case, I run openlog replicator in batch mode for each of archived redo logs. Openlog replicator would produce new chkpt-scnxxx.json file and the size of new file is bigger than last run, and it cause the size of chkpt-scnxxx.json might be up to GB scale, then the performance of OLR is upgraded due to big size of chkpt-scnxxx.json.

Is the bug present on the latest master branch. Yes

Describe steps required to reproduce the bug.

  1. Build Openlog Replicator by https://github.com/bersler/OpenLogReplicator-tutorials
  2. Run step 1~5 by https://github.com/bersler/OpenLogReplicator-tutorials offline example (Disable docker start ${OLR_CONTAINER} in step 4)
  3. Run SQL to archive logs in ORA2 ALTER SESSION SET CONTAINER = CDB$ROOT; ALTER SYSTEM ARCHIVE LOG CURRENT;
  4. Check the path of archived redo log(o1_mf_1_5lyj59ron.arc) and run Openlog Replicator with the following configuration { "version": "1.5.0", "log-level": 3, "trace": 0, "source": [ { "alias": "S1", "name": "ORA2", "reader": { "type": "batch", "redo-log": ["/opt/fra/XE/archivelog/2024_03_06/o1_mf_1_5_lyj59ron_.arc"] }, "format": { "type": "json" }, "filter": { "table": [ {"owner": "USR2", "table": "ADAM.*"} ] } } ], "target": [ { "alias": "T1", "source": "S1", "writer": { "type": "file", "output": "/opt/output/results.txt" } } ] }

Success to prase CDC records from redo logs

  1. Insert some records to table
  2. Run SQL to archive logs in ORA2
  3. Check the path of archived redo log(o1_mf_1_6lyj5g2x4.arc) and run Openlog Replicator with the following configuration { "version": "1.5.0", "log-level": 3, "trace": 0, "source": [ { "alias": "S1", "name": "ORA2", "reader": { "type": "batch", "redo-log": ["/opt/fra/XE/archivelog/2024_03_06/o1_mf_1_6_lyj5g2x4_.arc"] }, "format": { "type": "json" }, "filter": { "table": [ {"owner": "USR2", "table": "ADAM.*"} ] } } ], "target": [ { "alias": "T1", "source": "S1", "writer": { "type": "file", "output": "/opt/output/results.txt" } } ] } At this step, got warning from logsWARN 60027 couldn't find archive log for seq: 5, found: 6, sleeping 10000000 us, and saw the value of seq in ORA2-chkpt-2625606.json be set to 4294967295 from 5
  4. Re-run Openlog Replicator again At this step, got ERROR 50023 duplicate SYS.COL$ value: (rowid: AAAAACAABAAAACRAAl) and a new ORA2-chkpt-2627212.json is created with seq 6 and the values of online-redo and incarnations are bigger than last run. Ex: "online-redo":[ {"group":1,"path":["/opt/oracle/oradata/XE/redo01.log"]}, {"group":2,"path":["/opt/oracle/oradata/XE/redo02.log"]}, {"group":3,"path":["/opt/oracle/oradata/XE/redo03.log"]}, {"group":2,"path":["/opt/oracle/oradata/XE/redo02.log"]}, {"group":1,"path":["/opt/oracle/oradata/XE/redo01.log"]}, {"group":2,"path":["/opt/oracle/oradata/XE/redo02.log"]}, {"group":1,"path":["/opt/oracle/oradata/XE/redo01.log"]}, {"group":3,"path":["/opt/oracle/oradata/XE/redo03.log","/opt/oracle/oradata/XE/redo03.log"]}], "incarnations":[ {"incarnation":1,"resetlogs-scn":1,"prior-resetlogs-scn":0,"status":"PARENT","resetlogs":1080860703,"prior-incarnation":0}, {"incarnation":2,"resetlogs-scn":2571212,"prior-resetlogs-scn":1,"status":"CURRENT","resetlogs":1162882501,"prior-incarnation":1}, {"incarnation":1,"resetlogs-scn":1,"prior-resetlogs-scn":0,"status":"PARENT","resetlogs":1080860703,"prior-incarnation":0}, {"incarnation":2,"resetlogs-scn":2571212,"prior-resetlogs-scn":1,"status":"CURRENT","resetlogs":1162882501,"prior-incarnation":1}, {"incarnation":2,"resetlogs-scn":2571212,"prior-resetlogs-scn":1,"status":"CURRENT","resetlogs":1162882501,"prior-incarnation":1}, {"incarnation":1,"resetlogs-scn":1,"prior-resetlogs-scn":0,"status":"PARENT","resetlogs":1080860703,"prior-incarnation":0}], But it is success to prase CDC records from redo logs in this time

You can find full logs in OpenLogReplicator.log For bugs related to Redo Log parse error where reproduction is not possible. This is for cases, where the fault is related to actual redo log data parsing. There is some redo log file which is causing the error, but it is not known which actual combination of SQL commands caused the error. To make a fix possible, the redo log file is required (even if the schema file or checkpoint file is not available). Please provide the redo log file.

Provide the following information:

  1. OLR configuration file (necessary)
  2. Set of redo log files which contain error (necessary)
  3. OLR checkpoint file set (helpful, but not crucial)
  4. Information about expected error

Additional context Add any other context or screenshots about the feature request here.

psalms945 commented 1 month ago

Hi, I'm still encountering this issue. Is there any step I can do for the error "ERROR 50023 duplicate SYS.COL$ value: (rowid: AAAAACAABAAAACRAAl)"? Or do you know the meaning of ERROR 50023??

bersler commented 1 month ago

Hi @AidenPTHuang , yes, you can do something. You can for example make donation to the author to increase the priority for the fix.

bersler commented 1 month ago

Hi @AidenPTHuang , I have divided the topic of slow reading/writing of checkpoint files as a separate inc.

Regarding the batch mode. What I understand is that you want to get some redo log file. Process it batch mode. And then replace the redo log file and run again in batch or online mode. Thus would not work.

I have made a branch with documentation update. I have put there information about such operation. https://github.com/bersler/OpenLogReplicator/tree/batch_mode_warnings

Generally batch mode is just for single use. Prepare some config, run it and done. Whatever checkpoint files are created, they are not suitable for any other work.

If you want to have continuous replication, you should use the online of offline mode.

Running per redo log as batch is not working. First of all, if you create a batch for every redo log files, you would not process transactions which span multiple redo log files. The would just be ignored.

For batch mode, you have just some redo log files, some checkpoint which might not necessarily match the actual schema for start of the redo log and you want just to get maximum of the redo log files you can. And then stop work.

If you want to next use online or offline - you need to prepare a new config. For online or offline mode you need to delete the checkpoint files and recreate it, and reposition the start of replication.

What you might want to use is just offline mode.

If the current features for replication modes are not enough - please describe your environment and how you would like to use OLR.

Regards, Adam

psalms945 commented 1 month ago

Hi Adam

In my environment, my application can't access Oracle online redo logs, but can connect to Oracle database and get archived redo logs from a shared storage. I've tried to use online and offline mode, but failed to start OLR because of online redo logs path not found. So I used Batch mode to run, and got error as I mention above in few redo logs.

bersler commented 1 month ago

Hi @AidenPTHuang , What you want to do is to run OLR in offline mode and use option 'flags': 1. With this setting OLR would not read online redo logs. Just offline. Please check the documentation and try this configuration.