bersler / OpenLogReplicator

Open Source Oracle database CDC
https://www.bersler.com
GNU General Public License v3.0
285 stars 133 forks source link

Occasional, unskippable error: too short field supplemental log: 8 Error #162

Closed mitchellmann closed 3 months ago

mitchellmann commented 3 months ago

A brief description of the bug. We are using OLR to parse logs from an Oracle 12c 12.1 instance. OLR is successful in parsing many of the logs, but seems to encounter the following error reliably.

2024-07-17 15:39:04 ERROR 50061 runtime error, aborting further redo log processing: too short field supplemental log: 8 offset: 2556652

In all instances of this occurring, the referenced offset has hex: 05 01 1A 00 03 00 FF FF

Is the bug present on the latest master branch. We are using the latest build.

Describe steps required to reproduce the bug. The above byte values are not the first instance to occur in the redo or archive file. In many cases, those values have occurred 10+ times in the file before the issue occurs.

An example full block that has the issue is: unsigned char ucDataBlock[512] = { // Offset 0x00010400 to 0x000105FF 0x01, 0x22, 0x00, 0x00, 0x82, 0x00, 0x00, 0x00, 0x24, 0x1E, 0x00, 0x00, 0x40, 0x80, 0x44, 0x85, 0xC1, 0x71, 0xB9, 0x9C, 0x10, 0x0A, 0xC0, 0x00, 0x03, 0x00, 0x6E, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06, 0x00, 0x10, 0x00, 0x01, 0x00, 0x00, 0x00, 0xC2, 0x71, 0xB9, 0x9C, 0x10, 0x0A, 0x00, 0x00, 0x08, 0x00, 0x1B, 0x00, 0x3B, 0x97, 0x05, 0x00, 0x02, 0x00, 0x10, 0x0A, 0xB4, 0x00, 0x00, 0x00, 0x01, 0x00, 0x10, 0x0A, 0xC4, 0x71, 0xB9, 0x9C, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0xFF, 0xFF, 0x05, 0x04, 0x15, 0x00, 0x03, 0x00, 0xFF, 0xFF, 0xA0, 0x00, 0xC0, 0x00, 0xAC, 0x71, 0xB9, 0x9C, 0x10, 0x0A, 0x00, 0x00, 0x01, 0x00, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x14, 0x00, 0x10, 0x00, 0x04, 0x00, 0x19, 0x00, 0x00, 0x00, 0xF5, 0x9B, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x1E, 0x2A, 0xC1, 0x00, 0xBF, 0x46, 0x06, 0x00, 0x3D, 0x00, 0xEA, 0x15, 0x01, 0x00, 0x00, 0x00, 0xC5, 0xF4, 0x8E, 0x66, 0x18, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xB4, 0x00, 0x00, 0x06, 0xB6, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x10, 0x00, 0x04, 0x00, 0x02, 0x00, 0x08, 0x00, 0xC0, 0x00, 0x28, 0x23, 0x00, 0x00, 0x03, 0x00, 0x19, 0x00, 0xF5, 0x9B, 0x05, 0x00, 0xFF, 0x00, 0x0E, 0x00, 0x01, 0x0A, 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0xAB, 0x71, 0xB9, 0x9C, 0x10, 0x0A, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x01, 0x4C, 0x10, 0x0A, 0xC5, 0x71, 0xB9, 0x9C, 0x02, 0x00, 0x00, 0xBF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x02, 0x19, 0x00, 0x03, 0x00, 0xFF, 0xFF, 0xC0, 0x00, 0xC0, 0x00, 0xAE, 0x71, 0xB9, 0x9C, 0x10, 0x0A, 0x00, 0x00, 0x01, 0x00, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x20, 0x00, 0x17, 0x00, 0x00, 0x00, 0xC5, 0xA7, 0x05, 0x00, 0x21, 0x26, 0xC1, 0x00, 0x65, 0x49, 0x10, 0x00, 0x12, 0x00, 0xA8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x01, 0x1A, 0x00, 0x03, 0x00, 0xFF, 0xFF, 0x21, 0x26, 0xC1, 0x00, 0xAD, 0x71, 0xB9, 0x9C, 0x10, 0x0A, 0x00, 0x00, 0x01, 0x00, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0E, 0x00, 0x14, 0x00, 0x4C, 0x00, 0x20, 0x00, 0x14, 0x00, 0x08, 0x00, 0x14, 0x00, 0x00, 0x00, 0xA8, 0x00, 0xA8, 0x0D, 0x12, 0x00, 0x00, 0x00, 0x05, 0x00, 0x17, 0x00, 0xC5, 0xA7, 0x05, 0x00, 0x65, 0x49, 0x10, 0x00, 0xE7, 0xA1, 0x01, 0x00, 0xE7, 0xA1, 0x01, 0x00, 0x15, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0B, 0x01, 0x17, 0x00, 0x08, 0x0C, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x21, 0x26, 0xC1, 0x00, 0x65, 0x49, 0x0F, 0x00, 0xEC, 0x6F, 0xB9, 0x9C, 0x10, 0x0A, 0x00, 0x00, 0xED, 0x6F, 0xB9, 0x9C, 0x10, 0x0A, 0x00, 0x00, 0x80, 0x02, 0x34, 0xD3, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x34, 0xD3, 0x1A, 0x26, 0xC1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x0D, 0xC1, 0x00, 0xDB, 0x49, 0x06, 0x00, 0x04, 0x00, 0x04, 0x00, 0xD7, 0x99, 0x05, 0x00, 0x9A, 0x2A, 0xC1, 0x00, 0xBE, 0x48, 0x07, 0x00, 0x00, 0x80, 0x10, 0x0A, 0xA7, 0x71, 0xB9, 0x9C };

I can provide more data pieces, but am limited in ability to provide the entire redo file as it's external data.

bersler commented 3 months ago

Hi @mitchellmann , please describe me what is your idea to:

  1. understand what has actually happened
  2. reproduce this problem
  3. write proper implementation how to handle it
  4. make tests that verify proper implementation based on the information above.

What would be next step to do?

There are 2 options:

  1. create a reproduction that I can run to get the same results to do the above
  2. provide redo logs with some data (perhaps an NDA can be signed if this contains sensitive data) - with the redo logs I can alter system dump log file to see the interpretation of Oracle of that data - definitely I can't do that with some dump of one block from the middle of the redo log

If neither of those can be achieved, we can close the issue.

Regards, Adam

mitchellmann commented 3 months ago

Hi @bersler,

I didn't realize Oracle had the ALTER SYSTEM DUMP LOGFILE capability. I think that helps me get unstuck. I'm going to run with that and will circle back with either a resolution or at least more information. Thank you.

bersler commented 3 months ago

It does not print any information about supplemental log anyway.

bersler commented 3 months ago

Any chance for a reproduction? Of not then I will close the issue and we can reopen it when somebody has the same issue and can provide reproduction.

mitchellmann commented 3 months ago

Hi @bersler Was able to reproduce this. I've run one of the logs in question through a log dump. Both REDO RECORDS of the block in question are attached.

I'll note, I had it running just against sys and a single user's tables. This was stable, but action on another schema seems to cause this issue. REDO RECORDS.txt

I'm going to continue some different types of tables to try and further eliminate issue, but wanted to provide this in the interim.

bersler commented 3 months ago

Problem reproduction missing.

mitchellmann commented 3 months ago

Agree with closing at this point. Thanks @bersler

For those that find this later: I've narrowed down the issue to a subset of tables within the database. It's unclear why the supplemental logging has an issue with these few tables. If I find a clear answer, I'll update.

bersler commented 3 months ago

Hi @mitchellmann, please understand that all changes that go to the code must have justification with real test data. If I don't have test data I will not change a single byte. I must be able to verify with all further changes of the code that still this test data is parsed correctly. If you find a solution please go ahead with a fork of the project. Good luck!

mitchellmann commented 2 months ago

For those who find this later. This issue was a database trigger. Most triggers seem ok, but issues seem to arise on: BEFORE [INSERT, UPDATE, DELETE] FOR EACH ROW triggers.

"Before" OR "for each row" is ok, but not together.

@bersler is this expected? I can't find anything in the repo.

bersler commented 2 months ago

Thanks @mitchellmann for feedback. Great work!

Can you provide an example reproduction script which shows the error on a simple XE Oracle docker image? I mean for example - tak OpenLogReplicator-docker image, run it, execute some exact provided SQL and - voila - OLR stops. This is exactly what is required to write a fix. So that somebody who is working on the fix would not need to check all possible configuration parameters and command combinations to actually reproduce the same problem.

Cheers, Adam