darold / ora2pg

Ora2Pg is a free tool used to migrate an Oracle database to a PostgreSQL compatible schema. It connects your Oracle database, scan it automatically and extracts its structure or data, it then generates SQL scripts that you can load into PostgreSQL.
http://www.ora2pg.com/
GNU General Public License v3.0
991 stars 342 forks source link

The "-t TEST_COUNT" is not properly using the real Oracle row counts when comparing #1617

Closed simonpane closed 1 year ago

simonpane commented 1 year ago

When running using the --type TEST_COUNT option, the real row counts are being collected, but are not actually being used in the row count comparisons due to a hash update bug. Instead the num_rows from the Oracle optimizer statistics is being used and compared and is sometimes leading to false comparison results.

Issue is due to several small bugs in the _table_info subroutine.

Relevant set parameters from ora2pg.conf just for reference:

CREATE_SCHEMA                  0
EXPORT_SCHEMA                  0
FORCE_OWNER                    1
SCHEMA                         MY_SCHEMA
TYPE                           TABLE

For debugging purposes and to illustrate the problem & solution, added Data::Dumper and dumped the %tables_infos hash.

To illustrate the problem, a test table (called "MY_SCHEMA"."my_table") with 13 rows is used. But the Oracle optimizer statistics has recorded only 12 rows in the catalog.

Initially when run for the test table, a multidimensional hash with data from the Oracle catalog is populated by the Ora2Pg::Oracle::_table_info subroutine and the output is:

          'my_table' => {
                          'owner' => 'MY_SCHEMA',
                          'duration' => undef,
                          'num_rows' => '12',
                          'nested' => 'NO',
                          'temporary' => 'N',
                          'partitioned' => 0,
                          'comment' => '',
                          'type' => '',
                          'nologging' => 0,
                          'tablespace' => 'USERS'
                        },

Then, the output of _count_source_rows (the "real" row counts) is processed by _table_info in Ora2Pg.pm in the following code block:

        foreach my $s (@ret)
        {
            my ($tb, $cnt) = split(':', $s);
            $tables_infos{$tb}{num_rows} = $cnt || 0;
        }

The problem is that the row format in the file ora2pg_count_rows is:

"MY_SCHEMA"."my_table":13

Thus when the above code excerpt runs, and parses the output shown above, it fails to match the hash key and instead creates another hash key/value which looks like:

          '"MY_SCHEMA"."my_table"' => {
                                               'num_rows' => '13
'
                                             },

Since, as it stands currently, the updated num_rows from the real row count (from _table_row_count) is not being used/found when comparing to the PostgreSQL row count, a false positive result is returned:

Table my_table doesn't have the same number of line in source database (12) and in PostgreSQL (13).

This output shows that it it finding and comparing the Oracle optimizer statistics row count of 12 and not the real Oracle row count of 13.

The issue with the extra carriage return can be easily fixed using chomp on the $cnt variable. But more significantly, the hash key search needs to be updated to properly find and update the correct hash value.


Fix for this issue has been developed and is currently being tested. PR with fix is forthcoming.