darold / ora2pg

Ora2Pg is a free tool used to migrate an Oracle database to a PostgreSQL compatible schema. It connects your Oracle database, scan it automatically and extracts its structure or data, it then generates SQL scripts that you can load into PostgreSQL.
http://www.ora2pg.com/
GNU General Public License v3.0
978 stars 341 forks source link

Update existing rows with "ON CONFLICT DO UPDATE" ? #1680

Closed newcepco closed 8 months ago

newcepco commented 11 months ago

Hello to you, wonderful people. I'm really hesitant whether we're using ora2pg right. We have a huge oracle database (1.5tb), and cannot allow for long downtime (full migration takes up to 3 days).

So we've added an UPDATED_AT column to each table, so we could do migration in two steps: first step while application is running (main.conf) and then while application is down (sync.conf). Out configurations look something like this:

main.conf WHERE UPDATED_AT <= timestamp '2023-08-11 15:00:00'

sync.conf WHERE UPDATED_AT > timestamp '2023-08-11 15:00:00'

Idea being, that after big chunk of database if transferred, we can put application down, and transfer the rest of it fairly quickly.

But the problem is that some of the data can be updated in between step 1 and step 2. So for example entry that satisfies condition

UPDATED_AT <= timestamp '2023-08-11 15:00:00'

can satisfy condition UPDATED_AT > timestamp '2023-08-11 15:00:00'

when step 2 is about to proceed. Therefore, we get unique constraint violation.

Could there be a way for ora2pg to do UPSERT on such entries, meaning instead of INSERT, do INSERT ... ON CONFLICT UPDATE?

mgerhardy commented 10 months ago

We are using a standby database that we disconnect at some point in time and do the full backup - then we use the continuous logmining to get the statements from the last system change number (SCN) to insert the most recent changes into the postgres database, too.

We hope to not run into any problems with foreign keys with this.

It would maybe be nice to collect some typo of documentation regarding the migration strategies that others used already with ora2pg and that either went good or ... not good ;)

darold commented 10 months ago

Recently configuration directive INSERT_ON_CONFLICT have been added to ora2pg.conf, just enable it will add an ON CONFLICT DO NOTHING clause to all INSERT statements generated. Unfortunately there is no such option for ON CONFLICT UPDATE because it requires to explicitly name target columns.

Normally if you have enough CPU and memory to parallelize ora2pg (see -P, -j and -J options) migrating 1.5TB should not exceed 15 hours in the worst case. Using oracle_fdw with Ora2Pg (see FDW_SERVER configuration) should increase the speed even more.

A very basic approach is to migrate "dead" data first, then during the downtime you only migrate the "live" data. But, if you absolutely need to reduce the downtime to the minimum, @mgerhardy's approach is the right one.

newcepco commented 10 months ago

we use the continuous logmining to get the statements from the last system change number (SCN) to insert the most recent changes

Thanks for the reply! Could you, please, elaborate - why would you not use ora2pg to get your target database up-to-date, but logmining redo logs ?

I assume logminer logs are raw oracle sql transactions, that are not necessarily PostgreSQL compliant. How are you planning to update your previously imported ("dead" data) records ?

mgerhardy commented 10 months ago

I don't think there is a native way with ora2pg to get the latest changes since some SCN - and we have to make sure to get every single statement (due to auditing reasons) that was executed on the database - even if the data is not the recent data.

The statements that comes out of the logmining will be converted by ora2pg to get them into the postgres database.