darold / ora2pg

Ora2Pg is a free tool used to migrate an Oracle database to a PostgreSQL compatible schema. It connects to your Oracle database, scans it automatically and extracts its structure or data, then generates SQL scripts that you can load into PostgreSQL.
http://www.ora2pg.com/
GNU General Public License v3.0

Parallel Data Load with Table Partitions is Not Working [Range] #949

Open dhruv-dat opened 4 years ago

dhruv-dat commented 4 years ago

Hi @darold

Hope you and your loved ones are safe.

We have been looking at enabling parallelism at the partition level when Ora2Pg loads data directly into PostgreSQL. This feature does not seem to be working for us, though. Could you please let us know whether it is currently available, or whether it still needs some work?

Sample table with range partition that we have used:

```sql
CREATE TABLE hr.testtable1
PARTITION BY RANGE (empl_id) (
  PARTITION P1 VALUES LESS THAN (2500000),
  PARTITION P2 VALUES LESS THAN (5000000),
  PARTITION P3 VALUES LESS THAN (7500000),
  PARTITION P4 VALUES LESS THAN (MAXVALUE)
) AS (
  SELECT LEVEL empl_id, MOD (ROWNUM, 50000) dept_id,
         Trunc (dbms_random.Value (1000, 500000), 2) salary,
         Decode (Round (dbms_random.Value (1, 2)), 1, 'M', 2, 'F') gender,
         To_date (Round (dbms_random.Value (1, 28)) || '-' || Round (dbms_random.Value (1, 12)) || '-' || Round (dbms_random.Value (1900, 2010)), 'DD-MM-YYYY') dob,
         dbms_random.String ('x', dbms_random.Value (20, 50)) address,
         LEVEL empl_id1, MOD (ROWNUM, 50000) dept_id1,
         Trunc (dbms_random.Value (1000, 500000), 2) salary1,
         Decode (Round (dbms_random.Value (1, 2)), 1, 'M', 2, 'F') gender1,
         To_date (Round (dbms_random.Value (1, 28)) || '-' || Round (dbms_random.Value (1, 12)) || '-' || Round (dbms_random.Value (1900, 2010)), 'DD-MM-YYYY') dob1,
         dbms_random.String ('x', dbms_random.Value (20, 50)) address1,
         LEVEL empl_id2, MOD (ROWNUM, 50000) dept_id2,
         Trunc (dbms_random.Value (1000, 500000), 2) salary2,
         Decode (Round (dbms_random.Value (1, 2)), 1, 'M', 2, 'F') gender2,
         To_date (Round (dbms_random.Value (1, 28)) || '-' || Round (dbms_random.Value (1, 12)) || '-' || Round (dbms_random.Value (1900, 2010)), 'DD-MM-YYYY') dob2,
         dbms_random.String ('x', dbms_random.Value (20, 50)) address2,
         LEVEL empl_id3, MOD (ROWNUM, 50000) dept_id3,
         Trunc (dbms_random.Value (1000, 500000), 2) salary3,
         Decode (Round (dbms_random.Value (1, 2)), 1, 'M', 2, 'F') gender3,
         To_date (Round (dbms_random.Value (1, 28)) || '-' || Round (dbms_random.Value (1, 12)) || '-' || Round (dbms_random.Value (1900, 2010)), 'DD-MM-YYYY') dob3,
         dbms_random.String ('x', dbms_random.Value (20, 50)) address3
  FROM dual
  CONNECT BY LEVEL < 10000000
);
```
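For reference, this is roughly the declarative range-partitioned structure we expect on the PostgreSQL side. This is only a minimal sketch: the column types, the trimmed column list (first group only) and the partition names are our assumptions, not Ora2Pg's actual output:

```sql
-- Sketch only: assumed types, first column group only, assumed partition names.
CREATE TABLE hr.testtable1 (
    empl_id  bigint,
    dept_id  integer,
    salary   numeric(12,2),
    gender   char(1),
    dob      date,
    address  varchar(50)
) PARTITION BY RANGE (empl_id);

CREATE TABLE hr.testtable1_p1 PARTITION OF hr.testtable1 FOR VALUES FROM (MINVALUE) TO (2500000);
CREATE TABLE hr.testtable1_p2 PARTITION OF hr.testtable1 FOR VALUES FROM (2500000) TO (5000000);
CREATE TABLE hr.testtable1_p3 PARTITION OF hr.testtable1 FOR VALUES FROM (5000000) TO (7500000);
CREATE TABLE hr.testtable1_p4 PARTITION OF hr.testtable1 FOR VALUES FROM (7500000) TO (MAXVALUE);
```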

We have been using the following Ora2Pg command; ours is a 20 vCore machine with 130 GB of memory (the matching configuration directives are sketched below):

```sh
ora2pg -c /etc/ora2pg/ora2pg.conf -t COPY --namespace HR -a 'TABLE[testtable]' -J 5 -j 3 -L 25000 -d --oracle_speed --ora2pg_speed
```
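For completeness, the equivalent ora2pg.conf directives would look roughly like this. The values simply mirror the CLI flags above; the DEFINED_PK line is only an assumption about how the extraction of this table could be split on empl_id, not something we have verified:

```
# Sketch of the parallelism-related directives in ora2pg.conf (values mirror the CLI flags above).

# -j: number of parallel processes sending data to PostgreSQL
JOBS            3
# -J: number of parallel connections used to extract data from Oracle
ORACLE_COPIES   5
# -P: number of tables exported at the same time
PARALLEL_TABLES 1
# -L: number of rows fetched and committed per batch
DATA_LIMIT      25000
# Splitting a single table across -J connections needs a numeric unique key;
# DEFINED_PK names the column to split on (assumption for this table):
#DEFINED_PK     TESTTABLE1:EMPL_ID
```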

As you can see from the following output, only one partition is loading at a time. Can we do something to invoke them all in parallel?

```
[========================>] 1/1 tables (100.0%) end of scanning.
[======>                  ] 25000/99999 rows (25.0%) Table P3 (3 sec., 8333 recs/sec)
[======>                  ] 25000/99999 rows (25.0%) Table P2 (2 sec., 12500 recs/sec)
[======>                  ] 25000/99999 rows (25.0%) Table P1 (2 sec., 12500 recs/sec)
[=====>                   ] 24999/99999 rows (25.0%) Table P4 (3 sec., 8333 recs/sec)
[========================>] 99999/99999 rows (100.0%) on total estimated data (10 sec., avg: 9999 tuples/sec)
```

Could you also please share some performance tuning pointers that would help when migrating relatively large volumes of data?
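To frame the question, these are the kinds of generic PostgreSQL-side tweaks we have been experimenting with around the COPY phase. They are standard bulk-load practice rather than Ora2Pg-specific advice, and the values are examples only:

```sql
-- Generic PostgreSQL bulk-load tweaks tried around the COPY phase
-- (standard practice, not Ora2Pg-specific; values are examples only).
SET synchronous_commit = off;                   -- do not wait for WAL flush on each commit
SET maintenance_work_mem = '2GB';               -- speeds up index rebuilds after the load
ALTER TABLE hr.testtable1 DISABLE TRIGGER ALL;  -- skip trigger overhead during the load
-- ... run the Ora2Pg COPY import here ...
ALTER TABLE hr.testtable1 ENABLE TRIGGER ALL;
ANALYZE hr.testtable1;                          -- refresh planner statistics once data is in
```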

Thanks

darold commented 4 years ago

Unfortunately I do not have a lot of spare time in 2019-2020. I will have some holidays in September, but I'm not sure this topic will be a top priority.