gocd / gocd-database-migrator

Migrating GoCD data between H2 / PostgreSQL / MySQL.
https://www.gocd.org
Apache License 2.0

Always can not migrate the full table MODIFIEDFILES #76

Closed xaidu closed 1 year ago

xaidu commented 1 year ago

GOCD Server : 20.4

Trying to migrate the DB from H2 to PostgreSQL, it always says: Expected table MODIFIEDFILES to contain 56778899 records but contained 25647746 records

chadlwilson commented 1 year ago

Which of those is the correct number, from your perspective?

Please share the fuller output, version you are using, postgresql version etc.

xaidu commented 1 year ago

Could you confirm what happens when the copy has not finished but more than 1000 seconds have already elapsed? I saw that the doExport method terminates the executor after waiting 1000 seconds.

    private void doExport(BasicDataSource sourceDataSource, DataSource targetDataSource, Writer writer) {
        Map<String, Integer> tables = listTables(sourceDataSource);
        LOG.info("Found tables:");

        tables.forEach((tableName, recordCount) -> {
            LOG.info("  {}: {} records", tableName, recordCount);
        });

        try (ProgressBar progressBar = progressBar(tables)) {
            ThreadPoolExecutor executor = new ThreadPoolExecutor(args.threads, args.threads, 1L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(2));
            try {
                tables.forEach((String tableName, Integer rowCount) -> {
                    executor.execute(() -> {
                        try (Connection sourceConnection = sourceDataSource.getConnection()) {
                            dumpTableSQL(tableName, rowCount, sourceConnection, targetDataSource, writer, progressBar);
                        } catch (Exception e) {
                            LOG.error(null, e);
                            throw new RuntimeException(e);
                        }
                    });
                });
            } finally {
                LOG.debug("Shutting down thread pool executor");
                executor.shutdown();
                LOG.debug("Awaiting termination of executorService");
                try {
                    executor.awaitTermination(1000, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        }

    }
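The silent behaviour being discussed comes from `awaitTermination` itself: it does not throw when the timeout elapses, it merely returns `false`, and the snippet above ignores that return value, so execution falls through as if the copy had completed. A minimal, self-contained sketch (illustrative names, not the migrator's actual code) of how a timeout can be detected and surfaced:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AwaitTerminationDemo {
    // Returns true only if the task finished within the timeout.
    // awaitTermination returning false is the "silent" case: no
    // exception is thrown, so callers must check the boolean.
    static boolean runWithTimeout(Runnable task, long timeoutMillis) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(task);
        executor.shutdown();
        boolean finished = executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        if (!finished) {
            // What the original code could do instead of falling through:
            // stop the workers and report the failure to the caller.
            executor.shutdownNow();
        }
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable slow = () -> {
            try { Thread.sleep(500); } catch (InterruptedException ignored) { }
        };
        System.out.println(runWithTimeout(slow, 50));     // task outlives timeout
        System.out.println(runWithTimeout(() -> { }, 1000)); // task completes in time
    }
}
```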
xaidu commented 1 year ago

PostgreSQL version 12. The following output is from the console log.

2023-07-14 18:57:30,196 INFO  [main] DbSync:194 -   MATERIALS: 2683 records
record copy progress  45% [ ] 23455678/66789021 record (0:26:20 / 0:48:42) 1549
2023-07-14 19:23:51,324 INFO  [main] DbSync:123 - Done copying database records.
2023-07-14 19:23:51,324 INFO  [main] DbSync:128 - Setting sequences for all tables.
2023-07-14 19:23:51,480 INFO  [main] DbSync:130 - Done setting sequences for all tables.
2023-07-14 19:23:51,480 INFO  [main] DbSync:134 - Initializing database indices and constraints on target database. This may take several minutes, depending on the size of the database.
2023-07-14 19:31:32,044 INFO  [main] DbSync:136 - Done initializing database indices and constraints on target database.
2023-07-14 19:31:32,044 INFO  [main] DbSync:145 - Verifying if number of records are identical in source and target.
2023-07-14 19:31:34,704 ERROR [main] DbSync:158 - It appears that there was a problem copying records:
2023-07-14 19:31:34,704 ERROR [main] DbSync:160 -   Expected table MODIFIEDFILES to contain 56778899 records but contained 25647746 records
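The verification step that emits this error can be pictured as a per-table count comparison between source and target. A hypothetical sketch (not the migrator's actual code; names are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountVerifier {
    // Compare per-table record counts and report each mismatch in the
    // same shape as the log line above.
    static List<String> verifyCounts(Map<String, Integer> source, Map<String, Integer> target) {
        List<String> problems = new ArrayList<>();
        source.forEach((table, expected) -> {
            int actual = target.getOrDefault(table, 0);
            if (!expected.equals(actual)) {
                problems.add(String.format(
                    "Expected table %s to contain %d records but contained %d records",
                    table, expected, actual));
            }
        });
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Integer> source = new LinkedHashMap<>();
        source.put("MATERIALS", 2683);
        source.put("MODIFIEDFILES", 56778899);
        Map<String, Integer> target = new LinkedHashMap<>();
        target.put("MATERIALS", 2683);
        target.put("MODIFIEDFILES", 25647746);
        verifyCounts(source, target).forEach(System.out::println);
    }
}
```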
chadlwilson commented 1 year ago

I believe if the migrator was hitting that timeout it would fail with an exception and be logged, but I don't see that above.

xaidu commented 1 year ago

> I believe if the migrator was hitting that timeout it would fail with an exception and be logged, but I don't see that above.

Can you try a bigger DB that takes more than 1 hour to migrate? I tried locally, and it seems it did not finish, without any exceptions. After I increased the timeout value, I got the error and posted the full logs.

> As I asked earlier, please share the specific migrator version you are using.

> Which Java version are you running with?

Java 17

> What command line args are you running with? (can exclude credentials)

./bin/gocd-database-migrator --insert --progress --source-db-url="jdbc:h2:lib\db\h2db\cruise" --source-db-user='' --source-db-password='' --target-db-url='jdbc:postgresql://localhost:5432/gocd' --target-db-user='username' --target-db-password='password'

> Is the result and # of rows exactly the same if you change the batch-size (bigger? smaller?) or the threads (fewer? more?)

No, changing the batch size and threads does not help.

chadlwilson commented 1 year ago

OK, that seems odd re: the timeout getting hit and swallowing the error, but perhaps you are right - I'm not really sure why though. I guess that'd be a separate issue/bug to your core problem:

> File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:

Anyway it looks like your source H2 DB is corrupted in some way within the MODIFIEDFILES table. That error is coming from the H2 DB driver.

You are probably going to have to try and use H2 database tools to recover the database and then retry the migration on a recovered database.

Be careful to keep your original H2 DB source file backed up and untouched. There should be no need to replace it; the "recovered" DB file can be used just for the migration to GoCD 20.5.0.

chadlwilson commented 1 year ago

> I tried locally, and it seems it did not finish, without any exceptions. After I increased the timeout value, I got the error and posted the full logs.

What timeout did you set when you adjusted it? Was it 3000 seconds, or something larger?

Another thought, in case the corruption is actually misleading, and it's caused by an error during timeout (the export getting interrupted), maybe we can try a really large number and see what happens?

I've published https://github.com/gocd/gocd-database-migrator/releases/tag/1.0.3-140 which increases the timeout to 90 minutes, as well as trying to improve the handling and behaviour when it hits such a timeout to avoid misleading exceptions being logged (it will now fail and not continue if it hits the timeout so should fix your original issue).

It probably needs more work to avoid proceeding to next steps if there are errors during the copy though, this doesn't seem ideal.

xaidu commented 1 year ago

> What timeout did you set when you adjusted it? Was it 3000 seconds, or something larger?

Yeah, I changed it to 100000 seconds, and after using a new backup DB, it migrated successfully. I think it would be better if this could be a setting configured by the user. Many thanks @chadlwilson

chadlwilson commented 1 year ago

That's good to hear - did you use the steps in https://gist.github.com/arvindsv/c7934bee26ce1c47e9d24f0a493d3c87 to do the backup/restore, or some other technique?

chadlwilson commented 1 year ago

--export-timeout made configurable in https://github.com/gocd/gocd-database-migrator/commit/7ce3921425af6b9d3045021b7540136eae6be09f, released in https://github.com/gocd/gocd-database-migrator/releases/tag/1.0.4-145
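
Making the timeout user-configurable, as released above, amounts to reading an override from the command line with a sensible default. A hypothetical sketch of such parsing (the real migrator uses its own CLI library; the `--export-timeout=SECONDS` syntax here is an assumption for illustration):

```java
import java.time.Duration;

public class ExportTimeoutArg {
    // Scan the args for --export-timeout=SECONDS; fall back to the
    // built-in default when the flag is absent.
    static Duration parseExportTimeout(String[] args, Duration fallback) {
        for (String arg : args) {
            if (arg.startsWith("--export-timeout=")) {
                long seconds = Long.parseLong(arg.substring("--export-timeout=".length()));
                return Duration.ofSeconds(seconds);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // e.g. the 100000-second value xaidu used, over a 90-minute default
        System.out.println(parseExportTimeout(
            new String[]{"--export-timeout=100000"}, Duration.ofSeconds(5400)).getSeconds());
    }
}
```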