Closed: xaidu closed this issue 1 year ago
Which of those is the correct number, from your perspective?
Please share the fuller output, the migrator version you are using, the PostgreSQL version, etc.
May I confirm what happens when it does not finish the copy but the elapsed time is already over 1000 seconds? I saw that the doExport method forces termination after waiting 1000 seconds:
private void doExport(BasicDataSource sourceDataSource, DataSource targetDataSource, Writer writer) {
    Map<String, Integer> tables = listTables(sourceDataSource);
    LOG.info("Found tables:");
    tables.forEach((tableName, recordCount) -> {
        LOG.info("  {}: {} records", tableName, recordCount);
    });

    try (ProgressBar progressBar = progressBar(tables)) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(args.threads, args.threads, 1L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(2));
        try {
            tables.forEach((String tableName, Integer rowCount) -> {
                executor.execute(() -> {
                    try (Connection sourceConnection = sourceDataSource.getConnection()) {
                        dumpTableSQL(tableName, rowCount, sourceConnection, targetDataSource, writer, progressBar);
                    } catch (Exception e) {
                        LOG.error(null, e);
                        throw new RuntimeException(e);
                    }
                });
            });
        } finally {
            LOG.debug("Shutting down thread pool executor");
            executor.shutdown();
            LOG.debug("Awaiting termination of executorService");
            try {
                executor.awaitTermination(1000, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }
}
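Worth noting: awaitTermination returns a boolean that the code above discards. If the timeout elapses while copy tasks are still running, doExport returns anyway and the tool proceeds to the next phase as though the copy completed. A minimal, self-contained sketch of that difference (class and method names are mine, not the migrator's):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AwaitDemo {
    // Simulates the export: run one "table copy" of taskMillis on a pool,
    // then wait up to timeoutMillis for the pool to drain.
    static boolean exportFinished(long taskMillis, long timeoutMillis) {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        executor.execute(() -> {
            try {
                Thread.sleep(taskMillis); // stand-in for a long-running table copy
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        executor.shutdown();
        boolean finished;
        try {
            // awaitTermination returns false when the timeout elapses before
            // the tasks complete; ignoring that boolean hides the timeout.
            finished = executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        }
        if (!finished) {
            executor.shutdownNow(); // interrupt the stuck copy instead of proceeding
        }
        return finished;
    }

    public static void main(String[] args) {
        System.out.println("fast copy finished: " + exportFinished(50, 2000));
        System.out.println("slow copy finished: " + exportFinished(2000, 50));
    }
}
```

Checking the returned boolean and failing loudly (rather than falling through to the sequence/index steps) would make a timed-out copy visible instead of surfacing later as a record-count mismatch.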
PostgreSQL version 12. The following output appeared in the console log:
2023-07-14 18:57:30,196 INFO [main] DbSync:194 - MATERIALS: 2683 records
record copy progress 45% [ ] 23455678/66789021 record (0:26:20 / 0:48:42) 1549
2023-07-14 19:23:51,324 INFO [main] DbSync:123 - Done copying database records.
2023-07-14 19:23:51,324 INFO [main] DbSync:128 - Setting sequences for all tables.
2023-07-14 19:23:51,480 INFO [main] DbSync:130 - Done setting sequences for all tables.
2023-07-14 19:23:51,480 INFO [main] DbSync:134 - Initializing database indices and constraints on target database. This may take several minutes, depending on the size of the database.
2023-07-14 19:31:32,044 INFO [main] DbSync:136 - Done initializing database indices and constraints on target database.
2023-07-14 19:31:32,044 INFO [main] DbSync:145 - Verifying if number of records are identical in source and target.
2023-07-14 19:31:34,704 ERROR [main] DbSync:158 - It appears that there was a problem copying records:
2023-07-14 19:31:34,704 ERROR [main] DbSync:160 - Expected table MODIFIEDFILES to contain 56778899 records but contained 25647746 records
I believe if the migrator was hitting that timeout it would fail with an exception and be logged, but I don't see that above.
is the result and # of rows exactly the same if you change the batch-size (bigger? smaller?) or the threads (fewer? more?)
Can you try a bigger DB, one that needs more than 1 hour to do the migration? I tried locally, but it seems it did not finish, without raising any exceptions. After that, I increased the timeout value; then I got the error, and I have put the full logs below.
Reading change scripts from directory
2023-07-17 13:54:33,926 INFO [main] DbSync:108 - Initializing database skeleton on target database.
2023-07-17 13:54:36,645 INFO [main] DbSync:110 - Done initializing database skeleton on target database.
2023-07-17 13:54:36,645 INFO [main] DbSync:114 - Initializing database views.
2023-07-17 13:54:37,473 INFO [main] DbSync:116 - Done initializing database views.
2023-07-17 13:54:37,473 INFO [main] DbSync:121 - Copying database records.
2023-07-17 13:54:37,489 INFO [main] DbSync:191 - Found tables:
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - MODIFIEDFILES: 54714941 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - ACCESSTOKEN: 43 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - DATASHARINGSETTINGS: 1 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PIPELINESTATES: 2329 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PIPELINELABELCOUNTS: 2569 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - USAGEDATAREPORTING: 1 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - ARTIFACTPLANS: 465613 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - STAGES: 449482 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - SERVERBACKUPS: 1076 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - VERSIONINFOS: 1 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - RESOURCES: 82738 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - NOTIFICATIONFILTERS: 9 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - BUILDS: 1966459 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PIPELINES: 282969 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PIPELINEMATERIALREVISIONS: 565836 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - MODIFICATIONS: 115150 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PIPELINESELECTIONS: 148 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PLUGINS: 0 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - JOBAGENTMETADATA: 0 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - BUILDSTATETRANSITIONS: 10728317 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - ENVIRONMENTVARIABLES: 258548 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - AGENTS: 82638 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - STAGEARTIFACTCLEANUPPROHIBITED: 0 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - USERS: 232 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - MATERIALS: 2683 records
record copy progress 97% [ ] 67906842/69721783 record (0:49:50 / 0:01:19) 2271
2023-07-17 14:44:27,782 ERROR [pool-3-thread-1] DbSync:205 - null
org.jooq.exception.DataAccessException: SQL [select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ?]; File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ? [90030-200]
at org.jooq_3.16.14.H2.debug(Unknown Source)
at org.jooq.impl.Tools.translate(Tools.java:3104)
at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:670)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:354)
at org.jooq.impl.AbstractResultQuery.fetch(AbstractResultQuery.java:284)
at org.jooq.impl.SelectImpl.fetch(SelectImpl.java:2843)
at com.thoughtworks.go.dbsync.DbSync.dumpTableSQL(DbSync.java:343)
at com.thoughtworks.go.dbsync.DbSync.lambda$doExport$13(DbSync.java:203)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ? [90030-200]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:622)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:429)
at org.h2.message.DbException.get(DbException.java:205)
at org.h2.message.DbException.get(DbException.java:181)
at org.h2.pagestore.db.PageDataIndex.getPage(PageDataIndex.java:206)
at org.h2.pagestore.db.PageDataNode.getNextPage(PageDataNode.java:232)
at org.h2.pagestore.db.PageDataNode.getNextPage(PageDataNode.java:230)
at org.h2.pagestore.db.PageDataLeaf.getNextPage(PageDataLeaf.java:397)
at org.h2.pagestore.db.PageDataCursor.nextRow(PageDataCursor.java:61)
at org.h2.pagestore.db.PageDataCursor.next(PageDataCursor.java:41)
at org.h2.index.IndexCursor.next(IndexCursor.java:291)
at org.h2.table.TableFilter.next(TableFilter.java:529)
at org.h2.command.dml.Select$LazyResultQueryFlat.fetchNextRow(Select.java:1843)
at org.h2.result.LazyResult.hasNext(LazyResult.java:101)
at org.h2.result.LazyResult.next(LazyResult.java:60)
at org.h2.command.dml.Select.queryFlat(Select.java:737)
at org.h2.command.dml.Select.queryWithoutCache(Select.java:844)
at org.h2.command.dml.Query.queryWithoutCacheLazyCheck(Query.java:201)
at org.h2.command.dml.Query.query(Query.java:489)
at org.h2.command.dml.Query.query(Query.java:451)
at org.h2.command.CommandContainer.query(CommandContainer.java:285)
at org.h2.command.Command.executeQuery(Command.java:195)
at org.h2.jdbc.JdbcPreparedStatement.execute(JdbcPreparedStatement.java:233)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:219)
at org.jooq.impl.Tools.executeStatementAndGetFirstResultSet(Tools.java:4382)
at org.jooq.impl.AbstractResultQuery.execute(AbstractResultQuery.java:230)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:340)
... 7 common frames omitted
record copy progress 97% [ ] 67906842/69721783 record (0:49:50 / 0:01:19) 2271
java.lang.RuntimeException: org.jooq.exception.DataAccessException: SQL [select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ?]; File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ? [90030-200]
at com.thoughtworks.go.dbsync.DbSync.lambda$doExport$13(DbSync.java:206)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.jooq.exception.DataAccessException: SQL [select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ?]; File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ? [90030-200]
at org.jooq_3.16.14.H2.debug(Unknown Source)
at org.jooq.impl.Tools.translate(Tools.java:3104)
at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:670)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:354)
at org.jooq.impl.AbstractResultQuery.fetch(AbstractResultQuery.java:284)
at org.jooq.impl.SelectImpl.fetch(SelectImpl.java:2843)
at com.thoughtworks.go.dbsync.DbSync.dumpTableSQL(DbSync.java:343)
at com.thoughtworks.go.dbsync.DbSync.lambda$doExport$13(DbSync.java:203)
... 3 more
Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ? [90030-200]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:622)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:429)
at org.h2.message.DbException.get(DbException.java:205)
at org.h2.message.DbException.get(DbException.java:181)
at org.h2.pagestore.db.PageDataIndex.getPage(PageDataIndex.java:206)
at org.h2.pagestore.db.PageDataNode.getNextPage(PageDataNode.java:232)
at org.h2.pagestore.db.PageDataNode.getNextPage(PageDataNode.java:230)
at org.h2.pagestore.db.PageDataLeaf.getNextPage(PageDataLeaf.java:397)
at org.h2.pagestore.db.PageDataCursor.nextRow(PageDataCursor.java:61)
at org.h2.pagestore.db.PageDataCursor.next(PageDataCursor.java:41)
at org.h2.index.IndexCursor.next(IndexCursor.java:291)
at org.h2.table.TableFilter.next(TableFilter.java:529)
at org.h2.command.dml.Select$LazyResultQueryFlat.fetchNextRow(Select.java:1843)
at org.h2.result.LazyResult.hasNext(LazyResult.java:101)
at org.h2.result.LazyResult.next(LazyResult.java:60)
at org.h2.command.dml.Select.queryFlat(Select.java:737)
at org.h2.command.dml.Select.queryWithoutCache(Select.java:844)
at org.h2.command.dml.Query.queryWithoutCacheLazyCheck(Query.java:201)
at org.h2.command.dml.Query.query(Query.java:489)
at org.h2.command.dml.Query.query(Query.java:451)
at org.h2.command.CommandContainer.query(CommandContainer.java:285)
at org.h2.command.Command.executeQuery(Command.java:195)
at org.h2.jdbc.JdbcPreparedStatement.execute(JdbcPreparedStatement.java:233)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:219)
at org.jooq.impl.Tools.executeStatementAndGetFirstResultSet(Tools.java:4382)
at org.jooq.impl.AbstractResultQuery.execute(AbstractResultQuery.java:230)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:340)
... 7 more
2023-07-17 14:44:27,797 INFO [main] DbSync:123 - Done copying database records.
2023-07-17 14:44:27,828 INFO [main] DbSync:128 - Setting sequences for all tables.
2023-07-17 14:44:28,047 INFO [main] DbSync:130 - Done setting sequences for all tables.
2023-07-17 14:44:28,063 INFO [main] DbSync:134 - Initializing database indices and constraints on target database. This may take several minutes, depending on the size of the database.
2023-07-17 14:54:07,774 INFO [main] DbSync:136 - Done initializing database indices and constraints on target database.
2023-07-17 14:54:07,774 INFO [main] DbSync:145 - Verifying if number of records are identical in source and target.
2023-07-17 14:54:13,336 ERROR [main] DbSync:158 - It appears that there was a problem copying records:
2023-07-17 14:54:13,336 ERROR [main] DbSync:160 - Expected table MODIFIEDFILES to contain 54714941 records but contained 52900000 records
As I asked earlier, please share the specific migrator version you are using.
Which Java version are you running with?
Java 17
What command line args are you running with? (You can exclude credentials.)
./bin/gocd-database-migrator --insert --progress --source-db-url="jdbc:h2:lib\db\h2db\cruise" --source-db-user='' --source-db-password='' --target-db-url='jdbc:postgresql://localhost:5432/gocd' --target-db-user='username' --target-db-password='password'
is the result and # of rows exactly the same if you change the batch-size (bigger? smaller?) or the threads (fewer? more?)
No, changing the batch size and threads does not help.
OK, that seems odd re: the timeout getting hit and swallowing the error, but perhaps you are right - I'm not really sure why, though. I guess that'd be a separate issue/bug from your core problem:
File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
Anyway, it looks like your source H2 DB is corrupted in some way within the MODIFIEDFILES table. That error is coming from the H2 DB driver.
You are probably going to have to try and use H2 database tools to recover the database and then retry the migration on a recovered database.
GoCD 20.4.0 (your corrupted source database) is based on H2 1.3.168, so I would advise you to use https://repo1.maven.org/maven2/com/h2database/h2/1.3.168/ (h2-1.3.168.jar) for your recovery attempt, NOT the 1.4.200 driver that is included with the migrator tool. Be careful to keep your original H2 source DB file backed up and untouched. There should be no need to replace it; the "recovered" DB file can be used just for the migration to GoCD 20.5.0.
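For reference, a typical recovery flow with the stock H2 tools looks roughly like this. The paths are illustrative; H2's Recover tool writes a `cruise.h2.sql` dump next to the database file, and RunScript rebuilds a fresh database from that dump:

```shell
# Work on a copy; keep the original H2 file backed up and untouched.
mkdir -p /tmp/h2-recovery
cp lib/db/h2db/cruise.h2.db /tmp/h2-recovery/
cd /tmp/h2-recovery

# Dump whatever is readable into cruise.h2.sql, using the driver version
# matching the database file (h2-1.3.168.jar, NOT 1.4.200).
java -cp h2-1.3.168.jar org.h2.tools.Recover -dir . -db cruise

# Rebuild a fresh database from the dump; point the migrator's
# --source-db-url at this recovered copy.
java -cp h2-1.3.168.jar org.h2.tools.RunScript -url jdbc:h2:./cruise-recovered -script cruise.h2.sql
```

Rows that sit in the corrupted pages may be lost in the dump, so compare the recovered MODIFIEDFILES count against expectations before migrating.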
I tried locally, but it seems it did not finish, without raising any exceptions. After that, I increased the timeout value; then I got the error, and I have put the full logs below.
What timeout did you set when you adjusted it? Was it 3000 seconds, or something larger?
Another thought, in case the corruption is actually misleading, and it's caused by an error during timeout (the export getting interrupted), maybe we can try a really large number and see what happens?
I've published https://github.com/gocd/gocd-database-migrator/releases/tag/1.0.3-140 which increases the timeout to 90 minutes, as well as trying to improve the handling and behaviour when it hits such a timeout to avoid misleading exceptions being logged (it will now fail and not continue if it hits the timeout so should fix your original issue).
It probably needs more work to avoid proceeding to the next steps if there are errors during the copy, though; this doesn't seem ideal.
What timeout did you set when you adjusted it? Was it 3000 seconds, or something larger?
Yeah, I changed it to 100000 seconds, and after I used a new backup DB, it migrated successfully. I think it would be better if this could be a config setting exposed to the user. Many thanks @chadlwilson
That's good to hear - did you use the steps in https://gist.github.com/arvindsv/c7934bee26ce1c47e9d24f0a493d3c87 to do the backup/restore, or some other technique?
--export-timeout was made configurable in https://github.com/gocd/gocd-database-migrator/commit/7ce3921425af6b9d3045021b7540136eae6be09f and released in https://github.com/gocd/gocd-database-migrator/releases/tag/1.0.4-145
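A usage sketch combining the new flag with the invocation from earlier in the thread. I have not verified the exact value format the released build accepts (seconds is an assumption here), so confirm with the tool's --help output:

```shell
# Allow up to 2 hours (assumed to be expressed in seconds) for the copy
# phase before the migrator fails instead of silently proceeding.
./bin/gocd-database-migrator --insert --progress \
  --export-timeout 7200 \
  --source-db-url='jdbc:h2:lib/db/h2db/cruise' \
  --target-db-url='jdbc:postgresql://localhost:5432/gocd' \
  --target-db-user='username' --target-db-password='password'
```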
GoCD Server: 20.4
Trying to migrate the DB from H2 to PostgreSQL, it always says: Expected table MODIFIEDFILES to contain 56778899 records but contained 25647746 records