Closed: xaidu closed this issue 1 year ago
Which of those is the correct number, from your perspective?
Please share the fuller output, the migrator version you are using, the PostgreSQL version, etc.
May I confirm what happens when it does not finish the copy but the elapsed time is already over 1000 seconds? I saw that the doExport method forces termination after waiting 1000 seconds:
private void doExport(BasicDataSource sourceDataSource, DataSource targetDataSource, Writer writer) {
    Map<String, Integer> tables = listTables(sourceDataSource);
    LOG.info("Found tables:");
    tables.forEach((tableName, recordCount) -> {
        LOG.info("  {}: {} records", tableName, recordCount);
    });

    try (ProgressBar progressBar = progressBar(tables)) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(args.threads, args.threads, 1L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(2));
        try {
            tables.forEach((String tableName, Integer rowCount) -> {
                executor.execute(() -> {
                    try (Connection sourceConnection = sourceDataSource.getConnection()) {
                        dumpTableSQL(tableName, rowCount, sourceConnection, targetDataSource, writer, progressBar);
                    } catch (Exception e) {
                        LOG.error(null, e);
                        throw new RuntimeException(e);
                    }
                });
            });
        } finally {
            LOG.debug("Shutting down thread pool executor");
            executor.shutdown();
            LOG.debug("Awaiting termination of executorService");
            try {
                executor.awaitTermination(1000, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }
}
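Worth noting: awaitTermination returns a boolean that the code above discards. If the timeout elapses while copy tasks are still running, doExport returns anyway and the tool proceeds to the next phase as though the copy completed. A minimal, self-contained sketch of that difference (class and method names are mine, not the migrator's):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AwaitDemo {
    // Simulates the export: run one "table copy" of taskMillis on a pool,
    // then wait up to timeoutMillis for the pool to drain.
    static boolean exportFinished(long taskMillis, long timeoutMillis) {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        executor.execute(() -> {
            try {
                Thread.sleep(taskMillis); // stand-in for a long-running table copy
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        executor.shutdown();
        boolean finished;
        try {
            // awaitTermination returns false when the timeout elapses before
            // the tasks complete; ignoring that boolean hides the timeout.
            finished = executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        }
        if (!finished) {
            executor.shutdownNow(); // interrupt the stuck copy instead of proceeding
        }
        return finished;
    }

    public static void main(String[] args) {
        System.out.println("fast copy finished: " + exportFinished(50, 2000));
        System.out.println("slow copy finished: " + exportFinished(2000, 50));
    }
}
```

Checking the returned boolean and failing loudly (rather than falling through to the sequence/index steps) would make a timed-out copy visible instead of surfacing later as a record-count mismatch.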
PostgreSQL version 12. The following output appeared in the console log:
2023-07-14 18:57:30,196 INFO [main] DbSync:194 - MATERIALS: 2683 records
record copy progress 45% [ ] 23455678/66789021 record (0:26:20 / 0:48:42) 1549
2023-07-14 19:23:51,324 INFO [main] DbSync:123 - Done copying database records.
2023-07-14 19:23:51,324 INFO [main] DbSync:128 - Setting sequences for all tables.
2023-07-14 19:23:51,480 INFO [main] DbSync:130 - Done setting sequences for all tables.
2023-07-14 19:23:51,480 INFO [main] DbSync:134 - Initializing database indices and constraints on target database. This may take several minutes, depending on the size of the database.
2023-07-14 19:31:32,044 INFO [main] DbSync:136 - Done initializing database indices and constraints on target database.
2023-07-14 19:31:32,044 INFO [main] DbSync:145 - Verifying if number of records are identical in source and target.
2023-07-14 19:31:34,704 ERROR [main] DbSync:158 - It appears that there was a problem copying records:
2023-07-14 19:31:34,704 ERROR [main] DbSync:160 - Expected table MODIFIEDFILES to contain 56778899 records but contained 25647746 records
I believe if the migrator was hitting that timeout it would fail with an exception and be logged, but I don't see that above.
is the result and # of rows exactly the same if you change the batch-size (bigger? smaller?) or the threads (fewer? more?)
Can you try a bigger DB, one that needs more than 1 hour to do the migration? I tried locally, but it seems it did not finish, without raising any exceptions. After that, I increased the timeout value; then I got the error, and I have put the full logs below.
Reading change scripts from directory
2023-07-17 13:54:33,926 INFO [main] DbSync:108 - Initializing database skeleton on target database.
2023-07-17 13:54:36,645 INFO [main] DbSync:110 - Done initializing database skeleton on target database.
2023-07-17 13:54:36,645 INFO [main] DbSync:114 - Initializing database views.
2023-07-17 13:54:37,473 INFO [main] DbSync:116 - Done initializing database views.
2023-07-17 13:54:37,473 INFO [main] DbSync:121 - Copying database records.
2023-07-17 13:54:37,489 INFO [main] DbSync:191 - Found tables:
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - MODIFIEDFILES: 54714941 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - ACCESSTOKEN: 43 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - DATASHARINGSETTINGS: 1 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PIPELINESTATES: 2329 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PIPELINELABELCOUNTS: 2569 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - USAGEDATAREPORTING: 1 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - ARTIFACTPLANS: 465613 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - STAGES: 449482 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - SERVERBACKUPS: 1076 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - VERSIONINFOS: 1 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - RESOURCES: 82738 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - NOTIFICATIONFILTERS: 9 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - BUILDS: 1966459 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PIPELINES: 282969 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PIPELINEMATERIALREVISIONS: 565836 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - MODIFICATIONS: 115150 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PIPELINESELECTIONS: 148 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - PLUGINS: 0 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - JOBAGENTMETADATA: 0 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - BUILDSTATETRANSITIONS: 10728317 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - ENVIRONMENTVARIABLES: 258548 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - AGENTS: 82638 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - STAGEARTIFACTCLEANUPPROHIBITED: 0 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - USERS: 232 records
2023-07-17 13:54:37,489 INFO [main] DbSync:194 - MATERIALS: 2683 records
record copy progress 97% [ ] 67906842/69721783 record (0:49:50 / 0:01:19) 2271
2023-07-17 14:44:27,782 ERROR [pool-3-thread-1] DbSync:205 - null
org.jooq.exception.DataAccessException: SQL [select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ?]; File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ? [90030-200]
at org.jooq_3.16.14.H2.debug(Unknown Source)
at org.jooq.impl.Tools.translate(Tools.java:3104)
at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:670)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:354)
at org.jooq.impl.AbstractResultQuery.fetch(AbstractResultQuery.java:284)
at org.jooq.impl.SelectImpl.fetch(SelectImpl.java:2843)
at com.thoughtworks.go.dbsync.DbSync.dumpTableSQL(DbSync.java:343)
at com.thoughtworks.go.dbsync.DbSync.lambda$doExport$13(DbSync.java:203)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ? [90030-200]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:622)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:429)
at org.h2.message.DbException.get(DbException.java:205)
at org.h2.message.DbException.get(DbException.java:181)
at org.h2.pagestore.db.PageDataIndex.getPage(PageDataIndex.java:206)
at org.h2.pagestore.db.PageDataNode.getNextPage(PageDataNode.java:232)
at org.h2.pagestore.db.PageDataNode.getNextPage(PageDataNode.java:230)
at org.h2.pagestore.db.PageDataLeaf.getNextPage(PageDataLeaf.java:397)
at org.h2.pagestore.db.PageDataCursor.nextRow(PageDataCursor.java:61)
at org.h2.pagestore.db.PageDataCursor.next(PageDataCursor.java:41)
at org.h2.index.IndexCursor.next(IndexCursor.java:291)
at org.h2.table.TableFilter.next(TableFilter.java:529)
at org.h2.command.dml.Select$LazyResultQueryFlat.fetchNextRow(Select.java:1843)
at org.h2.result.LazyResult.hasNext(LazyResult.java:101)
at org.h2.result.LazyResult.next(LazyResult.java:60)
at org.h2.command.dml.Select.queryFlat(Select.java:737)
at org.h2.command.dml.Select.queryWithoutCache(Select.java:844)
at org.h2.command.dml.Query.queryWithoutCacheLazyCheck(Query.java:201)
at org.h2.command.dml.Query.query(Query.java:489)
at org.h2.command.dml.Query.query(Query.java:451)
at org.h2.command.CommandContainer.query(CommandContainer.java:285)
at org.h2.command.Command.executeQuery(Command.java:195)
at org.h2.jdbc.JdbcPreparedStatement.execute(JdbcPreparedStatement.java:233)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:219)
at org.jooq.impl.Tools.executeStatementAndGetFirstResultSet(Tools.java:4382)
at org.jooq.impl.AbstractResultQuery.execute(AbstractResultQuery.java:230)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:340)
... 7 common frames omitted
record copy progress 97% [ ] 67906842/69721783 record (0:49:50 / 0:01:19) 2271
java.lang.RuntimeException: org.jooq.exception.DataAccessException: SQL [select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ?]; File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ? [90030-200]
at com.thoughtworks.go.dbsync.DbSync.lambda$doExport$13(DbSync.java:206)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.jooq.exception.DataAccessException: SQL [select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ?]; File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ? [90030-200]
at org.jooq_3.16.14.H2.debug(Unknown Source)
at org.jooq.impl.Tools.translate(Tools.java:3104)
at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:670)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:354)
at org.jooq.impl.AbstractResultQuery.fetch(AbstractResultQuery.java:284)
at org.jooq.impl.SelectImpl.fetch(SelectImpl.java:2843)
at com.thoughtworks.go.dbsync.DbSync.dumpTableSQL(DbSync.java:343)
at com.thoughtworks.go.dbsync.DbSync.lambda$doExport$13(DbSync.java:203)
... 3 more
Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
select * from MODIFIEDFILES where id > cast(? as bigint) order by id limit ? [90030-200]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:622)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:429)
at org.h2.message.DbException.get(DbException.java:205)
at org.h2.message.DbException.get(DbException.java:181)
at org.h2.pagestore.db.PageDataIndex.getPage(PageDataIndex.java:206)
at org.h2.pagestore.db.PageDataNode.getNextPage(PageDataNode.java:232)
at org.h2.pagestore.db.PageDataNode.getNextPage(PageDataNode.java:230)
at org.h2.pagestore.db.PageDataLeaf.getNextPage(PageDataLeaf.java:397)
at org.h2.pagestore.db.PageDataCursor.nextRow(PageDataCursor.java:61)
at org.h2.pagestore.db.PageDataCursor.next(PageDataCursor.java:41)
at org.h2.index.IndexCursor.next(IndexCursor.java:291)
at org.h2.table.TableFilter.next(TableFilter.java:529)
at org.h2.command.dml.Select$LazyResultQueryFlat.fetchNextRow(Select.java:1843)
at org.h2.result.LazyResult.hasNext(LazyResult.java:101)
at org.h2.result.LazyResult.next(LazyResult.java:60)
at org.h2.command.dml.Select.queryFlat(Select.java:737)
at org.h2.command.dml.Select.queryWithoutCache(Select.java:844)
at org.h2.command.dml.Query.queryWithoutCacheLazyCheck(Query.java:201)
at org.h2.command.dml.Query.query(Query.java:489)
at org.h2.command.dml.Query.query(Query.java:451)
at org.h2.command.CommandContainer.query(CommandContainer.java:285)
at org.h2.command.Command.executeQuery(Command.java:195)
at org.h2.jdbc.JdbcPreparedStatement.execute(JdbcPreparedStatement.java:233)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:219)
at org.jooq.impl.Tools.executeStatementAndGetFirstResultSet(Tools.java:4382)
at org.jooq.impl.AbstractResultQuery.execute(AbstractResultQuery.java:230)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:340)
... 7 more
2023-07-17 14:44:27,797 INFO [main] DbSync:123 - Done copying database records.
2023-07-17 14:44:27,828 INFO [main] DbSync:128 - Setting sequences for all tables.
2023-07-17 14:44:28,047 INFO [main] DbSync:130 - Done setting sequences for all tables.
2023-07-17 14:44:28,063 INFO [main] DbSync:134 - Initializing database indices and constraints on target database. This may take several minutes, depending on the size of the database.
2023-07-17 14:54:07,774 INFO [main] DbSync:136 - Done initializing database indices and constraints on target database.
2023-07-17 14:54:07,774 INFO [main] DbSync:145 - Verifying if number of records are identical in source and target.
2023-07-17 14:54:13,336 ERROR [main] DbSync:158 - It appears that there was a problem copying records:
2023-07-17 14:54:13,336 ERROR [main] DbSync:160 - Expected table MODIFIEDFILES to contain 54714941 records but contained 52900000 records
As I asked earlier, please share the specific migrator version you are using.
Which Java version are you running with?
Java 17
What command line args are you running with? (You can exclude credentials.)
./bin/gocd-database-migrator --insert --progress --source-db-url="jdbc:h2:lib\db\h2db\cruise" --source-db-user='' --source-db-password='' --target-db-url='jdbc:postgresql://localhost:5432/gocd' --target-db-user='username' --target-db-password='password'
is the result and # of rows exactly the same if you change the batch-size (bigger? smaller?) or the threads (fewer? more?)
No, changing the batch size and threads does not help.
OK, that seems odd re: the timeout getting hit and swallowing the error, but perhaps you are right - I'm not really sure why, though. I guess that'd be a separate issue/bug from your core problem:
File corrupted while reading record: "[8032620] stream data key:83103 pos:11 remaining:0". Possible solution: use the recovery tool; SQL statement:
Anyway, it looks like your source H2 DB is corrupted in some way within the MODIFIEDFILES table. That error is coming from the H2 DB driver.
You are probably going to have to try and use H2 database tools to recover the database and then retry the migration on a recovered database.
GoCD 20.4.0 (your corrupted source database) is based on H2 1.3.168, so I would advise you to use https://repo1.maven.org/maven2/com/h2database/h2/1.3.168/ (h2-1.3.168.jar) for your recovery attempt, NOT the 1.4.200 driver that is included with the migrator tool. Be careful to keep your original H2 source DB file backed up and untouched. There should be no need to replace it; the "recovered" DB file can be used just for the migration to GoCD 20.5.0.
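For reference, a typical recovery flow with the stock H2 tools looks roughly like this. The paths are illustrative; H2's Recover tool writes a `cruise.h2.sql` dump next to the database file, and RunScript rebuilds a fresh database from that dump:

```shell
# Work on a copy; keep the original H2 file backed up and untouched.
mkdir -p /tmp/h2-recovery
cp lib/db/h2db/cruise.h2.db /tmp/h2-recovery/
cd /tmp/h2-recovery

# Dump whatever is readable into cruise.h2.sql, using the driver version
# matching the database file (h2-1.3.168.jar, NOT 1.4.200).
java -cp h2-1.3.168.jar org.h2.tools.Recover -dir . -db cruise

# Rebuild a fresh database from the dump; point the migrator's
# --source-db-url at this recovered copy.
java -cp h2-1.3.168.jar org.h2.tools.RunScript -url jdbc:h2:./cruise-recovered -script cruise.h2.sql
```

Rows that sit in the corrupted pages may be lost in the dump, so compare the recovered MODIFIEDFILES count against expectations before migrating.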
I tried locally, but it seems it did not finish, without raising any exceptions. After that, I increased the timeout value; then I got the error, and I have put the full logs below.
What timeout did you set when you adjusted it? Was it 3000 seconds, or something larger?
Another thought, in case the corruption is actually misleading, and it's caused by an error during timeout (the export getting interrupted), maybe we can try a really large number and see what happens?
I've published https://github.com/gocd/gocd-database-migrator/releases/tag/1.0.3-140 which increases the timeout to 90 minutes, as well as trying to improve the handling and behaviour when it hits such a timeout to avoid misleading exceptions being logged (it will now fail and not continue if it hits the timeout so should fix your original issue).
It probably needs more work to avoid proceeding to the next steps if there are errors during the copy, though; this doesn't seem ideal.
What timeout did you set when you adjusted it? Was it 3000 seconds, or something larger?
Yeah, I changed it to 100000 seconds, and after I used a new backup DB, it migrated successfully. I think it would be better if this could be a config setting exposed to the user. Many thanks @chadlwilson
That's good to hear - did you use the steps in https://gist.github.com/arvindsv/c7934bee26ce1c47e9d24f0a493d3c87 to do the backup/restore, or some other technique?
--export-timeout was made configurable in https://github.com/gocd/gocd-database-migrator/commit/7ce3921425af6b9d3045021b7540136eae6be09f and released in https://github.com/gocd/gocd-database-migrator/releases/tag/1.0.4-145
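A usage sketch combining the new flag with the invocation from earlier in the thread. I have not verified the exact value format the released build accepts (seconds is an assumption here), so confirm with the tool's --help output:

```shell
# Allow up to 2 hours (assumed to be expressed in seconds) for the copy
# phase before the migrator fails instead of silently proceeding.
./bin/gocd-database-migrator --insert --progress \
  --export-timeout 7200 \
  --source-db-url='jdbc:h2:lib/db/h2db/cruise' \
  --target-db-url='jdbc:postgresql://localhost:5432/gocd' \
  --target-db-user='username' --target-db-password='password'
```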
GoCD Server: 20.4
Trying to migrate the DB from H2 to PostgreSQL, it always says: Expected table MODIFIEDFILES to contain 56778899 records but contained 25647746 records