Closed haubourg closed 9 months ago
When I kill pgcopydb, the logs suddenly get filled with old messages indicating I/O errors. (Sorry, I lost the logs. I'll launch another run to capture them.)
Unfortunately from the rest of the message it looks like we need these logs to better understand what's happening. The only other idea I would have at the moment is an overloaded system. Can you check for CPU/RAM/Disk saturation from either the source database system, the pgcopydb environment, or the target database system, while the migration is happening? Maybe also lower your jobs parameters in pgcopydb to adjust?
All right, I will launch another migration test and will be back within 24h with real logs. Iterations are a bit long :)
As for OS saturation, below are some Grafana graphs of my successive attempts. My setup is a test server with 16 CPUs, 32 GB of RAM, and an HDD. Not much else is happening on it except the cAdvisor service for Grafana itself.
I hit a lot of iowait as soon as index creation starts. But you might be right: I hit 80% write speed on my disk, but I see 100% load between migration attempts.
I'm suspicious of my software RAID1 behavior, but it might just be that I'm asking for too much concurrency. My first trials failed some table copies with a shared_memory error. I first raised max_connections and max_locks_per_transaction considerably, with no success. In the end, lowering --table-jobs avoided these issues, but I might still be too high. I now use --table-jobs 5 --index-jobs 16. Should I lower those parameters?
The real migration will happen on a machine with 48 CPUs and 180 GB of RAM, with 1 TB NVMe disks holding the database and ~, and a tablespace on a slow HDD (4 TB).
I am not sure what parallelism settings I should use during the real migration. Any rule of thumb would be welcome.
Rule of thumb for getting started:
--table-jobs is how many concurrent processes your SOURCE system can deal with. This is mostly IO bound (disk reads), and sometimes CPU bound (converting on-disk binary data to text format may use more CPU than anticipated, depending on the data type).
--index-jobs is how many CPUs you can dedicate on the TARGET system to CREATE INDEX, but if you're limited by IOPS bandwidth then this needs to be taken into account too.
So I ran the same migration again and it seems related to #496.
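As a rough illustration of the rule of thumb above for --table-jobs and --index-jobs, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name, its parameters (how many concurrent reads the source can sustain, how many CPUs and concurrent writes the target can spare), and the idea of reserving a couple of target CPUs are not part of pgcopydb. It only turns numbers you have measured yourself into a starting point to adjust from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSettings:
    table_jobs: int
    index_jobs: int


def suggest_jobs(source_io_streams: int,
                 target_cpus: int,
                 target_io_streams: Optional[int] = None,
                 reserved_target_cpus: int = 2) -> JobSettings:
    """Hypothetical helper: derive starting values from measured capacity.

    source_io_streams: concurrent table reads the SOURCE can sustain.
    target_cpus: CPUs available on the TARGET.
    target_io_streams: optional cap when the TARGET is limited by IOPS.
    reserved_target_cpus: CPUs kept free on the TARGET (an assumption).
    """
    # --table-jobs follows what the source system can deal with.
    table_jobs = max(1, source_io_streams)

    # --index-jobs follows the CPUs you can dedicate on the target...
    index_jobs = max(1, target_cpus - reserved_target_cpus)

    # ...unless disk throughput on the target is the tighter limit.
    if target_io_streams is not None:
        index_jobs = max(1, min(index_jobs, target_io_streams))

    return JobSettings(table_jobs=table_jobs, index_jobs=index_jobs)


def as_flags(settings: JobSettings) -> str:
    """Format the settings as the two pgcopydb options discussed here."""
    return f"--table-jobs {settings.table_jobs} --index-jobs {settings.index_jobs}"
```

For example, `as_flags(suggest_jobs(source_io_streams=4, target_cpus=16, target_io_streams=8))` yields `--table-jobs 4 --index-jobs 8`, because the IOPS cap on the target is lower than the CPU-based value. The inputs still have to come from observing the systems under load.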
The stdout is stuck on this error (at 20:32 server time). All the workers stay idle, doing nothing. When I SIGTERM the workers, they then spit out their logs into the log file.
2023-10-16 20:32:43 3822248 ERROR pgsql.c:2487 [TARGET 66111] [42P10] ERROR: column "test" is a generated column
2023-10-16 20:32:43 3822248 ERROR pgsql.c:2495 [TARGET 66111] DETAIL: Generated columns cannot be used in COPY.
2023-10-16 20:32:43 3822248 ERROR pgsql.c:2502 [TARGET 66111] Context: copy "z_tmp_rhaubourg"."ref_admin_commune"("id", "geom", "fid", "nom", "nom_m", "insee_com", "statut", "population", "insee_canton", "insee_arrond", "insee_dep", "insee_reg", "siren_epci", "nom_generated", "test") from stdin
2023-10-16 20:32:43 3822248 ERROR table-data.c:450 Failed to copy data for table with oid 36586396 and part number 0, see above for details
[ here the log was stuck before sending SIGTERM]
upe_ffo_bat_batiment_groupe_id_pk"
2023-10-16 19:16:47 3822258 NOTICE indexes.c:772 CREATE UNIQUE INDEX batiment_groupe_rnc_batiment_groupe_id_pk ON tables_bdnb_v1_2022_10d_2.batiment_groupe_rnc USING btree (batiment_groupe_id);
2023-10-16 19:16:47 3822258 NOTICE indexes.c:944 Creating summary file "/tmp/pgcopydb/run/indexes/36614719.done"
2023-10-16 19:16:47 3822258 NOTICE indexes.c:772 CREATE INDEX "8d46f0be081184f13abfb05509ffc811" ON tables_bdnb_v1_2022_10d_2.batiment_groupe_simulations_dvf USING btree (batiment_groupe_id);
2023-10-16 19:16:47 3822258 NOTICE indexes.c:944 Creating summary file "/tmp/pgcopydb/run/indexes/36614696.done"
2023-10-16 19:16:49 3822258 NOTICE indexes.c:772 CREATE INDEX ix_z_tmp_cbacon_superposition_tup_index ON z_tmp_cbacon.superposition_tup USING btree (index);
2023-10-16 19:16:49 3822258 NOTICE indexes.c:944 Creating summary file "/tmp/pgcopydb/run/indexes/36589845.done"
2023-10-16 19:16:49 3822258 NOTICE indexes.c:1156 Found 1 indexes on target database for table "z_tmp_cbacon"."superposition_tup"
2023-10-16 19:16:49 3822258 NOTICE indexes.c:772 CREATE INDEX idx_geom_groupe ON z_tmp_cbacon.test_dvf_open USING gist (geom_groupe);
2023-10-16 19:16:49 3822258 NOTICE indexes.c:944 Creating summary file "/tmp/pgcopydb/run/indexes/36593662.done"
2023-10-16 19:16:50 3822258 NOTICE indexes.c:772 CREATE INDEX recette_pies_v07_dev_geom_pies_idx ON z_tmp_mhamdoune.recette_pies_v07_dev USING gist (geom_pies);
2023-10-16 19:16:50 3822258 NOTICE indexes.c:944 Creating summary file "/tmp/pgcopydb/run/indexes/36604040.done"
2023-10-16 19:16:50 3822258 NOTICE indexes.c:1156 Found 2 indexes on target database for table "z_tmp_mhamdoune"."recette_pies_v07_dev"
2023-10-16 19:16:50 3822258 NOTICE indexes.c:772 CREATE INDEX differentiel_iris_v07_v08_code_iris_idx ON z_tmp_mthorel.differentiel_iris_v07_v08 USING btree (code_iris);
2023-10-16 19:16:50 3822258 NOTICE indexes.c:944 Creating summary file "/tmp/pgcopydb/run/indexes/3
[...]
2023-10-17 13:06:41 3822263 ERROR indexes.c:98 CREATE INDEX worker has been interrupted
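Because the decisive ERROR lines above are buried among hundreds of NOTICE lines flushed at shutdown, a small filter can help when reading such a log. The following is a minimal Python sketch that assumes the line layout seen in this excerpt (timestamp, pid, level, source file and line, message). The regular expression and the function names are illustrative and not part of pgcopydb. Lines that do not match, such as truncated ones, are skipped.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Layout assumed from the excerpt above:
# "2023-10-16 20:32:43 3822248 ERROR pgsql.c:2487 message..."
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) "
    r"(?P<src>[\w.-]+:\d+) (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    ts: str
    pid: int
    level: str
    src: str
    msg: str


def parse_entries(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield one LogEntry per line that matches the assumed layout."""
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m:
            yield LogEntry(m["ts"], int(m["pid"]), m["level"], m["src"], m["msg"])


def errors_only(lines: Iterable[str]) -> List[LogEntry]:
    """Keep only ERROR entries, in the order they appear in the file."""
    return [entry for entry in parse_entries(lines) if entry.level == "ERROR"]
```

Calling `errors_only(open(path))` on the log file (with `path` pointing at wherever your run wrote its log) would list the COPY failure on the generated column ahead of the later "worker has been interrupted" message.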
For now I can work around this by removing the generated columns from the source, and I will test again.
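To apply that workaround, the generated columns on the source first have to be found. Here is a minimal Python sketch that relies on the standard `information_schema.columns` view, whose `is_generated` column is `'ALWAYS'` for generated columns in PostgreSQL 12 and later. The function names are illustrative, and the cursor is assumed to be any DB-API 2.0 cursor opened on the source database (for instance from psycopg, which is not imported here).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Lists every generated column visible to the connected role.
GENERATED_COLUMNS_SQL = """
SELECT table_schema, table_name, column_name
  FROM information_schema.columns
 WHERE is_generated = 'ALWAYS'
 ORDER BY table_schema, table_name, ordinal_position
"""


def list_generated_columns(cursor) -> List[Tuple[str, str, str]]:
    """Return (schema, table, column) rows for generated columns.

    `cursor` is assumed to be a DB-API 2.0 cursor on the SOURCE database.
    """
    cursor.execute(GENERATED_COLUMNS_SQL)
    return [tuple(row) for row in cursor.fetchall()]


def group_by_table(rows: Sequence[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group column names under a "schema.table" key for easier review."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for schema, table, column in rows:
        grouped[f"{schema}.{table}"].append(column)
    return dict(grouped)
```

On the table from the log above, the result would be expected to include the "test" column of "z_tmp_rhaubourg"."ref_admin_commune", which is the one named in the COPY error.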
Meanwhile, support for generated columns has been added to pgcopydb in https://github.com/dimitri/pgcopydb/pull/517, and the PR is part of release https://github.com/dimitri/pgcopydb/releases/tag/v0.14. Closing this issue. Please consider reopening it with more information if needed, or opening a new issue. Also, please try the current “main” branch before opening an issue: the --fail-fast option keeps getting improved to handle many situations where pgcopydb would otherwise just stall (error handling can be difficult to implement right sometimes).
Hi again, still testing a migration here with pgcopydb v0.13-1 (from the pgdg repo) on Ubuntu.
None of my attempts ever finished: pgcopydb never exits properly. Nothing happens in the processes, and nothing appears in pg_stat_activity.
When I kill pgcopydb, the logs suddenly get filled with old messages indicating I/O errors. (Sorry, I lost the logs. I'll launch another run to capture them.)
Doing a pgcopydb compare schema afterwards shows ~10 missing indexes.
Running a pgcopydb restore --post-data raises stale pids:
I'm migrating from PG13 to PG15. The target PG15 is aggressively tuned for write speed, with autovacuum off and fsync off, but I see the same behavior across all my configuration trials.
In journalctl, I found a few of these warnings (xxx replaces real values):
Any idea what could be wrong in my setup?