villain closed 8 years ago
@villain
Can you download this migrate-data.pl script and run it with the debug flag (-d)?
$ wget https://raw.githubusercontent.com/giovino/massive-octo-spice/develop/v1migration/bin/migrate-data.pl
You can see the changes here.
The changes add some verbose logging (print statements) and attempt to catch some encoding and decoding errors.
It's not clear to me these errors are related to the initial error we are trying to debug. Is it possible the system was left in a poor state? Does restarting and then trying the migrate script again produce different results?
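For anyone following along, the general idea of "log verbosely and catch decoding errors" can be sketched as below. This is only an illustration in Python, not the code in migrate-data.pl; the function name `decode_payload` and the record layout (an `id` plus a base64 `data` field) are assumptions made for the example.

```python
import base64
import binascii
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("migrate-sketch")  # hypothetical logger name


def decode_payload(record):
    """Return the decoded bytes of record['data'], or None if decoding fails."""
    try:
        # Drop line breaks that may be embedded in the base64 text.
        text = "".join(record["data"].split())
        return base64.b64decode(text, validate=True)
    except (KeyError, AttributeError, binascii.Error) as err:
        # Log which record failed instead of letting the whole run die.
        log.debug("skipping record id=%s: %s", record.get("id"), err)
        return None
```

Logging the record id on failure makes it possible to tell whether one bad row or a broader problem is behind the error.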
^ yeah, that was from a box i've been troubleshooting on, just re-running the query on the proper host now
hopefully this looks like what you're expecting:
[2016-04-29T11:52:55,113Z][1821][INFO]: staring up..
[2016-04-29T11:52:55,115Z][1821][INFO]: starting up ES connection...
[2016-04-29T11:52:55,116Z][1821][INFO]: checking journal: /tmp/cif-migrate.journal
[2016-04-29T11:52:55,135Z][1821][INFO]: creating threads...
[2016-04-29T11:52:55,823Z][1821][INFO]: starting workers
[2016-04-29T13:16:05,729Z][1821][INFO]: starting writer thread...
Subroutine CIF::Legacy::Archive::db_Main redefined at /usr/local/share/perl/5.18.2/Ima/DBI.pm line 278.
$VAR1 = {
          'id' => 4544,
          'uuid' => '1b9dc4b0-0bec-4a98-b091-ecb4decfbcf8',
          'data' => '0ALwVg0AAIA/Er4CCjUKC2Nyb3dkc3RyaWtlGAMiJDFiOWRjNGIwLTBiZWMtNGE5OC1iMDkxLWVj
YjRkZWNmYmNmOCIUMjAxMy0wNy0yMlQwNTozNDozNlo6FE4WALhCOQoCRU4SM3Bvc3NpYmx5IG1h
bGljaW91cyBkeW5hbWljIGRucyBkb21haW4gKB2OOClKIgoXCA0aAkVOKAMyDQVHTAdtYWx3YXJl
KgcIBBUAAL5CWhgKDRrwe3Vua25vd25CA1VUQ1gDYAViIEIeChwKGBIWCAQSBGZxZG4qDHNxdWly
bHkuaW5mb1gCcjkSBHV1aWQaCWd1aWQgaGFzaCARSiQ4Yzg2NDMwNi1kMjFhLTM3YjEtODcwNS03
NDZhNzg2NzE5YmZ4ApABAxoEMC4wMSICRU4=
',
          'guid' => '8c864306-d21a-37b1-8705-746a786719bf'
        };
hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this) at bin/migrate-data-debug.pl line 286.
Segmentation fault (core dumped)
@villain
We're getting closer, try this version:
wget https://raw.githubusercontent.com/giovino/massive-octo-spice/develop/v1migration/bin/migrate-data.pl
All changes can be seen here
seems like it's back to the original output?
[2016-04-30T12:08:42,339Z][5735][INFO]: staring up..
[2016-04-30T12:08:42,340Z][5735][INFO]: starting up ES connection...
[2016-04-30T12:08:42,340Z][5735][INFO]: checking journal: /tmp/cif-migrate.journal
[2016-04-30T12:08:42,341Z][5735][INFO]: creating threads...
[2016-04-30T12:08:42,588Z][5735][INFO]: starting workers
[2016-04-30T13:28:07,014Z][5735][INFO]: starting writer thread...
Subroutine CIF::Legacy::Archive::db_Main redefined at /usr/local/share/perl/5.18.2/Ima/DBI.pm line 278.
hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this) at bin/migrate-data-debug.pl line 286.
Segmentation fault (core dumped)
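A note on the message itself: as far as I can tell, "hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this)" is what Perl's JSON::XS encoder raises when it is handed a plain scalar (for example undef or a string) rather than a hash or array reference. That would suggest some record reaches the encode step without having been turned into a data structure, but this is a guess, since line 286 of the script is not shown here. A rough Python analogue of guarding the encode step, with `encode_document` a hypothetical name:

```python
import json


def encode_document(doc):
    """Serialize doc to JSON, refusing anything that is not a dict or list."""
    if not isinstance(doc, (dict, list)):
        # Mirrors the spirit of the Perl error: only containers are accepted.
        raise TypeError(f"dict or list expected, got {type(doc).__name__}")
    return json.dumps(doc)
```

Checking the type before encoding, and logging the offending record, would show whether the input was empty or simply failed to decode.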
@villain
Is it possible that the root cause of the segmentation fault is the host running out of memory, or do you know that it is a data-structure parsing error?
q1: How much memory does the host running this script have?
q2: Have you monitored memory usage prior to the seg fault?
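If it helps with q2, one low-effort way to watch memory on a Linux host is to sample /proc/meminfo while the migration runs. The sketch below is an assumption about how one might do that, not something shipped with the script; `parse_meminfo` and `available_kb` are hypothetical helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_kb(path="/proc/meminfo"):
    """Return MemAvailable in kB, falling back to MemFree on older kernels."""
    with open(path) as handle:
        info = parse_meminfo(handle.read())
    return info.get("MemAvailable", info.get("MemFree"))
```

Calling `available_kb()` every few seconds from a small loop and printing the result with a timestamp gives a trace that can be lined up against the time of the seg fault.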
@villain
and are you running this with the debug flag? (-d)
not running out of memory from what i can tell. it has 32GB available, never drops below 4GB
re-run of the debug output:
[2016-05-05T09:25:35,527Z][23928][INFO][main:136]: staring up..
[2016-05-05T09:25:35,527Z][23928][INFO][main:149]: starting up ES connection...
[2016-05-05T09:25:35,527Z][23928][INFO][main:156]: checking journal: /tmp/cif-migrate.journal
[2016-05-05T09:25:35,528Z][23928][INFO][main:159]: creating threads...
[2016-05-05T09:25:35,735Z][23928][INFO][main:214]: starting workers
[2016-05-05T09:25:35,736Z][23928][DEBUG][main:224]: connecting to archive..
[2016-05-05T14:12:18,398Z][23928][DEBUG][main:253]: total count: 290774165
[2016-05-05T14:12:18,398Z][23928][DEBUG][main:254]: pages: 58155
[2016-05-05T14:12:18,509Z][23928][DEBUG][main:261]: sending ctrl warm-up msg...
[2016-05-05T14:12:18,723Z][23928][DEBUG][main:267]: creating 8 worker threads...
[2016-05-05T14:12:18,723Z][23928][INFO][main:333]: starting writer thread...
Subroutine CIF::Legacy::Archive::db_Main redefined at /usr/local/share/perl/5.18.2/Ima/DBI.pm line 278.
[2016-05-05T14:12:18,875Z][23928][DEBUG][main:401]: starting worker: 3
[2016-05-05T14:12:19,011Z][23928][DEBUG][main:401]: starting worker: 4
[2016-05-05T14:12:19,120Z][23928][DEBUG][main:401]: starting worker: 5
[2016-05-05T14:12:19,269Z][23928][DEBUG][main:401]: starting worker: 6
[2016-05-05T14:12:19,420Z][23928][DEBUG][main:401]: starting worker: 7
[2016-05-05T14:12:19,585Z][23928][DEBUG][main:401]: starting worker: 8
[2016-05-05T14:12:19,723Z][23928][DEBUG][main:401]: starting worker: 9
[2016-05-05T14:12:19,832Z][23928][DEBUG][main:401]: starting worker: 10
[2016-05-05T14:12:19,832Z][23928][DEBUG][main:274]: executing sql...
[2016-05-05T14:12:20,341Z][23928][DEBUG][main:280]: sending next pages to workers...
hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this) at bin/migrate-data-debug.pl line 286.
Segmentation fault (core dumped)
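For scale, the "total count" and "pages" figures in the debug output are consistent with a page size of 5,000 records. That page size is inferred from the two logged numbers and is not stated anywhere in the log, so treat it as an assumption. The arithmetic, as a quick check:

```python
total_count = 290_774_165  # "total count" from the debug log
pages_logged = 58_155      # "pages" from the debug log
page_size = 5_000          # ASSUMPTION: inferred from the two numbers above

# Integer ceiling division: the last page is only partly filled.
pages = (total_count + page_size - 1) // page_size
last_page_records = total_count - (pages - 1) * page_size
```

Under that assumption the computed page count matches the logged one, and the error appears as soon as the first pages are handed to the workers.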
villain, I apologize for the tardiness of this response; I thought I had responded already. Our guess is that, with a total count of 290,774,165 records, you are hitting a known memory leak. We've seen segfaults ourselves in a similarly large migration.
As it stands today, because the migration script uses a journal (i.e. it knows what has and hasn't been migrated), our recommendation is to:
can't re-open the previous issue, creating a new one as suggested:
yep, still having the problem. just did another git pull, getting the same error. i'm migrating from a v1 instance
[2016-04-26T08:42:36,147Z][12427][INFO]: staring up..
[2016-04-26T08:42:36,148Z][12427][INFO]: starting up ES connection...
[2016-04-26T08:42:36,149Z][12427][INFO]: checking journal: /tmp/cif-migrate.journal
[2016-04-26T08:42:36,149Z][12427][INFO]: creating threads...
[2016-04-26T08:42:36,438Z][12427][INFO]: starting workers
[2016-04-26T10:14:57,394Z][12427][INFO]: starting writer thread...
Subroutine CIF::Legacy::Archive::db_Main redefined at /usr/local/share/perl/5.18.2/Ima/DBI.pm line 278.
hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this) at bin/migrate-data.pl line 282.
Segmentation fault (core dumped)
it was working ok until the more recent changes to the migrate script