AnatolyUss / nmig

NMIG is a database migration tool, written in Node.js and highly inspired by FromMySqlToPostgreSql.
GNU General Public License v3.0
468 stars 84 forks source link

Data Loading Stuck #129

Open perrornet opened 3 months ago

perrornet commented 3 months ago

I noticed yesterday that the logs show --[NMIG DataWriter] Loading the data into "public"."my_table" table..., but it has remained at this message until now. I'm unsure what is happening.

System Configuration

Configuration Information

{
    "source": {
        "host": "mysql",
        "port": 3306,
        "database": "test",
        "charset": "utf8mb4",
        "supportBigNumbers": true,
        "user": "xxx",
        "password": "xxxx"
    },
    "target": {
        "host": "postgres",
        "port": 5432,
        "database": "test",
        "charset": "UTF8",
        "user": "xxxx",
        "password": "xxxx",
        "ssl": {
            "require": true,
            "rejectUnauthorized": false
        }
    },
    "max_each_db_connection_pool_size": 100,
    "number_of_simultaneously_running_loader_processes": "8",
    "loader_max_old_space_size": "16384",
    "streams_high_water_mark": 16384,
    "encoding": "utf8",
    "schema": "public",
    "migrate_only_data": false,
    "delimiter": ",",
    "enable_extra_config": false
}

Logs

    NMIG - the database migration tool
    Copyright (C) 2016 - present, Anatoly Khaytovich <anatolyuss@gmail.com>

    --[boot] Configuration has been just loaded.
    --[boot] NMIG is restarting after some failure.
    --[boot] Consider checking log files at the end of migration.

    --[MigrationStateManager::createStateLogsTable] table "public"."state_logs_publicsubtest" is created...
    --[DataPoolManager::createDataPoolTable] table "public"."data_pool_publicsubtest" is created...
    --[TableProcessor::createTable] Currently creating table: `my_table`
    --[StructureLoader::default] Source DB structure is loaded...

        --[StructureLoader::default] Tables to migrate: 1
        --[StructureLoader::default] Views to migrate: 0
    --[TableProcessor::createTable] Table "public"."my_table" is created...
    --[DataPoolManager::readDataPool] Data-Pool is loaded...

    --[NMIG data transfer] "public"."my_table" DATA TRANSFER IN PROGRESS...
    --[NMIG data transfer] TIME REQUIRED FOR TRANSFER DEPENDS ON AMOUNT OF DATA...

    --[NMIG loadData] Loading the data into "public"."my_table" table...
    --[NMIG DataWriter] Loading the data into "public"."my_table" table...

The my_table contains a total of 202,488,638 rows, with a data size of approximately 396.05 GB.

When I executed SHOW PROCESSLIST in MySQL, I found no active processes. In Postgres, querying SELECT * FROM pg_stat_activity WHERE state NOT ILIKE 'idle%' showed only one SQL statement running: COPY "public"."chain_extrinsics" (xxxxx,xxx,xxxx,xxx,xxxx,xx) FROM STDIN WITH (FORMAT csv, DELIMITER ',', ENCODING 'UTF8');

Here is the process information from the pod:

  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 root     S    1209m   2%   6   0% /app/app -c /conf/config.json
   73    62 root     S     735m   1%   8   0% /usr/local/bin/node /app/nmig/dist/src/DataWriter.js --max-old-space-size=16384
   62    30 root     S     731m   1%  10   0% /usr/local/bin/node --max-old-space-size=16384 /app/nmig/dist/src/DataReader.js
   30    16 root     S     684m   1%  11   0% node dist/src/Main.js --conf-dir=/data/migrate/config --logs-dir=/data/migrate/logs
   16     1 root     S     681m   1%   8   0% npm start --conf-dir=/data/migrate/config --logs-dir=/data/migrate/logs
   51    30 root     S     677m   1%  12   0% /usr/local/bin/node /app/nmig/dist/src/LogsProcessor.js
AnatolyUss commented 3 months ago

Hello, It seems that the data stream (MySQL -> nmig - > PG COPY) is still running. "my_table" is quite large )) Any chance you can give it more time to see when/if it will complete?

On Wed, Jul 31, 2024 at 4:16 PM perror @.***> wrote:

I noticed yesterday that the logs show --[NMIG DataWriter] Loading the data into "public"."my_table" table..., but it has remained at this message until now. I'm unsure what is happening. System Configuration

  • Running in K8S without any CPU or memory limits set.Running in K8S without any CPU or memory limits set

Configuration Information

{ "source": { "host": "mysql", "port": 3306, "database": "test", "charset": "utf8mb4", "supportBigNumbers": true, "user": "xxx", "password": "xxxx" }, "target": { "host": "postgres", "port": 5432, "database": "test", "charset": "UTF8", "user": "xxxx", "password": "xxxx", "ssl": { "require": true, "rejectUnauthorized": false } }, "max_each_db_connection_pool_size": 100, "number_of_simultaneously_running_loader_processes": "8", "loader_max_old_space_size": "16384", "streams_high_water_mark": 16384, "encoding": "utf8", "schema": "public", "migrate_only_data": false, "delimiter": ",", "enable_extra_config": false }

Logs

NMIG - the database migration tool
Copyright (C) 2016 - present, Anatoly Khaytovich ***@***.***>

--[boot] Configuration has been just loaded.
--[boot] NMIG is restarting after some failure.
--[boot] Consider checking log files at the end of migration.

--[MigrationStateManager::createStateLogsTable] table "public"."state_logs_publicsubtest" is created...
--[DataPoolManager::createDataPoolTable] table "public"."data_pool_publicsubtest" is created...
--[TableProcessor::createTable] Currently creating table: `my_table`
--[StructureLoader::default] Source DB structure is loaded...

    --[StructureLoader::default] Tables to migrate: 1
    --[StructureLoader::default] Views to migrate: 0
--[TableProcessor::createTable] Table "public"."my_table" is created...
--[DataPoolManager::readDataPool] Data-Pool is loaded...

--[NMIG data transfer] "public"."my_table" DATA TRANSFER IN PROGRESS...
--[NMIG data transfer] TIME REQUIRED FOR TRANSFER DEPENDS ON AMOUNT OF DATA...

--[NMIG loadData] Loading the data into "public"."my_table" table...
--[NMIG DataWriter] Loading the data into "public"."my_table" table...

The my_table contains a total of 202,488,638 rows, with a data size of approximately 396.05 GB.

When I executed SHOW PROCESSLIST in MySQL, I found no active processes. In Postgres, querying SELECT * FROM pg_stat_activity WHERE state NOT ILIKE 'idle%' showed only one SQL statement running: COPY "public"."chain_extrinsics" (xxxxx,xxx,xxxx,xxx,xxxx,xx) FROM STDIN WITH (FORMAT csv, DELIMITER ',', ENCODING 'UTF8');

Here is the process information from the pod:

PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND 1 0 root S 1209m 2% 6 0% /app/app -c /conf/config.json 73 62 root S 735m 1% 8 0% /usr/local/bin/node /app/nmig/dist/src/DataWriter.js --max-old-space-size=16384 62 30 root S 731m 1% 10 0% /usr/local/bin/node --max-old-space-size=16384 /app/nmig/dist/src/DataReader.js 30 16 root S 684m 1% 11 0% node dist/src/Main.js --conf-dir=/data/migrate/config --logs-dir=/data/migrate/logs 16 1 root S 681m 1% 8 0% npm start --conf-dir=/data/migrate/config --logs-dir=/data/migrate/logs 51 30 root S 677m 1% 12 0% /usr/local/bin/node /app/nmig/dist/src/LogsProcessor.js

— Reply to this email directly, view it on GitHub https://github.com/AnatolyUss/nmig/issues/129, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADHR7N6DISCKFCT3AQVKZDDZPDPRPAVCNFSM6AAAAABLYO2VSSVHI2DSMVQWIX3LMV43ASLTON2WKOZSGQ2DAMBTG44TSNI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

perrornet commented 3 months ago

Today I deleted the remaining files and database tables left after the last run, but when running again, it still got stuck at the same place.

  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 root     S    1208m   2%  15   0% /app/app -c /conf/config.json
   69    28 root     S     746m   1%  11   0% /usr/local/bin/node --max-old-space-size=16384 /app/nmig/dist/src/DataReader.js
   80    69 root     S     746m   1%   7   0% /usr/local/bin/node /app/nmig/dist/src/DataWriter.js --max-old-space-size=16384
   28    17 root     S     684m   1%  15   0% node dist/src/Main.js --conf-dir=/data/migrate/test/config --logs-dir=/data/migrate/test
   17     1 root     S     682m   1%   9   0% npm start --conf-dir=/data/migrate/test/config --logs-dir=/data/migrate/test/logs
   49    28 root     S     677m   1%  12   0% /usr/local/bin/node /app/nmig/dist/src/LogsProcessor.js
image

CPU usage:

image

Memory usage:

image

Network:

image

It seems that there is no more any activity after running for about 40 minutes.

perrornet commented 3 months ago

I added some logs and found that it was indeed stuck:

--[NMIG loadData] 8/2/2024, 7:18:52 AM: my_table  total count: 1924946/111546685, totalSize: 80.87231482099742 GB
--[NMIG DataWriter] 8/2/2024, 7:18:55 AM: my_table  total count: 1725577/111546685, totalSize: 80.98360828589648 GB
--[NMIG loadData] 8/2/2024, 7:18:57 AM: my_table  total count: 1925282/111546685, totalSize: 81.02011202834547 GB
--[NMIG DataWriter] 8/2/2024, 7:19:00 AM: my_table  total count: 1728086/111546685, totalSize: 81.13046906515956 GB
--[NMIG loadData] 8/2/2024, 7:19:02 AM: my_table  total count: 1925613/111546685, totalSize: 81.16610423196107 GB
--[NMIG DataWriter] 8/2/2024, 7:19:06 AM: my_table  total count: 1730421/111546685, totalSize: 81.2668507695198 GB
--[NMIG loadData] 8/2/2024, 7:19:07 AM: my_table  total count: 1926112/111546685, totalSize: 81.29773158021271 GB
--[NMIG DataWriter] 8/2/2024, 7:19:11 AM: my_table  total count: 1732649/111546685, totalSize: 81.3966732956469 GB
--[NMIG loadData] 8/2/2024, 7:19:12 AM: my_table  total count: 1927716/111546685, totalSize: 81.4275922216475 GB
--[NMIG DataWriter] 8/2/2024, 7:19:16 AM: my_table  total count: 1735124/111546685, totalSize: 81.53275842126459 GB
--[NMIG loadData] 8/2/2024, 7:19:17 AM: my_table  total count: 1934095/111546685, totalSize: 81.56192161794752 GB
--[NMIG DataWriter] 8/2/2024, 7:19:21 AM: my_table  total count: 1737575/111546685, totalSize: 81.66759182140231 GB
--[NMIG loadData] 8/2/2024, 7:19:22 AM: my_table  total count: 1955326/111546685, totalSize: 81.7029321147129 GB
AnatolyUss commented 3 months ago

Any chance, that the "my_table" contains any kind of BLOB columns?

WayneEld commented 3 months ago

@perrornet I had the same issue. It seems like there was a partial migration that happened and was subsequently stopped. I deleted the database and schema and re ran the script and it worked again. (I did a whole investigation into ignoring tables, increasing pool connection but nothing work).

perrornet commented 3 months ago

@perrornet I had the same issue. It seems like there was a partial migration that happened and was subsequently stopped. I deleted the database and schema and re ran the script and it worked again. (I did a whole investigation into ignoring tables, increasing pool connection but nothing work).

This solution does not work for me. I have modified the nmig code and no longer use the 'copy' command to import data. Instead, I am using bulk insert SQL. The migration has now been completed successfully.

perrornet commented 3 months ago

Any chance, that the "my_table" contains any kind of BLOB columns?

Yes, one of the fields in "my_table" is of type longblob.

manakov-dev commented 2 months ago

@perrornet Could you please share your version with better logging and inserts? I faced the same issue - if the target db is not empty, then data migration stuck in ~50% of runs