restore never end - Githubissues

FeiYangtze commented 2 years ago

I have a database, KDB, with about 14 GB of data. A restore has been running for 14 hours and has not finished; it looks like it will never end. Please give me some advice.

I have tried both a local backup and an S3 backup. The result is the same.

Jira issue: CRDB-1808

This time I used the userfile method for backup and restore, and the restore had not completed after more than 60 hours.

blathers-crl[bot] commented 2 years ago

Hello, I am Blathers. I am here to help you get the issue triaged.

It looks like you have not filled out the issue in the format of any of our templates. To best assist you, we advise you to use one of these templates.

I have CC'd a few people who may be able to assist you:

@cockroachdb/bulk-io (found keywords: backup,restore)

If we have not gotten back to your issue within a few business days, you can try the following:

- Join our community Slack channel and ask on #cockroachdb.
- Try to find someone from here if you know they worked closely on the area and CC them.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

blathers-crl[bot] commented 2 years ago

cc @cockroachdb/bulk-io

adityamaru commented 2 years ago

Hi @FeiYangtze, thanks for opening an issue!

Can you run SHOW JOBS and see if the restore progress is stuck at a particular number or is progressing but slowly?
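One way to check this, as a minimal sketch: sample the restore job's `fraction_completed` from `SHOW JOBS` a few minutes apart and compare the values. The host is the one used in this thread; `RESTORE_CHECK_HOST` is a hypothetical variable name introduced only for this sketch, and the connection flags (certificates, user) are assumptions that may need adjusting for your setup.

```shell
# Host taken from the thread; RESTORE_CHECK_HOST is a hypothetical override
# used only in this sketch.
HOST="${RESTORE_CHECK_HOST:-192.168.56.104}"

# List restore jobs with their status and progress, newest first.
QUERY="SELECT job_id, status, running_status, fraction_completed, modified
FROM [SHOW JOBS]
WHERE job_type = 'RESTORE'
ORDER BY created DESC;"

# Run the query only if the cockroach binary is on PATH.
# Assumption: certificates are found in the default location, as in the
# session shown in this thread; otherwise add --certs-dir.
if command -v cockroach >/dev/null 2>&1; then
  cockroach sql --host="$HOST" -e "$QUERY"
else
  echo "cockroach binary not found; query to run manually:"
  echo "$QUERY"
fi
```

If `fraction_completed` is unchanged between two samples, the job is more likely stuck than slow.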

Additionally, to debug this further, I'd need:

- Cluster configuration
- Cockroach version
- Debug zip during the stuck restore - https://www.cockroachlabs.com/docs/stable/cockroach-debug-zip.html

If you'd prefer not to share the zip in public, you can send it over https://support.cockroachlabs.com/hc/en-us or hop into our community Slack and message it to me.
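For reference, collecting the debug zip while the restore is still running might look like the sketch below. The host is the one from this thread; the output path and the `RESTORE_CHECK_HOST` variable are hypothetical, and a secure cluster may also need `--certs-dir` pointing at its certificates.

```shell
# Host taken from the thread; RESTORE_CHECK_HOST and ZIP_PATH are
# hypothetical names used only in this sketch.
HOST="${RESTORE_CHECK_HOST:-192.168.56.104}"
ZIP_PATH="./debug.zip"

# Collect the debug zip only if the cockroach binary is on PATH.
# Assumption: certificates are found in the default location; otherwise
# add --certs-dir.
if command -v cockroach >/dev/null 2>&1; then
  cockroach debug zip "$ZIP_PATH" --host="$HOST"
else
  echo "cockroach binary not found; skipped: cockroach debug zip $ZIP_PATH --host=$HOST"
fi
```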

FeiYangtze commented 2 years ago

@adityamaru Thanks for your reply!

Cockroach version

[root@localhost ~]# cockroach version
Build Tag:        v21.1.10
Build Time:       2021/10/07 02:39:03
Distribution:     CCL
Platform:         linux amd64 (x86_64-unknown-linux-gnu)
Go Version:       go1.15.14
C Compiler:       gcc 6.5.0
Build Commit ID:  a6daab16ee8a1c15abc4c4a8a425e46d12033b5c
Build Type:       release

Cluster configuration

[root@localhost ~]# cat /etc/systemd/system/cockroach.service
[Unit]
Description=Cockroach Database
Requires=network.target

[Service]
Type=notify
WorkingDirectory=/var/data/cockroach
ExecStart=/usr/local/bin/cockroach start-single-node --certs-dir=certs --listen-addr=192.168.56.104 --http-addr=:8864 --external-io-dir=/var/data/cockroach/cockroach-backup
TimeoutStopSec=60
Restart=always
RestartSec=10
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=cockroach
User=cockroach

[Install]
WantedBy=default.target

FeiYangtze commented 2 years ago

SHOW JOBS: I ran the same operation on both TPCC and KDB. The TPCC restore succeeded, but the KDB restore has not completed after more than 15 hours. The SHOW JOBS command itself hangs.

[root@localhost 09:11:20 ~]# cockroach sql --host=192.168.56.104
Welcome to the CockroachDB SQL shell.
All statements must be terminated by a semicolon.
To exit, type: \q.

Server version: CockroachDB CCL v21.1.10 (x86_64-unknown-linux-gnu, built 2021/10/07 02:39:03, go1.15.14) (same version as client)
Cluster ID: 675a0cf4-4ce1-4116-bd3c-5f210761eb60

Enter \? for a brief introduction.

root@192.168.56.104:26257/defaultdb> show databases;
  database_name | owner | primary_region | regions | survival_goal
----------------+-------+----------------+---------+----------------
  defaultdb     | root  | NULL           | {}      | NULL
  system        | node  | NULL           | {}      | NULL
  tpcc          | root  | NULL           | {}      | NULL
(3 rows)

Time: 8ms total (execution 7ms / network 0ms)

root@192.168.56.104:26257/defaultdb> show jobs;

FeiYangtze commented 2 years ago

Operation record

[root@localhost 17:25:06 /var/data/cockroach/cockroach-backup]# cockroach sql --host=192.168.56.104
Welcome to the CockroachDB SQL shell.
All statements must be terminated by a semicolon.
To exit, type: \q.

Server version: CockroachDB CCL v21.1.10 (x86_64-unknown-linux-gnu, built 2021/10/07 02:39:03, go1.15.14) (same version as client)
Cluster ID: 675a0cf4-4ce1-4116-bd3c-5f210761eb60

Enter \? for a brief introduction.

root@192.168.56.104:26257/defaultdb> show databases;
  database_name | owner  | primary_region | regions | survival_goal
----------------+--------+----------------+---------+----------------
  defaultdb     | root   | NULL           | {}      | NULL
  kdb           | kadmin | NULL           | {}      | NULL
  system        | node   | NULL           | {}      | NULL
  tpcc          | root   | NULL           | {}      | NULL
(4 rows)

Time: 1ms total (execution 1ms / network 0ms)

root@192.168.56.104:26257/defaultdb> show backups in 'nodelocal://self/tpcc-bak';
  path
(0 rows)

Time: 0ms total (execution 0ms / network 0ms)

root@192.168.56.104:26257/defaultdb> show backups in 'nodelocal://self/kdb-bak';
  path
(0 rows)

Time: 1ms total (execution 0ms / network 0ms)

Time: 427ms total (execution 427ms / network 0ms)

root@192.168.56.104:26257/defaultdb> show backups in 'nodelocal://self/tpcc-bak';
  path
  /2022/08/12-092814.62
(1 row)

Time: 1ms total (execution 0ms / network 0ms)

Time: 33.505s total (execution 33.505s / network 0.000s)

root@192.168.56.104:26257/defaultdb> show backups in 'nodelocal://self/kdb-bak';
  path
  /2022/08/12-092836.58
(1 row)

Time: 2ms total (execution 2ms / network 0ms)

root@192.168.56.104:26257/defaultdb> drop database tpcc cascade;
DROP DATABASE

Time: 423ms total (execution 423ms / network 0ms)

root@192.168.56.104:26257/defaultdb> drop database kdb cascade;
DROP DATABASE

Time: 6.618s total (execution 6.618s / network 0.000s)

root@192.168.56.104:26257/defaultdb> show databases;
  database_name | owner | primary_region | regions | survival_goal
----------------+-------+----------------+---------+----------------
  defaultdb     | root  | NULL           | {}      | NULL
  system        | node  | NULL           | {}      | NULL
(2 rows)

Time: 1ms total (execution 1ms / network 0ms)

Time: 1.431s total (execution 1.431s / network 0.000s)

root@192.168.56.104:26257/defaultdb> show databases;
  database_name | owner | primary_region | regions | survival_goal
----------------+-------+----------------+---------+----------------
  defaultdb     | root  | NULL           | {}      | NULL
  system        | node  | NULL           | {}      | NULL
  tpcc          | root  | NULL           | {}      | NULL
(3 rows)

Time: 1ms total (execution 1ms / network 0ms)

root@192.168.56.104:26257/defaultdb> restore database kdb from '/2022/08/12-092836.58' in 'nodelocal://self/kdb-bak';

FeiYangtze commented 2 years ago

debug.zip

cockroachdb / cockroach

restore never end #85180
