AliyunContainerService / velero-plugin

Cluster backup failed #24

Open DeepLJH0001 opened 4 years ago

DeepLJH0001 commented 4 years ago

What steps did you take and what happened: [A clear and concise description of what the bug is.]

velero backup reports success, but the subsequent restore fails with an error. The backup was created with:

velero backup create xxx --snapshot-volumes=false --wait

What did you expect to happen:

The restore completes normally.

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

[root@localhost velero-plugin]# velero restore create --from-backup clusterbk --restore-volumes=false --wait
Restore request "clusterbk-20200107024432" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
...............................................................
Restore completed with status: Failed. You may check for more information using the commands `velero restore describe clusterbk-20200107024432` and `velero restore logs clusterbk-20200107024432`.
[root@localhost velero-plugin]# velero restore describe clusterbk-20200107024432
Name:         clusterbk-20200107024432
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Failed (run 'velero restore logs clusterbk-20200107024432' for more information)

Backup:  clusterbk

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  false
[root@localhost velero-plugin]# velero restore logs clusterbk-20200107024432
An error occurred: file not found
[root@localhost velero-plugin]#
[root@localhost velero-plugin]#
[root@localhost velero-plugin]# velero backup get
NAME        STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
clusterbk   Completed   2020-01-06 21:17:22 -0500 EST   29d       default            <none>

Environment: centos 7.7

DeepLJH0001 commented 4 years ago

image

DeepLJH0001 commented 4 years ago

The backup exists, but it cannot be restored.

DeepLJH0001 commented 4 years ago

image

haoshuwei commented 4 years ago

Can you see clusterbk-20200107024432 in the output of velero restore get?

DeepLJH0001 commented 4 years ago

@haoshuwei Yes, and the status shown is Failed.

[root@localhost harbor]# velero restore get
NAME                       BACKUP      STATUS   WARNINGS   ERRORS   CREATED                         SELECTOR
clusterbk-20200107023607   clusterbk   Failed   0          0        2020-01-07 02:36:07 -0500 EST   <none>
clusterbk-20200107024432   clusterbk   Failed   0          0        2020-01-07 02:44:32 -0500 EST   <none>

haoshuwei commented 4 years ago

velero restore logs clusterbk-20200107023607 | grep error

Check what the error log output shows.

haoshuwei commented 4 years ago

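Your cluster may have hit resource conflicts during the restore, for example a port that is already in use. We generally recommend backing up and restoring each application separately; a cluster-wide backup includes some globally scoped resources, which conflict easily.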
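A minimal sketch of a per-application backup and restore, assuming the application lives in a single namespace. The namespace `my-app` and the backup name derived from it are hypothetical placeholders, not values from this thread. The script only prints the commands so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical namespace; replace with the namespace of the application to back up.
NS="${NS:-my-app}"
# Hypothetical backup name derived from the namespace.
BACKUP="${NS}-backup"

# Print (not execute) a namespace-scoped backup and the matching restore.
# --include-namespaces limits the backup to one namespace instead of the whole cluster.
app_backup_cmds() {
  echo "velero backup create ${BACKUP} --include-namespaces ${NS} --snapshot-volumes=false --wait"
  echo "velero restore create --from-backup ${BACKUP} --restore-volumes=false --wait"
}

app_backup_cmds
```

Scoping the backup to one namespace keeps cluster-wide resources out of the restore, which is where the conflicts mentioned above are most likely to come from.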

DeepLJH0001 commented 4 years ago

Is it possible to get more detailed error log information, to make troubleshooting easier?

prometheus-tao commented 9 months ago

@DeepLJH0001 You could check whether the status of the backupstoragelocation is normal.
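One way to run that check might look like the sketch below. It assumes Velero is installed in the `velero` namespace and that the storage location is named `default`, as the output earlier in this thread suggests. The `run` helper is a hypothetical wrapper that only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

check_bsl() {
  # List the backup storage locations known to Velero.
  run velero backup-location get
  # Dump the full object, including its status section.
  # Namespace "velero" and location "default" are assumptions taken from this thread.
  run kubectl -n velero get backupstoragelocations.velero.io default -o yaml
}

check_bsl
```

If the location turns out to be misconfigured or unreachable, that could also be worth comparing against the "file not found" result from `velero restore logs`, though this thread does not confirm the cause.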