Lirt / velero-plugin-for-openstack

Openstack Cinder, Manila and Swift plugin for Velero backups
MIT License

Restic backups fail #5

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

Hi there,

I'm not even sure the problem comes from your plugin, but I'm trying to use restic backups (since there is no block store support yet) without success.

Do you happen to have any issues with your stack?

Configuration:

Symptoms:

  1. The backup ends up PartiallyFailed with one error:

    $ velero backup describe test-velero-backup
    [...]
    Phase:  PartiallyFailed (run velero backup logs test-velero-backup for more information)
    Errors:    1
    Warnings:  0
    [...]
    Started:    2020-10-15 15:41:49 +0200 CEST
    Completed:  2020-10-15 15:41:55 +0200 CEST
    [...]
    Total items to be backed up:  10
    Items backed up:              10

  2. Fetching the backup logs fails:

    $ velero backup logs test-velero-backup
    An error occurred: request failed: 401 Unauthorized: Temp URL invalid

  3. No PodVolumeBackup or ResticRepository is created.

It'd be awesome if you could share your experiences on the matter.
Lirt commented 4 years ago

Hi @pierredadt

I have to say I haven't tried restic with Velero yet; my use case is very simple: I only back up Kubernetes resources.

Anyway, in the coming days I will try to check whether this issue is caused by something missing in the plugin implementation or by something else.

ghost commented 4 years ago

Thanks for your input! Please keep in touch. I have only skimmed Go, but one day I'd like to re-implement what was done for Ark in https://github.com/cisco-sso/velero-plugin-openstack, and I would benefit from any insight.

Lirt commented 4 years ago

So I tried your approach and I can reproduce the error.

Here are the outputs of commands showing that something is wrong:

$ kubectl describe backupstoragelocations.velero.io default
Spec:
  Config:
    Restic Repo Prefix:  lirt-swift
  Object Storage:
    Bucket:  lirt-swift-test
  Provider:  swift

$ velero restic repo get
NAME                   STATUS     LAST MAINTENANCE
velero-default-cz7nc   NotReady   <never>

$ kubectl get resticrepositories.velero.io
NAME                   AGE
velero-default-cz7nc   22m

$ kubectl describe resticrepositories.velero.io velero-default-cz7nc
Status:
  Message:  restic repository prefix (resticRepoPrefix) not specified in backup storage location's config
  Phase:    NotReady

I can see the NotReady phase (https://github.com/vmware-tanzu/velero/blob/main/pkg/restic/repository_ensurer.go#L144) and a message saying that I didn't specify resticRepoPrefix (which I did). This may indicate that my restic repository was not initialized or that something else is wrong with it.

From there I went to the restic documentation, which has a guide for setting it up on Swift:

$ restic -r swift:container_name:/path init   # path is optional
enter password for new repository:
enter password again:
created restic repository eefee03bbd at swift:container_name:/path
Please note that knowledge of your password is required to access the repository.
Losing your password means that your data is irrecoverably lost.

Unfortunately, my Swift doesn't allow me to create the repository, so I can't test this further right now. Could you follow the docs to initialize the repository, then set config.resticRepoPrefix to match it and retry?

Here is the code responsible for configuring the prefix: https://github.com/vmware-tanzu/velero/blob/master/pkg/restic/config.go#L43-L79

I think in your case it should be swift:public_volumes, but I can't guarantee it.
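Judging from the linked config.go and from the --repo values in the Velero logs later in this thread, Velero appears to use resticRepoPrefix verbatim when it is set and to append the volume's namespace to build the --repo value. The sketch below models only that case. The function name repoIdentifier is hypothetical, and the default-prefix logic that Velero has for some other providers is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// repoIdentifier is a hypothetical, simplified model of how the restic --repo
// value seems to be derived: "<resticRepoPrefix>/<namespace>".
func repoIdentifier(config map[string]string, namespace string) (string, error) {
	prefix := config["resticRepoPrefix"]
	if prefix == "" {
		// Velero can derive a default prefix for some providers; this sketch
		// does not, so a missing prefix is always an error here.
		return "", errors.New("restic repository prefix (resticRepoPrefix) not specified in backup storage location's config")
	}
	return strings.TrimSuffix(prefix, "/") + "/" + namespace, nil
}

func main() {
	cfg := map[string]string{"resticRepoPrefix": "swift:velero:/public_cluster/snapshots"}
	repo, err := repoIdentifier(cfg, "test-velero")
	if err != nil {
		panic(err)
	}
	fmt.Println(repo) // swift:velero:/public_cluster/snapshots/test-velero
}
```

Under this assumption, initializing a repository exactly at the prefix is not enough. Velero would look for one repository per namespace underneath it.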

ghost commented 4 years ago

Hi,

I didn't realize I had to create the restic repository on Swift manually. I'm not sure whether that's how it's supposed to work or whether there is a bug here.

Anyway, I gave it another try to dig further.

Process:

Results:

Now that the restic repository exists, the error message I had before (asking to set resticRepoPrefix in the BackupStorageLocation) is gone, and the restic commands are complete.

Velero finds the repository and tries to communicate with it, but every restic command it runs fails with a Bad Request:

  1. Velero can't unlock the repository because the following command[1] fails with a Bad Request:

    error running command=restic unlock --repo=swift:velero:/public_cluster/snapshots/test-velero --password-file=/tmp/velero-restic-credentials-test-velero969826974 --cache-dir=/scratch/.cache/restic
    > unable to open repo at swift:velero:/public_cluster/snapshots/test-velero: conn.Authenticate: Bad Request

    https://github.com/vmware-tanzu/velero/blob/main/pkg/restic/repository_manager.go#L286
    https://github.com/vmware-tanzu/velero/blob/main/pkg/controller/restic_repository_controller.go#L144

  2. The ResticRepository is NotReady because the following command[2] fails with a Bad Request:

    error running command=restic snapshots --repo=swift:velero:/public_cluster/snapshots/test-velero --password-file=/tmp/velero-restic-credentials-test-velero979394917 --cache-dir=/scratch/.cache/restic --last
    > unable to open repo at swift:velero:/public_cluster/snapshots/test-velero: conn.Authenticate: Bad Request

    https://github.com/vmware-tanzu/velero/blob/main/pkg/restic/repository_manager.go#L286
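To reproduce these calls outside the cluster, it may help to build the restic invocation with exactly the flags that appear in the error messages above, including the full --repo value Velero logged. The sketch below only constructs the command and does not run it. The helper name resticCommand is hypothetical, and the password and cache paths are placeholders.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// resticCommand builds, without running, a restic invocation using the same
// flags that appear in the Velero error messages.
func resticCommand(subcommand, repo, passwordFile, cacheDir string, extra ...string) *exec.Cmd {
	args := []string{
		subcommand,
		"--repo=" + repo,
		"--password-file=" + passwordFile,
		"--cache-dir=" + cacheDir,
	}
	args = append(args, extra...)
	return exec.Command("restic", args...)
}

func main() {
	// Use the repo value exactly as Velero logged it, namespace suffix included.
	repo := "swift:velero:/public_cluster/snapshots/test-velero"
	cmd := resticCommand("snapshots", repo, "/tmp/password.txt", "/cache", "--last")
	fmt.Println(strings.Join(cmd.Args, " "))
}
```

Printing cmd.Args makes it easy to compare the command with the one in the logs before running it, for example inside the restic Docker image.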


Of course, I tried the same commands with the restic 0.9.6 Docker image from my laptop to check whether everything works, and it does!

[1] (Redacted for readability)

$ docker run -ti --env-file os_envs.env -v /tmp/velero:/cache -v /tmp/password.txt:/tmp/password.txt \
  restic/restic:0.9.6 unlock \
  --repo=swift:velero:/public_cluster/snapshots \
  --password-file=/tmp/password.txt \
  --cache-dir=/cache
repository 4b894e25 opened successfully, password is correct
successfully removed locks

[2] (Redacted for readability)

$ docker run -ti --env-file os_envs.env -v /tmp/velero:/cache -v /tmp/password.txt:/tmp/password.txt \
  restic/restic:0.9.6 snapshots \
  --repo=swift:velero:/public_cluster/snapshots \
  --password-file=/tmp/password.txt \
  --cache-dir=/cache \
  --last
repository 4b894e25 opened successfully, password is correct

As far as I can tell, it seems to be a Velero issue after all.
I guess I should open an issue there for further investigation.

Does this ring a bell?

Thanks again for your involvement.

Lirt commented 4 years ago

I think this issue is on the Velero side, as you also mentioned.

Maybe you could read the password file inside the pod in Kubernetes to make sure the credentials are loaded correctly. If the error message is accurate (conn.Authenticate: Bad Request), the problem is probably related to authentication.
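The restic documentation describes the Swift backend as reading OpenStack credentials from OS_* environment variables, and the Docker test in this thread passes them with --env-file os_envs.env. One way to follow the suggestion above is to compare the keys in that env file with the environment of the process that actually runs restic inside the Velero pod. The sketch below does that comparison. The helper names are hypothetical, and the sample env content is made up because the real os_envs.env is not shown in the thread.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// envFileKeys returns the variable names defined in a Docker-style env file
// (one KEY=VALUE or KEY per line; blank lines and '#' comments are skipped).
func envFileKeys(content string) []string {
	var keys []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, _, _ := strings.Cut(line, "=")
		keys = append(keys, strings.TrimSpace(key))
	}
	return keys
}

// missingKeys lists the keys that are unset or empty according to lookup.
func missingKeys(keys []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, k := range keys {
		if v, ok := lookup(k); !ok || v == "" {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical env file contents for illustration only.
	content := "OS_AUTH_URL=https://keystone.example.com/v3\nOS_USERNAME=velero\nOS_PASSWORD=secret\nOS_PROJECT_NAME=demo\n"
	fmt.Println("missing in this environment:", missingKeys(envFileKeys(content), os.LookupEnv))
}
```

Passing the lookup function as a parameter keeps the check testable. In the pod you would use os.LookupEnv, or simply compare against the output of env.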

I noticed that you pass a different path to --repo in your docker command. Maybe it's not an issue, but I'm noting it here:

--repo=swift:velero:/public_cluster/snapshots/test-velero
--repo=swift:velero:/public_cluster/snapshots
ghost commented 4 years ago

> I noticed that in your docker command, you use a different path to --repo. Maybe it's not an issue, but just a note here.

You're right, my mistake. It does make a difference: the repository (or object) under /test-velero does not exist, and that's certainly the reason behind the Bad Request.

$ docker run -ti --env-file os_envs.env -v /tmp/velero:/cache -v /tmp/password.txt:/tmp/password.txt \
  restic/restic:0.9.6 unlock \
  --repo=swift:velero:/public_cluster/snapshots/test-velero \
  --password-file=/tmp/password.txt \
  --cache-dir=/cache
Fatal: unable to open config file: conn.Object: Object Not Found
Is there a repository at the following location?
swift:velero:/public_cluster/snapshots/test-velero

If that's the case, it means that not only does Velero not create the parent repository (resticRepoPrefix), it doesn't create any repository at all for individual namespaces (resticRepoPrefix/namespace).


I'm pretty sure I'm not supposed to create all my repositories manually for Velero to be able to use restic!
It should run a restic init <repo> command at some point in the process, but there is no trace of that in the logs.

https://github.com/vmware-tanzu/velero/blob/87d86a45a6ca66c6c942c7c7f08352e26809426c/pkg/controller/restic_repository_controller.go#L212
https://github.com/vmware-tanzu/velero/blob/87d86a45a6ca66c6c942c7c7f08352e26809426c/pkg/restic/repository_manager.go#L186

As you can see, the string "Is there a repository at the following location?" appears in the output of the unlock command, and that string is what should make Velero trigger repository creation.

And as far as I can tell from the documentation, your plugin implements all the methods an ObjectStore must provide!


I'll take this one to Velero's GitHub then; many thanks again for your input.
I can report my findings back here if you wish. In the meantime, I think I can close the issue to keep your repo clean!

Lirt commented 4 years ago

I understand. It does look weird. The documentation says the restic integration should also cover restic init:

ResticRepository - represents/manages the lifecycle of Velero’s restic repositories. Velero creates a restic repository per namespace when the first restic backup for a namespace is requested. The controller for this custom resource executes restic repository lifecycle commands – restic init, restic check, and restic prune.

This is also noted in the backup workflow:

  1. When found, Velero first ensures a restic repository exists for the pod's namespace, by:
       - checking if a ResticRepository custom resource already exists
       - if not, creating a new one, and waiting for the ResticRepository controller to init/check it

Thank you for the report and the insight.

Lirt commented 4 years ago

Hi again @pierredadt

I just wanted to mention that I've started working on Cinder block storage integration for this plugin. I will let you know as soon as it is ready for review.

https://github.com/Lirt/velero-plugin-swift/issues/6