couchbase-partners / helm-charts

Helm charts for deployed couchbase services
Apache License 2.0

Clarification and Failure of Instance Metadata Authentication for Couchbase Operator Backup #128

Open camilamacedo86 opened 3 months ago

camilamacedo86 commented 3 months ago

Description

The documentation for configuring instance metadata authentication for Couchbase Operator Backup (link) is, IMHO, unclear about whether providing the secret is always necessary. The primary advantage of using IAM roles (specifically roles/iam.workloadIdentityUser), as you can see (here), is to avoid the need for secrets. If a secret is mandatory, that seems to contradict the purpose of IAM roles.

Furthermore, the instance metadata authentication option does not function as expected. I attempted this with the latest release couchbase/operator-backup:1.3.8.

Steps to Reproduce

  1. Use the service account serviceAccountName: couchbase-backup.
  2. Bind the Service Account to Workload Identity.
  3. Grant the roles roles/storage.objectCreator and roles/storage.objectViewer.
  4. Apply the workaround mentioned in this issue: https://github.com/couchbase-partners/helm-charts/issues/126#issuecomment-2126675183.
  5. Execute a backup job and observe the logs.
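
For reference, steps 2 and 3 could be sketched roughly as follows. This is only an illustration: the project ID, namespace, and Google service account e-mail are placeholders, and granting the storage roles at bucket level (rather than project level) is an assumption, not something this report confirms. The script only builds and prints the gcloud commands; it does not run them.

```python
import shlex

# Hypothetical placeholders; adjust to your own environment.
PROJECT_ID = "my-project"
NAMESPACE = "my-namespace"
KSA_NAME = "couchbase-backup"  # Kubernetes ServiceAccount from step 1
GSA_EMAIL = f"couchbase-backup@{PROJECT_ID}.iam.gserviceaccount.com"
BUCKET = "gs://my-couchbase"


def workload_identity_member(project_id, namespace, ksa_name):
    """Return the IAM member string GKE uses for a Kubernetes ServiceAccount."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def build_commands():
    """Build (but do not run) the gcloud commands for steps 2 and 3."""
    commands = [
        # Step 2: let the Kubernetes ServiceAccount act as the Google one.
        [
            "gcloud", "iam", "service-accounts", "add-iam-policy-binding", GSA_EMAIL,
            "--role", "roles/iam.workloadIdentityUser",
            "--member", workload_identity_member(PROJECT_ID, NAMESPACE, KSA_NAME),
        ],
    ]
    # Step 3 (assumed to be bucket-level): grant the two storage roles.
    for role in ("roles/storage.objectCreator", "roles/storage.objectViewer"):
        commands.append([
            "gcloud", "storage", "buckets", "add-iam-policy-binding", BUCKET,
            "--member", f"serviceAccount:{GSA_EMAIL}",
            "--role", role,
        ])
    return commands


if __name__ == "__main__":
    for command in build_commands():
        print(shlex.join(command))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere and makes it easy to review the member string before applying anything.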

Observed Behavior

The job fails to connect to the API, producing the following log output:

# kubectl logs couchbase-backup-<version> -n <namespace>
2024-05-23T14:25:37 INFO couchbase-operator-backup/1.3.8 (commit/27bb25de3323de5d3abc8cffe7e607d49e1b227d)
2024-05-23T14:25:37 INFO Timestamp: 2024-05-23 14:25:37.676246
2024-05-23T14:25:37 INFO Arguments: cluster=couchbase, mode=backup, full=False, incremental=True, backup_ret=720.0, disable_bucket_config=False, force_delete_lockfile=False, repo=None, start=None, end=None, map_data=None, filter_keys=None, filter_values=None, enable_bucket_config=False, force_updates=False, include_data=None, exclude_data=None, disable_views=False, disable_gsi_indexes=False, disable_ft_indexes=False, disable_ft_alias=False, disable_data=False, disable_analytics=False, disable_eventing=False, disable_cluster_analytics=False, disable_bucket_query=False, disable_cluster_query=False, cacert=None, log_ret=168.0, verbosity=INFO, s3_bucket=None, obj_store=gs://my-couchbase, obj_auth_by_instance_metadata=None, obj_endpoint=None, obj_cacert=None, s3_force_path_style=True, threads=1, default_recovery=none
2024-05-23T14:25:37 INFO Checking connection to Kubernetes API...
2024-05-23T14:26:37 ERROR Subprocess call exited with non-zero return code 1
2024-05-23T14:26:37 ERROR Arguments: cbbackupmgr info --json --archive gs://my-couchbase/archive --obj-staging-dir /data/staging --s3-force-path-style
2024-05-23T14:26:37 ERROR Stdout: b'{"error":"failed to lock staging directory: timed out after 1m0s waiting for an exclusive lock to populate staging directory, please try again or use \'--log-level debug\' for more information"}\n'
2024-05-23T14:26:37 ERROR Stderr: b''
2024-05-23T14:26:37 ERROR Failed to list repositories
2024-05-23T14:26:37 ERROR Command '['cbbackupmgr', 'info', '--json', '--archive', 'gs://my-couchbase/archive', '--obj-staging-dir', '/data/staging', '--s3-force-path-style']' returned non-zero exit status 1.
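
One detail in the log above may be worth checking: the INFO Arguments line reports obj_auth_by_instance_metadata=None, and the cbbackupmgr invocation shows no instance-metadata flag. Whether that is the cause of the failure is not established here. The sketch below is a small, hypothetical helper (not part of the operator) that parses such an Arguments line so the value can be checked quickly; it assumes an enabled option would be logged as the literal True, as the other boolean arguments are.

```python
def parse_arguments_line(line):
    """Parse an 'Arguments: k=v, k=v, ...' log line into a dict of strings."""
    _, _, payload = line.partition("Arguments: ")
    args = {}
    for item in payload.split(", "):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments that are not key=value pairs
            args[key.strip()] = value.strip()
    return args


def instance_metadata_auth_enabled(args):
    """Return True only if the option is logged as the literal 'True' (assumed format)."""
    return args.get("obj_auth_by_instance_metadata") == "True"


# Shortened copy of the INFO Arguments line from the log above.
SAMPLE = (
    "2024-05-23T14:25:37 INFO Arguments: cluster=couchbase, mode=backup, "
    "obj_store=gs://my-couchbase, obj_auth_by_instance_metadata=None, "
    "obj_endpoint=None, s3_force_path_style=True"
)

if __name__ == "__main__":
    parsed = parse_arguments_line(SAMPLE)
    print("obj_store:", parsed["obj_store"])
    print("instance metadata auth enabled:", instance_metadata_auth_enabled(parsed))
```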

Expected Behavior

The backup job should successfully connect to the API and perform the backup without requiring a secret when using IAM roles.

Additional Information

Configuration

cluster:
  backup:
    image: couchbase/operator-backup:1.3.8
    managed: true
  serviceAccountName: couchbase-backup
backups:
  couchbase-backup:
    name: couchbase-backup
    strategy: full_incremental
    full:
      schedule: "0 3 * * 0"
    incremental:
      schedule: "0 3 * * 1-6"
    successfulJobsHistoryLimit: 1
    failedJobsHistoryLimit: 3
    size: 20Gi
    objectStore:
      uri: gs://my-couchbase
      useIAM: true
      endpoint:
        url: storage.googleapis.com

Request

Please clarify the documentation regarding the necessity of secrets when using IAM roles. Additionally, provide a resolution for the failure observed during instance metadata authentication.

camilamacedo86 commented 3 months ago

I was able to make it work by making some changes manually. Here are my suggestions for sorting this out:

a) The CouchbaseBackup CRD needs a spec field so that we can add the annotation required to grant the IAM permissions to the ServiceAccount. (The Helm chart also needs to let us provide the annotation via the values.)

Example:

Name:                couchbase-backup
Namespace:           my-namespace
Annotations:         iam.gke.io/gcp-service-account: couchbase-backup@my-project.iam.gserviceaccount.com
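
As a rough illustration of what such a ServiceAccount could look like when created by hand, the sketch below builds an equivalent manifest and prints it as JSON. The function name is made up for this example, the values mirror the example output above, and it assumes the operator leaves a manually created ServiceAccount and its annotation untouched, which this thread does not confirm.

```python
import json


def service_account_manifest(name, namespace, gcp_service_account):
    """Build a ServiceAccount manifest carrying the Workload Identity annotation."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Links the Kubernetes ServiceAccount to the Google service account.
                "iam.gke.io/gcp-service-account": gcp_service_account,
            },
        },
    }


if __name__ == "__main__":
    manifest = service_account_manifest(
        "couchbase-backup",
        "my-namespace",
        "couchbase-backup@my-project.iam.gserviceaccount.com",
    )
    # kubectl accepts JSON manifests, for example piped into `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```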

b) It would also be helpful to let us create the ServiceAccount manually if we wish, instead of always having the operator create it.

c) The docs are missing examples of how to configure this and of the required permissions. I needed to grant all of the following ones to get it working. Could you please clarify which permissions are actually required?