RamenDR / ramen

Pass-through CA certificates to Velero not working #940

Closed pdumbre closed 1 year ago

pdumbre commented 1 year ago

Steps to reproduce:

  1. Configured CA certificates in the DataProtectionApplication instance of the OADP operator.
  2. Configured CA certificates in ramen-dr-config-map.

Result: the given CA certificates are not preserved in the BSL instance.

pdumbre commented 1 year ago

Ramen Config Map:

kind: ConfigMap
apiVersion: v1
metadata:
  name: ramen-dr-cluster-operator-config
  namespace: ibm-spectrum-fusion-ns
data:
  ramen_manager_config.yaml: |
    apiVersion: ramendr.openshift.io/v1alpha1
    drClusterOperator: {}
    health:
      healthProbeBindAddress: :8081
    kind: RamenConfig
    kubeObjectProtection:
      disabled: false
      veleroNamespaceName: openshift-adp
    leaderElection:
      leaderElect: true
      leaseDuration: 0s
      renewDeadline: 0s
      resourceLock: ""
      resourceName: dr-cluster.ramendr.openshift.io
      resourceNamespace: ""
      retryPeriod: 0s
    metrics:
      bindAddress: 127.0.0.1:9289
    ramenControllerType: dr-cluster
    s3StoreProfiles:
    - s3Bucket: isf-minio-site1
      s3CompatibleEndpoint: https://isf-minio-ibm-spectrum-fusion-ns.apps.rackae1.mydomain.com
      s3ProfileName: site1
      s3Region: site1
      s3SecretRef:
        name: isf-minio-site1
        namespace: ibm-spectrum-fusion-ns
      veleroNamespaceSecretKeyRef:
        key: cloud-site1
        name: cloud-credentials-site1
    - s3Bucket: isf-minio-site2
      s3CompatibleEndpoint: https://isf-minio-ibm-spectrum-fusion-ns.apps.rackae2.mydomain.com
      s3ProfileName: site2
      s3Region: site2
      s3SecretRef:
        name: isf-minio-site1
        namespace: ibm-spectrum-fusion-ns
      veleroNamespaceSecretKeyRef:
        key: cloud-site2
        name: cloud-credentials-site2
      caCertificates: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURiVENDQWxXZ0F3SUJBZ0lJWk9ydk1xMXNlUGN3RFFZSktvWklodmNOQVFFTEJRQXdKakVrTUNJR0ExVUUKQXd3YmFXNW5jbVZ6Y3kxdmNHVnlZWFJ2Y2tBeE5qZzJPRFF6TXpNNU1CNFhEVEl6TURZeE5URTFNelV6T1ZvWApEVEkxTURZeE5ERTFNelUwTUZvd0pqRWtNQ0lHQTFVRUF3d2JLaTVoY0hCekxuSmhZMnRoWlRFdWJYbGtiMjFoCmFXNHVZMjl0TUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUFwTmtlQzJVanN3V0UKM2E1dklsc2hMOUt3MXlCWHBTQTR0SHM2b21LSGxzZldnbEtUZ0ZjZkhJN0Nnd29VUVIySTc1NndObGQ4VHJYcApuQTBXb1kwcUNaQmdBRlZhT1cwSVhENlFTc0t1RllveW9ZVmV5ZVlRc0JBbFppYzhhUVRBSjhUYWxodzltak42CmFzNTlseXZRZkZYQStzNDYrMVVFc1hXMWFOM2poYjFhMEIwbGNjK3hJL1Uxdy9oZGR0eXBxeUZXU0tPczJINGIKcE5OVjNma2ZYQTZ4bHM3QUoxek1vVXhqTVh2VUpEMDVkL2p3aE80bk44NGt6aXloY3l2V3R0YTdpcDFFVDR6TgpKSmZLYzlVVm40SmZCK3J2RkErb0g3RXEyNU5SK2xPNjZXS0NIMzhIaU4xdi96UFU4cDYwNWV0N0VxRnUvN090ClpVdTVJb3dtS3dJREFRQUJvNEdlTUlHYk1BNEdBMVVkRHdFQi93UUVBd0lGb0RBVEJnTlZIU1VFRERBS0JnZ3IKQmdFRkJRY0RBVEFNQmdOVkhSTUJBZjhFQWpBQU1CMEdBMVVkRGdRV0JCUmQybll5NU54MSswTE1ReURhZnI1SwowWktkSXpBZkJnTlZIU01FR0RBV2dCUkVNaUtDTDc4Wk5OTXJtSlhMS1IrbTgrUE5hVEFtQmdOVkhSRUVIekFkCmdoc3FMbUZ3Y0hNdWNtRmphMkZsTVM1dGVXUnZiV0ZwYmk1amIyMHdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUIKQUljODFjWWxIWnJ4b0VjT1ZVaTJOVHdpZGRPRjFPSUdVNUlIVUpqOUh6WTdYcEdOSHU4NC9KOXFYVGlJSVpmVwo4NFoyWlQzS28yeFZzUUZYRnJUMTYxZ2tDM3hPalhkTWptd0xYOUJ3L0wrYzNHa3Q4WGxTL2V0dkphb2pKWDV5CnIxbnBuZEtNTUgzWWlwcCtYSEN5VmZpcmFZTVB0ZGtzYVkxQU1XUEp0Q2R3eXA2RTJZZ3BQNE1NbHFNWC9zNkwKbUtHSHJzRFBjcjBQbVJPMUxkK3llb0wySzBwTW4wd0s1Qk9IZ09DZFRZLzZmekErakxXRDVQblZBNVlRNGNFVApIUmJDb0hzRzBRSndPTE02QnpNK2p2NEpqQlUrSGdWYzlqVjhCdDkxU1VLSW4ySVMybE9xUnRidVNpc0JFMzN6CmtpbDduenkyVFQyQVl1cVZtOFpRaFpvPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCi0tLS0tQkVHSU4gQ0VSVElGSUNBVEUtLS0tLQpNSUlERERDQ0FmU2dBd0lCQWdJQkFUQU5CZ2txaGtpRzl3MEJBUXNGQURBbU1TUXdJZ1lEVlFRRERCdHBibWR5ClpYTnpMVzl3WlhKaGRHOXlRREUyT0RZNE5ETXpNemt3SGhjTk1qTXdOakUxTVRVek5UTTRXaGNOTWpVd05qRTAKTVRVek5UTTVXakFtTVNRd0lnWURWUVFEREJ0cGJtZHlaWE56TFc5d1pYSmhkRzl5UURFMk9EWTRORE16TXprdwpnZ0VpTUEwR0NTcU
dTSWIzRFFFQkFRVUFBNElCRHdBd2dnRUtBb0lCQVFDam9GdEFoWmJ4UmRDZVN0ZE5VRmJ5CjRMTFZrT2wyb2pSTGJjWWhjS1NxbmZFcVpZZFlzWms3QVNKUG9WZWlUQjZEQkdadHZEY2hvZm1hc2pkREtZYk0KU3NUUmlobXdTNExXek5wNjZ2TzZjM2VzMm9kR2F2UjVPUXczeDN3TTZkK3dsS0tYdm5iWXB1TWREM0Ntai9TRgpmemN5NEtSRkhpOTBpckdyT1h5UXJnU0JKRkd1VjV2SktsR0tsWnZCREdjcGdFQkRyanZ5eG0rS01pZ3I4UFBlCnNPb2V5VitHR00rM3hLZEVYNjlXejh3UHVvUThsN09pc1UycmFSdVdZQ2dIazhHaWNUeVFTWkdtUWNiWGswM1cKaHl6N1FpK1I1QmNWZ3lrWG1NQ05lUTR0aS9kWlYrUU8veDFTT2s2WnBYdTYrbVFqMmpHUVVLeEhoR1lEcXN6cApBZ01CQUFHalJUQkRNQTRHQTFVZER3RUIvd1FFQXdJQ3BEQVNCZ05WSFJNQkFmOEVDREFHQVFIL0FnRUFNQjBHCkExVWREZ1FXQkJSRU1pS0NMNzhaTk5Ncm1KWExLUittOCtQTmFUQU5CZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUEKUkVXbFhEV2tKc0FZbjZ5WjFkQWxMUnFpQWY0TEtwTjlpRWU3azN5VmNud0h0djh3cVZWU3pyZkpFNmFGZXIwLwoxNHJmSExocGwrbG9JRUFpaWUzaStwbER4V3p4M1NJVFFPeVArc3hvVHVJeGRnWHdVNUtvR0t0eG5xOFJrUzB4CnkwOWFNcjlTVEhEVHBoMzJ0bGNaRFREMjEvSEkwaFNqTllyUjZwR1Y3cnhRbTRsTjVaSTJrSUIzOVhkaTIxdFcKZit6YkxZUjRLVXN5M3RzQ0VtS3l4dERtdmlUUEJ1ZzVUUHVTc2E2Qm9zdExtajFKNTQrclRaQVAxdUxSTU9rTApNbnZMb1NpdVRTRGpvVlMwK0VHVjlCMmQ4OFowV1VoWjJHbUYyYURMaXJjbkJyamJ5cFAvSzI1ZGsvVmR3emhrCmh5Rm9hMU1FZWtpTFFIT0R6WFgzNHc9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0t
    volSync:
      disabled: true
    webhook:
      port: 9443
pdumbre commented 1 year ago

VRG content:

apiVersion: ramendr.openshift.io/v1alpha1
kind: VolumeReplicationGroup
metadata:
  name: fusion2
  namespace: fusion2
spec:
  kubeObjectProtection: {}
  pvcSelector: {}
  replicationState: primary
  s3Profiles:
    - site2
    - site1
  sync: {}
  volSync:
    disabled: true
status:
  conditions:
    - lastTransitionTime: '2023-06-23T13:29:39Z'
      message: Initializing VolumeReplicationGroup
      observedGeneration: 1
      reason: Initializing
      status: Unknown
      type: DataReady
    - lastTransitionTime: '2023-06-23T13:29:39Z'
      message: Initializing VolumeReplicationGroup
      observedGeneration: 1
      reason: Initializing
      status: Unknown
      type: DataProtected
    - lastTransitionTime: '2023-06-23T13:30:24Z'
      message: Restored cluster data
      observedGeneration: 2
      reason: Restored
      status: 'True'
      type: ClusterDataReady
    - lastTransitionTime: '2023-06-23T13:31:53Z'
      message: backupFailed; request deleted
      observedGeneration: 2
      reason: UploadError
      status: 'False'
      type: ClusterDataProtected
  kubeObjectProtection: {}
  lastUpdateTime: '2023-06-24T03:32:40Z'
  observedGeneration: 2
pdumbre commented 1 year ago

ramen-dr-cluster-operator-7b85678f6c-shk7d-manager.log

hatfieldbrian commented 1 year ago

Issue 1: Secret cloud-credentials-site2 is missing data for key cloud-site2

2023-06-24T03:28:41.797Z    INFO    controllers.VolumeReplicationGroup.vrginstance  velero/requests.go:730  Backup  {"VolumeReplicationGroup": "fusion2/fusion2", "rid": "8c60cda6-47fc-4190-a9f1-71a5e38297d0", "State": "primary", "phase": "Failed", "warnings": 0, "errors": 0, "failure": "unable to get credentials: unable to get key for secret: \"cloud-credentials-site2\" secret is missing data for key \"cloud-site2\"", "validation errors": []}

@pdumbre Please confirm that the referenced secret's key-value pair contains S3 credentials in Velero format.
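One way to run that check offline is sketched below. It assumes the secret's `data` map has been exported as a Python dict (values still base64-encoded, as Kubernetes stores them) and that "Velero format" means the AWS shared-credentials INI layout used by Velero's AWS plugin, with a `[default]` section. The function name `check_velero_credentials` is hypothetical and is not part of Ramen or Velero.

```python
import base64
import configparser

# Options the AWS shared-credentials layout is assumed to require.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def check_velero_credentials(secret_data, key, profile="default"):
    """Return a list of problems; an empty list means the entry looks usable."""
    if key not in secret_data:
        return [f'secret is missing data for key "{key}"']
    try:
        # Kubernetes Secret data values are base64-encoded.
        text = base64.b64decode(secret_data[key], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return [f'value for key "{key}" is not valid base64-encoded text']
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error as err:
        return [f"credentials are not in INI format: {err}"]
    if not parser.has_section(profile):
        return [f"credentials have no [{profile}] section"]
    return [
        f"[{profile}] section lacks {name}"
        for name in REQUIRED_KEYS
        if not parser.has_option(profile, name)
    ]
```

Calling it with the secret's data and the key named in `veleroNamespaceSecretKeyRef` (here `cloud-site2`) reports the same "missing data for key" condition as the log line above when the key is absent.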

Issue 2: site1 S3 store certificate signed by unknown authority

2023-06-24T03:28:41.983Z    INFO    controllers.VolumeReplicationGroup.vrginstance  velero/requests.go:730  Backup  {"VolumeReplicationGroup": "fusion2/fusion2", "rid": "0813add0-3d9f-40ec-a617-0f7e4fdbe395", "State": "primary", "phase": "Failed", "warnings": 0, "errors": 0, "failure": "error checking if backup already exists in object storage: rpc error: code = Unknown desc = RequestError: send request failed\ncaused by: Head \"https://isf-minio-ibm-spectrum-fusion-ns.apps.rackae1.mydomain.com/isf-minio-site1/fusion2/fusion2/kube-objects/1/velero/backups/fusion2--fusion2--1----site1/velero-backup.json\": x509: certificate signed by unknown authority", "validation errors": []}

@pdumbre Please specify CA certificates for site1 also
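A quick way to spot such profiles is sketched below. It assumes the `s3StoreProfiles` list from the Ramen config map has already been parsed into Python dicts; the helper name `profiles_missing_ca` is hypothetical. A profile whose endpoint certificate chains to a publicly trusted CA would not need `caCertificates`, so treat the result as a list of candidates rather than definite errors.

```python
def profiles_missing_ca(profiles):
    """Return names of HTTPS S3 profiles that carry no caCertificates value."""
    missing = []
    for profile in profiles:
        endpoint = profile.get("s3CompatibleEndpoint", "")
        if endpoint.startswith("https://") and not profile.get("caCertificates"):
            missing.append(profile.get("s3ProfileName", "<unnamed>"))
    return missing


# Shape of the two profiles from the config map in this issue.
# The caCertificates value is a placeholder, not the real bundle.
example_profiles = [
    {
        "s3ProfileName": "site1",
        "s3CompatibleEndpoint": "https://isf-minio-ibm-spectrum-fusion-ns.apps.rackae1.mydomain.com",
    },
    {
        "s3ProfileName": "site2",
        "s3CompatibleEndpoint": "https://isf-minio-ibm-spectrum-fusion-ns.apps.rackae2.mydomain.com",
        "caCertificates": "PLACEHOLDER_BASE64_BUNDLE",
    },
]

print(profiles_missing_ca(example_profiles))
```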

pdumbre commented 1 year ago

@hatfieldbrian: Even after specifying CA certs for site1 and changing the key name for secret 'cloud-credentials-site2', the error persists. The CA certificate specified in the Ramen DR config map is not preserved in the BSL instance of the OADP operator.

hatfieldbrian commented 1 year ago

Thanks @pdumbre. Will you please share the updated config map, VRG, and Ramen log?

keslerzhu commented 1 year ago

Hello everyone, I have reproduced the problem in our environment. We have two OCP clusters, AE1 and AE2, each of which has its own MinIO S3 storage. In this experiment I tried to let Velero on AE1 access MinIO on both AE1 and AE2.

From the screenshots, you can see there are two kinds of BSL:

  1. The general BSL, which derives its caCert from the DPA and can access MinIO on both AE1 and AE2.
  2. The BSL generated by the Ramen Recipe, which can access MinIO on neither AE1 nor AE2 because it lacks a caCert.

Here is what I did on AE1 (I did not touch AE2):

  1. In the DPA Velero configuration, I added a second backupLocation that points to AE2's MinIO storage. The caCert was made from AE2's root cert.
  2. In the DPA Velero configuration, I set insecureSkipTLSVerify: 'false' so that X.509 certificate verification is enforced.
  3. Verified that BackupStorageLocations velero-1 and velero-2 are Available. :white_check_mark:
  4. In Ramen's config map, I added caCertificates to both buckets.
  5. Restarted the RamenDR operator to make sure the config is read.
  6. BackupStorageLocations fusion2-fusion2--1----site1 and 2 still come and go, with Phase: Unavailable. :x:
  7. Confirmed that caCert is not passed through into BackupStorageLocations fusion2-fusion2--1----site1 and 2.
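The check in the last step can be scripted. The sketch below assumes the BackupStorageLocation has been fetched as JSON (for example with `oc get backupstoragelocation <name> -n openshift-adp -o json`) and parsed into a dict, and that the Ramen S3 profile is available as a dict as well. It relies on Velero exposing the bundle at `spec.objectStorage.caCert` and on both fields being base64-encoded. The function name `ca_passed_through` is hypothetical.

```python
import base64


def ca_passed_through(profile, bsl):
    """True when the BSL carries the same CA bundle as the Ramen S3 profile."""
    expected = profile.get("caCertificates")
    object_storage = (bsl.get("spec") or {}).get("objectStorage") or {}
    actual = object_storage.get("caCert")
    if not expected or not actual:
        # Either side has no bundle, so nothing was passed through.
        return False
    # Compare decoded bytes so differences in base64 padding or wrapping
    # do not cause a false mismatch.
    return base64.b64decode(expected) == base64.b64decode(actual)
```

A `False` result for a Ramen-generated BSL while the profile has `caCertificates` set matches the symptom reported in this issue.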

[Screenshots: 2023-06-26 at 16:18:21, 16:19:43, 16:26:18, and 16:31:12]

pdumbre commented 1 year ago

ramen-logs.zip

hatfieldbrian commented 1 year ago

I don't see the CaCertificates log entries in these logs. The build you are running does not seem to include the CaCertificates patch.

pdumbre commented 1 year ago

ramen-dr-cluster-operator-7b85678f6c-6bsnm-manager.log

pdumbre commented 1 year ago

The changes are working as expected, so I am closing the issue.

hatfieldbrian commented 1 year ago

Fixed by #925