IBM / signoff-pxb

Repos for pxb related release sign-off
Apache License 2.0

PX Backup 2.6.0 Sign off #17

Closed trenukarya-px closed 4 months ago

trenukarya-px commented 6 months ago

PXB Sign-off Template: Following are the tasks that have to be completed by the following teams.

NOTE: This issue should not be closed until the PX-Backup build/version is pushed to the IBM Cloud Catalog.

Sign-off process Step 1:

Portworx Team:

IBM Team:

Sign-off process Step 2:

NOTE: Once the above tasks are completed and marked as done, including sign-off, the PX-Backup and IBM teams need to complete the following tasks as well.

PX-Backup Team:

IBM Team:

trenukarya-px commented 6 months ago
2 6 0_Testrail_Milestone_1 2 6 0_Testrail_Milestone_2 2 6 0_Testrail_Milestone_3

Testrail 2.6.0 milestone images are uploaded. 1 test failure because of 3 low priority/severity tickets - PB-4363, PB-4806, PB-4805

trenukarya-px commented 6 months ago

keycloak_with_ssl_system_test.xls kubevirt_system_tests.xls orphaned_backup_delete___px.xls scale_limit_verification_with_kubevirt_resources.xls

Test-case details attached

trenukarya-px commented 6 months ago

Compatibility Matrix for IKS/ROKS:

IKS: 1.27.0 1.28.4

ROKS: 4.12.44 4.13.13

Balachandar-Pan commented 6 months ago

stork1.log stork2.log stork3.log pxb260.log pxb260_image version.txt

arahamad commented 6 months ago

@trenukarya-px, the following are the supported IKS and ROKS versions:

IKS versions: 1.26, 1.27, 1.28

ROKS versions: 4.12, 4.13, 4.14

Can you please let us know the plan to support the remaining IKS/ROKS versions, i.e. IKS 1.26 and ROKS 4.14?

If a user is on these IKS and ROKS versions, how will they use PX-Backup? Will the catalog allow px-backup instance creation or not?

cc @ambiknai

ambiknai commented 6 months ago

Also PX Backup 2.5.1 was supported in

As 2.6.0 is not supported on these versions, how can a user deploy px-backup on those clusters?

trenukarya-px commented 6 months ago

@ambiknai We got a note requesting only these validations from Vipin Panavil Kallat on IBM's behalf, hence we considered these for 2.6.0. We can consider 4.14 in the next version of PX Backup. Also, we support only N-2 versions for any K8s flavor, so customers have to upgrade their K8s versions to use PX Backup 2.6.0.

Cc: @kshithijiyer-px

arahamad commented 6 months ago

@trenukarya-px, can you please share the doc link where the supported N-2 versions are mentioned? I mean, is there any way a user can see what is supported and what is not?

trenukarya-px commented 6 months ago

@arahamad @ambiknai IKS 1.26 is also qualified.

https://docs.portworx.com/portworx-backup-on-prem/install/install-prereq captures the compatibility matrix. Currently it is not limited to N-2, but going forward it will be N-2 for all supported versions.

arahamad commented 6 months ago

I tried to execute torpedo tests on different IKS and ROKS cluster versions, and it looks like they failed on all clusters.

@trenukarya-px and @ambiknai, can you please check the results: torpedo-out_for_IKS_1.26.11.txt torpedo-out_for_IKS_1.27.8.txt torpedo-out_for_IKS_1.28.4.txt torpedo-out_for_IKS_1.29.0.txt torpedo-out_for_ROKS_4.12.44.txt torpedo-out_for_ROKS_4.13.13.txt torpedo-out_for_ROKS_4.14.6.txt

ambiknai commented 6 months ago

@trenukarya-px I tried for IKS 1.28.4 and this is what I see

Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: vpc.block.csi.ibm.io
               volume.kubernetes.io/storage-provisioner: vpc.block.csi.ibm.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Used By:       postgres-dc55cfcc4-fqkg6
Events:
  Type     Reason                Age                  From                                                                                      Message
  ----     ------                ----                 ----                                                                                      -------
  Normal   Provisioning          2m2s (x8 over 4m9s)  vpc.block.csi.ibm.io_ibm-vpc-block-csi-controller-0_35978e61-91d4-46c3-ae0b-9ced3936b953  External provisioner is provisioning volume for claim "postgres-csi-pxb-0-83717-01-11-03h24m24s/postgres-data"
  Warning  ProvisioningFailed    2m2s (x8 over 4m9s)  vpc.block.csi.ibm.io_ibm-vpc-block-csi-controller-0_35978e61-91d4-46c3-ae0b-9ced3936b953  failed to provision volume with StorageClass "postgres-sc": error getting secret rook-csi-cephfs-provisioner in namespace openshift-storage: secrets "rook-csi-cephfs-provisioner" not found
  Normal   ExternalProvisioning  8s (x18 over 4m9s)   persistentvolume-controller                                                               Waiting for a volume to be created either by the external provisioner 'vpc.block.csi.ibm.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
ambikanair@Ambikas-MBP customer-notifications % 
ambikanair@Ambikas-MBP customer-notifications % 
ambikanair@Ambikas-MBP customer-notifications % 
ambikanair@Ambikas-MBP customer-notifications % kubectl logs -n kube-system ibm-vpc-block-csi-controller-0   -c iks-vpc-block-driver > iks-vpc-block-driver.txt
ambikanair@Ambikas-MBP customer-notifications % vi iks-vpc-block-driver.txt
ambikanair@Ambikas-MBP customer-notifications % 
ambikanair@Ambikas-MBP customer-notifications % 
ambikanair@Ambikas-MBP customer-notifications % kubectl describe sc postgres-sc
Name:                  postgres-sc
IsDefaultClass:        No
Annotations:           description=Provides RWO and RWX Filesystem volumes
Provisioner:           vpc.block.csi.ibm.io
Parameters:            clusterID=openshift-storage,csi.storage.k8s.io/controller-expand-secret-name=rook-csi-cephfs-provisioner,csi.storage.k8s.io/controller-expand-secret-namespace=openshift-storage,csi.storage.k8s.io/node-stage-secret-name=rook-csi-cephfs-node,csi.storage.k8s.io/node-stage-secret-namespace=openshift-storage,csi.storage.k8s.io/provisioner-secret-name=rook-csi-cephfs-provisioner,csi.storage.k8s.io/provisioner-secret-namespace=openshift-storage,fsName=ocs-storagecluster-cephfilesystem
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>
ambikanair@Ambikas-MBP customer-notifications % 

Has something changed on the torpedo side?
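
For contrast with the failing `postgres-sc` above (provisioner `vpc.block.csi.ibm.io`, but parameters pointing at rook-csi-cephfs secrets in `openshift-storage`), a StorageClass consistent with the IBM VPC Block provisioner would carry no Ceph secret parameters at all. A minimal illustrative sketch only; the `profile` value is an assumption, not taken from this cluster:

```shell
# Print a minimal StorageClass for the IBM VPC Block CSI driver
# (illustrative sketch; the profile value is assumed, not from the cluster).
sc=$(cat <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: postgres-sc
provisioner: vpc.block.csi.ibm.io
parameters:
  profile: general-purpose
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
EOF
)
echo "$sc"
```

Note there is no `csi.storage.k8s.io/provisioner-secret-*` parameter here, which is what drags the external provisioner into the missing `openshift-storage` secret lookup in the failing case.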

trenukarya-px commented 6 months ago

@kshithijiyer-px Can you please check this failure.


kshithijiyer-px commented 6 months ago

@ambiknai From the error I see in the output you shared, it looks like the rook-csi-cephfs-provisioner secret is missing in the openshift-storage namespace, which is causing the PVC provisioning to fail.

error getting secret rook-csi-cephfs-provisioner in namespace openshift-storage: secrets "rook-csi-cephfs-provisioner" not found

This looks like a setup (IKS cluster deployment) issue and not a Px-Backup issue. There are no changes on the torpedo side; please check if there are any platform changes on the IBM Cloud side.
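
The secret the external provisioner tries to fetch comes straight from the StorageClass parameters. A small sketch that pulls the secret name and namespace out of a parameters string (values copied, trimmed, from the `kubectl describe sc postgres-sc` output above) and prints the lookup one would run to confirm whether the secret exists:

```shell
# Extract the provisioner-secret name/namespace from StorageClass
# parameters (trimmed copy of the describe output above), then print
# the kubectl command that would confirm the secret exists.
params='clusterID=openshift-storage,csi.storage.k8s.io/provisioner-secret-name=rook-csi-cephfs-provisioner,csi.storage.k8s.io/provisioner-secret-namespace=openshift-storage'
name=$(echo "$params" | tr ',' '\n' | sed -n 's|^csi.storage.k8s.io/provisioner-secret-name=||p')
ns=$(echo "$params" | tr ',' '\n' | sed -n 's|^csi.storage.k8s.io/provisioner-secret-namespace=||p')
echo "kubectl get secret $name -n $ns"
```

On this IKS cluster that lookup fails, because (as shown later in the thread) there is no openshift-storage namespace at all.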

ambiknai commented 6 months ago
ambikanair@Ambikas-MBP customer-notifications % kubectl config current-context
vpc-us-south-mzaznzg1/cmfkova20ie553c09bj0/admin
ambikanair@Ambikas-MBP customer-notifications % 
ambikanair@Ambikas-MBP customer-notifications % kubectl get nodes -o wide
NAME          STATUS   ROLES    AGE    VERSION       INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
10.240.1.10   Ready    <none>   170m   v1.28.4+IKS   10.240.1.10   10.240.1.10   Ubuntu 20.04.6 LTS   5.4.0-169-generic   containerd://1.7.11
10.240.1.11   Ready    <none>   169m   v1.28.4+IKS   10.240.1.11   10.240.1.11   Ubuntu 20.04.6 LTS   5.4.0-169-generic   containerd://1.7.11
10.240.1.8    Ready    <none>   168m   v1.28.4+IKS   10.240.1.8    10.240.1.8    Ubuntu 20.04.6 LTS   5.4.0-169-generic   containerd://1.7.11
10.240.1.9    Ready    <none>   168m   v1.28.4+IKS   10.240.1.9    10.240.1.9    Ubuntu 20.04.6 LTS   5.4.0-169-generic   containerd://1.7.11
ambikanair@Ambikas-MBP customer-notifications % kubectl get ns | grep  openshift-storage
ambikanair@Ambikas-MBP customer-notifications % 
ambikanair@Ambikas-MBP customer-notifications % 
ambikanair@Ambikas-MBP customer-notifications % kubectl get ns
NAME                                       STATUS   AGE
central                                    Active   162m
default                                    Active   179m
ibm-cert-store                             Active   169m
ibm-operators                              Active   179m
ibm-system                                 Active   179m
kube-node-lease                            Active   179m
kube-public                                Active   179m
kube-system                                Active   179m
postgres-csi-pxb-0-58058-01-11-03h40m49s   Active   89m
postgres-csi-pxb-0-58075-01-11-03h57m29s   Active   72m
postgres-csi-pxb-0-83717-01-11-03h24m24s   Active   105m
postgres-csi-pxb-0-84755-01-11-03h32m42s   Active   97m
postgres-csi-pxb-0-84851-01-11-03h48m55s   Active   81m
postgres-csi-pxb-1-83717-01-11-03h24m24s   Active   105m
postgres-csi-pxb-1-84851-01-11-03h48m55s   Active   81m
postgres-csi-pxb-2-83717-01-11-03h24m24s   Active   105m
postgres-csi-pxb-2-84851-01-11-03h48m55s   Active   81m
postgres-csi-pxb-3-83717-01-11-03h24m24s   Active   105m
postgres-csi-pxb-3-84851-01-11-03h48m55s   Active   81m
postgres-csi-pxb-4-83717-01-11-03h24m24s   Active   105m
postgres-csi-pxb-4-84851-01-11-03h48m55s   Active   80m
postgres-csi-pxb-5-83717-01-11-03h24m24s   Active   105m
postgres-csi-pxb-5-84851-01-11-03h48m55s   Active   80m
postgres-csi-pxb-6-84851-01-11-03h48m55s   Active   80m
postgres-csi-pxb-7-84851-01-11-03h48m55s   Active   80m
postgres-csi-pxb-8-84851-01-11-03h48m55s   Active   80m
postgres-csi-pxb-9-84851-01-11-03h48m55s   Active   80m
ambikanair@Ambikas-MBP customer-notifications % 
ambikanair@Ambikas-MBP customer-notifications % kubectl get nodes -o wide
NAME            STATUS   ROLES    AGE   VERSION       INTERNAL-IP     EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
10.240.11.48    Ready    <none>   56d   v1.27.6+IKS   10.240.11.48    10.240.11.48    Ubuntu 20.04.6 LTS   5.4.0-165-generic   containerd://1.7.7
10.240.11.62    Ready    <none>   10h   v1.27.8+IKS   10.240.11.62    10.240.11.62    Ubuntu 20.04.6 LTS   5.4.0-169-generic   containerd://1.7.11
10.240.11.7     Ready    <none>   61d   v1.27.6+IKS   10.240.11.7     10.240.11.7     Ubuntu 20.04.6 LTS   5.4.0-165-generic   containerd://1.7.7
10.240.128.23   Ready    <none>   60d   v1.27.6+IKS   10.240.128.23   10.240.128.23   Ubuntu 20.04.6 LTS   5.4.0-165-generic   containerd://1.7.7
10.240.128.25   Ready    <none>   60d   v1.27.6+IKS   10.240.128.25   10.240.128.25   Ubuntu 20.04.6 LTS   5.4.0-165-generic   containerd://1.7.7
10.240.128.63   Ready    <none>   10h   v1.27.8+IKS   10.240.128.63   10.240.128.63   Ubuntu 20.04.6 LTS   5.4.0-169-generic   containerd://1.7.11
10.240.128.64   Ready    <none>   10h   v1.27.8+IKS   10.240.128.64   10.240.128.64   Ubuntu 20.04.6 LTS   5.4.0-169-generic   containerd://1.7.11
10.240.128.66   Ready    <none>   10h   v1.27.8+IKS   10.240.128.66   10.240.128.66   Ubuntu 20.04.6 LTS   5.4.0-169-generic   containerd://1.7.11
10.240.128.7    Ready    <none>   61d   v1.27.6+IKS   10.240.128.7    10.240.128.7    Ubuntu 20.04.6 LTS   5.4.0-165-generic   containerd://1.7.7
ambikanair@Ambikas-MBP customer-notifications % kubectl get ns | grep  openshift-storage
ambikanair@Ambikas-MBP customer-notifications % kubectl get ns
NAME                  STATUS   AGE
default               Active   61d
ibm-cert-store        Active   61d
ibm-operators         Active   61d
ibm-services-system   Active   40d
ibm-system            Active   61d
karpenter             Active   40d
kube-node-lease       Active   61d
kube-public           Active   61d
kube-system           Active   61d
ambikanair@Ambikas-MBP customer-notifications % 

We don't have openshift-storage in IKS. Pasted above are the 1.27 and 1.28 namespace lists.

@kshithijiyer-px Does this PR have any impact: https://github.com/portworx/torpedo/pull/1958

ambiknai commented 6 months ago

4.12.44_1571_openshift

You can now execute 'kubectl' commands against your cluster. For example, run 'kubectl get nodes'.
ambikanair@Ambikas-MBP customer-notifications % kubectl get ns | grep  openshift-storage               
ambikanair@Ambikas-MBP customer-notifications % kubectl get ns     
NAME                                               STATUS   AGE
calico-system                                      Active   61d
default                                            Active   61d
ibm-cert-store                                     Active   60d
ibm-odf-validation-webhook                         Active   61d
ibm-services-system                                Active   40d
ibm-system                                         Active   61d
kube-node-lease                                    Active   61d
kube-public                                        Active   61d
kube-system                                        Active   61d
openshift                                          Active   61d
openshift-apiserver                                Active   61d
openshift-apiserver-operator                       Active   61d
openshift-authentication                           Active   61d
openshift-authentication-operator                  Active   61d
openshift-cloud-credential-operator                Active   61d
openshift-cloud-network-config-controller          Active   61d
openshift-cluster-csi-drivers                      Active   61d
openshift-cluster-machine-approver                 Active   61d
openshift-cluster-node-tuning-operator             Active   61d
openshift-cluster-samples-operator                 Active   61d
openshift-cluster-storage-operator                 Active   61d
openshift-cluster-version                          Active   61d
openshift-config                                   Active   61d
openshift-config-managed                           Active   61d
openshift-config-operator                          Active   61d
openshift-console                                  Active   61d
openshift-console-operator                         Active   61d
openshift-console-user-settings                    Active   61d
openshift-controller-manager                       Active   61d
openshift-controller-manager-operator              Active   61d
openshift-dns                                      Active   61d
openshift-dns-operator                             Active   61d
openshift-etcd                                     Active   61d
openshift-etcd-operator                            Active   61d
openshift-image-registry                           Active   61d
openshift-infra                                    Active   61d
openshift-ingress                                  Active   61d
openshift-ingress-canary                           Active   61d
openshift-ingress-operator                         Active   61d
openshift-insights                                 Active   61d
openshift-kube-apiserver                           Active   61d
openshift-kube-apiserver-operator                  Active   61d
openshift-kube-controller-manager                  Active   61d
openshift-kube-controller-manager-operator         Active   61d
openshift-kube-proxy                               Active   61d
openshift-kube-scheduler                           Active   61d
openshift-kube-scheduler-operator                  Active   61d
openshift-kube-storage-version-migrator            Active   61d
openshift-kube-storage-version-migrator-operator   Active   61d
openshift-machine-api                              Active   61d
openshift-machine-config-operator                  Active   61d
openshift-marketplace                              Active   61d
openshift-monitoring                               Active   61d
openshift-multus                                   Active   61d
openshift-network-diagnostics                      Active   61d
openshift-network-operator                         Active   61d
openshift-node                                     Active   61d
openshift-operator-lifecycle-manager               Active   61d
openshift-operators                                Active   61d
openshift-roks-metrics                             Active   61d
openshift-route-controller-manager                 Active   61d
openshift-service-ca                               Active   61d
openshift-service-ca-operator                      Active   61d
openshift-user-workload-monitoring                 Active   61d
tigera-operator                                    Active   61d
ambikanair@Ambikas-MBP customer-notifications % kubectl get nodes -o wide
NAME            STATUS   ROLES           AGE     VERSION            INTERNAL-IP     EXTERNAL-IP     OS-IMAGE                               KERNEL-VERSION                 CONTAINER-RUNTIME
10.240.11.50    Ready    master,worker   7d18h   v1.25.14+a52e8df   10.240.11.50    10.240.11.50    Red Hat Enterprise Linux 8.8 (Ootpa)   4.18.0-477.27.1.el8_8.x86_64   cri-o://1.25.5-2.rhaos4.12.git0217273.el8
10.240.11.64    Ready    master,worker   10h     v1.25.14+a52e8df   10.240.11.64    10.240.11.64    Red Hat Enterprise Linux 8.9 (Ootpa)   4.18.0-513.9.1.el8_9.x86_64    cri-o://1.25.5-2.rhaos4.12.git0217273.el8
10.240.128.38   Ready    master,worker   47d     v1.25.14+bcb9a60   10.240.128.38   10.240.128.38   Red Hat Enterprise Linux 8.8 (Ootpa)   4.18.0-477.27.1.el8_8.x86_64   cri-o://1.25.4-4.1.rhaos4.12.gitb9319a2.el8
10.240.128.46   Ready    master,worker   7d18h   v1.25.14+a52e8df   10.240.128.46   10.240.128.46   Red Hat Enterprise Linux 8.8 (Ootpa)   4.18.0-477.27.1.el8_8.x86_64   cri-o://1.25.5-2.rhaos4.12.git0217273.el8
10.240.128.67   Ready    master,worker   10h     v1.25.14+a52e8df   10.240.128.67   10.240.128.67   Red Hat Enterprise Linux 8.9 (Ootpa)   4.18.0-513.9.1.el8_9.x86_64    cri-o://1.25.5-2.rhaos4.12.git0217273.el8
ambikanair@Ambikas-MBP customer-notifications % 
kshithijiyer-px commented 6 months ago

@kshithijiyer-px Does this PR have any impact: portworx/torpedo#1958 ?

This code will only have an impact if you have changed the provisioner value to cephfs-csi; if the value passed is ibm, we shouldn't see any issue. I am not hitting any issues in the runs we did after merging this PR.

One thing I suspect is that there is something wrong in the IBM provisioner that is causing this failure. You'll need to check the provisioner code for any new commits that could be causing this issue.
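
The branching being described amounts to something like the sketch below (purely hypothetical names; this is not the actual torpedo code, just the selection logic as explained in this comment):

```shell
# Hypothetical sketch of the provisioner selection described above:
# only the cephfs-csi path should reference the rook secrets; the ibm
# path should never touch the openshift-storage namespace.
provisioner="ibm"   # value passed during the torpedo execution in question
case "$provisioner" in
  cephfs-csi) msg="use rook-csi-cephfs secrets in openshift-storage" ;;
  ibm)        msg="use vpc.block.csi.ibm.io, no extra secrets" ;;
  *)          msg="unknown provisioner: $provisioner" ;;
esac
echo "$msg"
```

If the ibm branch is taken, no StorageClass referencing the rook secrets should ever be created, which is why the observed failure points at a spec or provisioner mix-up rather than this selection itself.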

ambiknai commented 6 months ago

"You'll need to check the provisioner code for any new commits which is causing this issue." - do you mean the ibm-vpc-block-csi-driver provisioner code?

kshithijiyer-px commented 6 months ago

"You'll need to check the provisioner code for any new commits which is causing this issue." - do you mean the ibm-vpc-block-csi-driver provisioner code?

Yes

ambiknai commented 6 months ago

@kshithijiyer-px I can check that. Could you please explain why the Ceph scenario comes up here when the provisioner provided during torpedo execution is "ibm"?

error getting secret rook-csi-cephfs-provisioner in namespace openshift-storage: secrets "rook-csi-cephfs-provisioner" not found

Is that some default provisioner that the torpedo tests fall back to? The volume provisioning request is received by the correct CSI driver, as we can see vpc.block.csi.ibm.io in the logs.

ambiknai commented 6 months ago

The call fails even before reaching the vpc-block-csi driver. I checked the csi-provisioner logs:

d4", APIVersion:"v1", ResourceVersion:"16215", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "postgres-sc": error getting secret rook-csi-cephfs-provisioner in namespace openshift-storage: secrets "rook-csi-cephfs-provisioner" not found
I0112 00:37:02.839116       1 reflector.go:281] sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:845: forcing resync
I0112 00:37:50.079933       1 reflector.go:559] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.CSINode total 8 items received
I0112 00:39:31.311487       1 reflector.go:559] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.VolumeAttachment total 6 items received
I0112 00:39:36.220710       1 reflector.go:559] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.Node total 13 items received
I0112 00:40:20.040688       1 reflector.go:559] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.PersistentVolumeClaim total 6 items received
I0112 00:41:05.933709       1 controller.go:1337] provision "postgres-csi-pxb-0-58075-01-11-03h57m29s/postgres-data" class "postgres-sc": started
I0112 00:41:05.934294       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"postgres-csi-pxb-0-58075-01-11-03h57m29s", Name:"postgres-data", UID:"4de8f722-2023-4f92-83ed-8bff9de3c1d4", APIVersion:"v1", ResourceVersion:"16215", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "postgres-csi-pxb-0-58075-01-11-03h57m29s/postgres-data"
W0112 00:41:05.950508       1 controller.go:934] Retrying syncing claim "4de8f722-2023-4f92-83ed-8bff9de3c1d4", failure 339
E0112 00:41:05.950604       1 controller.go:957] error syncing claim "4de8f722-2023-4f92-83ed-8bff9de3c1d4": failed to provision volume with StorageClass "postgres-sc": error getting secret rook-csi-cephfs-provisioner in namespace openshift-storage: secrets "rook-csi-cephfs-provisioner" not found
I0112 00:41:05.950640       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"postgres-csi-pxb-0-58075-01-11-03h57m29s", Name:"postgres-data", UID:"4de8f722-2023-4f92-83ed-8bff9de3c1d4", APIVersion:"v1", ResourceVersion:"16215", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "postgres-sc": error getting secret rook-csi-cephfs-provisioner in namespace openshift-storage: secrets "rook-csi-cephfs-provisioner" not found
I0112 00:41:15.149113       1 reflector.go:559] sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:848: Watch close - *v1.StorageClass total 6 items received
I0112 00:43:06.113649       1 reflector.go:559] sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:845: Watch close - *v1.PersistentVolume total 10 items received
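
The `error syncing claim` lines above carry the PVC UID, which is handy for cross-referencing against `kubectl get pvc -A`. A minimal extraction sketch (the sample line is copied from the log excerpt; a real run would read the full sidecar log instead):

```shell
# Extract the claim UID from an external-provisioner "error syncing claim"
# log line (sample copied from the excerpt above).
line='E0112 00:41:05.950604       1 controller.go:957] error syncing claim "4de8f722-2023-4f92-83ed-8bff9de3c1d4": failed to provision volume with StorageClass "postgres-sc"'
uid=$(echo "$line" | sed -n 's/.*error syncing claim "\([^"]*\)".*/\1/p')
echo "$uid"
```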
kshithijiyer-px commented 6 months ago

So @ambiknai it looks like an IBM CSI-provisioner issue, right?

ambiknai commented 6 months ago

No @kshithijiyer-px, this is the external csi-provisioner sidecar.

So the error clearly says that it is unable to find the secret mentioned in the StorageClass spec. We don't have an openshift-storage namespace in this cluster, so this use case does not fit the cluster I created. And as I mentioned, the provider is "ibm". So as per your comment, my understanding is that this use case should ideally not get executed in the IBM case.

ambiknai commented 5 months ago

Hi @kshithijiyer-px, the workaround you suggested looks good for IKS.

1.29-IKS-torpedo.txt 1.25-iks-torpedo.txt 1.28-IKS-torpedo.txt

When I ran the suite against ROKS, the tests failed for a different reason. Could you please check once:

4.13.23_openshift.txt

ambiknai commented 5 months ago

Summary

  1. We had to do a workaround to fix IKS execution - @kshithijiyer-px is working with the framework team to get a patch release so that there is a permanent fix.
  2. We are blocked on ROKS test execution - @kshithijiyer-px confirmed that this is not a product issue but could be something related to torpedo. We are waiting for further updates. @trenukarya-px could you please make sure the ROKS use case is also covered in your torpedo execution, and that any configuration changes w.r.t. Jenkins/torpedo are conveyed.

@arahamad @sandaymin123 ^^

trenukarya-px commented 5 months ago

@ambiknai We are trying to integrate ROKS into our pipeline but are facing issues with the integration. There might be some CSI issues, but we can't confirm.

There are no torpedo issues in these: We had to do a workaround to fix IKS execution - @kshithijiyer-px is working with the framework team to get a patch release so that there is a permanent fix. [Thontesh] There are no torpedo issues, as I mentioned multiple times before. There is a security issue in ROKS which broke our PG app spec, hence we moved to the mysql app. This requires changes to the PG app spec in the public repo.

We are blocked on ROKS test execution. - @kshithijiyer-px confirmed that this is not a product issue but could be something related to torpedo. We are waiting for further updates. [Thontesh] Again, there is no torpedo issue here either. Let's wait for an update from @kshithijiyer-px.

kshithijiyer-px commented 5 months ago

Hello @ambiknai We did an internal run where we aren't hitting the issue. Please find the log attached: console.txt

Here we are running all the tests and we see that the size issue isn't hit.

Can you share some more details on the ROKS configuration you are running with?

We usually run our jobs with machine instance type bx2.4x16; can you check and confirm you are running with a similar-spec instance or higher?

One other noticeable thing I see is that ours is "--driver-start-timeout", "30m0s", whereas yours is "--driver-start-timeout", "20m0s".

Can you do these changes and let us know if you are still seeing the issue?

ambiknai commented 5 months ago

@kshithijiyer-px I see test cases have failed in the logs shared

[2024-01-18T15:51:53.386Z] INFO[2024-01-18 15:51:53] ------------------------

[2024-01-18T15:51:53.386Z] 2024-01-18 15:51:53 +0000:[INFO] [{AddMultipleNamespaceLabels}] [tests.EndPxBackupTorpedoTest:#7261] - >>>> FAILED TEST: {AddMultipleNamespaceLabels} Add multiple labels to namespaces, perform manual backup, schedule backup using namespace label and restore

[2024-01-18T15:51:53.642Z] 2024-01-18 15:51:53 +0000:[INFO] [{AddMultipleNamespaceLabels}] [tests.DeleteAllNamespacesCreatedByTestCase:#10193] - Deleting namespace [mysql-ibm-pxb-0-85583-85583-01-18-15h49m27s]

Yeah, true that the size issue is not seen.

Cluster Details

  1. --flavor bx2.4x16
  2. 4.13.23_openshift

Please share the ROKS versions tried from your end.

kshithijiyer-px commented 5 months ago

@ambiknai It's the first run we did, and we are still looking at the failures. We just wanted to check whether we were also seeing the same size issue, which we aren't in our runs.

  1. 4.12.44
  2. 4.13.11
ambiknai commented 5 months ago

@trenukarya-px @kshithijiyer-px IKS 2.6.0 verification completed

1.29-IKS-torpedo.txt 1.25-iks-torpedo.txt 1.28-IKS-torpedo.txt

As per the discussion, we could not do the ROKS torpedo execution due to automation stability issues (the automation is not qualified for ROKS); the IBM torpedo tests for ROKS are hence failing. For future releases, can we expect an automation fix in place for ROKS?

Could you please attach the successful ROKS test results from your side.

After reviewing our IKS test results and considering the successful test execution carried out by the PX team on ROKS, we collectively affirm our decision to provide sign-off for this version.

cc: @arahamad

trenukarya-px commented 5 months ago

Thanks @kshithijiyer-px @ambiknai !!

We are good with uploading PX Backup 2.6.0 to Catalog based on all these results.

Uploading a few more results: roks-run-PXE.txt roks-run-CSI

Balachandar-Pan commented 5 months ago

We have validated on the prod catalog:

[root@ip-10-13-8-162 ~]# helm ls -A
NAME       NAMESPACE  REVISION  UPDATED                                  STATUS    CHART             APP VERSION
px-backup  pxb260     1         2024-01-25 05:50:42.331319549 +0000 UTC  deployed  px-central-2.6.0  2.6.0
[root@ip-10-13-8-162 ~]# kubectl -n pxb260 get po
NAME                                       READY   STATUS      RESTARTS       AGE
px-backup-79dcff5d6b-959xc                 1/1     Running     1 (4h9m ago)   4h10m
pxc-backup-mongodb-0                       1/1     Running     0              4h10m
pxc-backup-mongodb-1                       1/1     Running     0              4h10m
pxc-backup-mongodb-2                       1/1     Running     0              4h10m
pxcentral-apiserver-85764cdf7-7nllq        1/1     Running     0              4h10m
pxcentral-backend-7469654777-vjbzb         1/1     Running     0              4h7m
pxcentral-frontend-67d8b88bbc-m2jqz        1/1     Running     0              4h7m
pxcentral-keycloak-0                       1/1     Running     0              4h10m
pxcentral-keycloak-postgresql-0            1/1     Running     0              4h10m
pxcentral-lh-middleware-79cbc49f55-fd8nd   1/1     Running     0              4h7m
pxcentral-mysql-0                          1/1     Running     0              4h10m
pxcentral-post-install-hook-r5lb9          0/1     Completed   0              4h10m
[root@ip-10-13-8-162 ~]#