IBM / signoff-pxb

Repos for pxb related release sign-off
Apache License 2.0

Torpedo tests fail in production setup #21

Open ambiknai opened 4 months ago

ambiknai commented 4 months ago
 [FAILED] Unexpected error:
      <*errors.errorString | 0xc000917b50>: 
      failed to get [autopilot] namespace, Err: the server could not find the requested resource (get storageclusters.core.libopenstorage.org)
      {
          s: "failed to get [autopilot] namespace, Err: the server could not find the requested resource (get storageclusters.core.libopenstorage.org)",
      }
  occurred
  In [It] at: /go/src/github.com/portworx/torpedo/pkg/log/log.go:334 @ 02/12/24 09:43:15.357

  Full Stack Trace
    github.com/portworx/torpedo/pkg/log.FailOnError({0x8a77720, 0xc000917b50}, {0x80a0dcc?, 0xc001ce4000?}, {0x0?, 0x6dd7c20?, 0xc000633940?})
        /go/src/github.com/portworx/torpedo/pkg/log/log.go:334 +0x290
    github.com/portworx/torpedo/tests.processError({0x8a77720?, 0xc000917b50?}, {0x0?, 0x0?, 0x0?})
        /go/src/github.com/portworx/torpedo/tests/common.go:626 +0x45
    github.com/portworx/torpedo/tests.ValidateContext.func2.1.ValidateVolumes.func1.1()
        /go/src/github.com/portworx/torpedo/tests/common.go:1075 +0x34e
    github.com/portworx/torpedo/tests.ValidateContext.func2.1.ValidateVolumes.func1()
        /go/src/github.com/portworx/torpedo/tests/common.go:1043 +0x195
    github.com/portworx/torpedo/tests.ValidateVolumes(...)
        /go/src/github.com/portworx/torpedo/tests/common.go:1041
    github.com/portworx/torpedo/tests.ValidateContext.func2.1()
        /go/src/github.com/portworx/torpedo/tests/common.go:685 +0x14d

Cluster Details

IKS 1.28.6 --flavor bx2.4x16 --workers 4 Image: icr.io/ext/portworx/px-backup:2.6.0

ambiknai commented 4 months ago

@trenukarya-px @kshithijiyer-px Is there any known issue affecting the Torpedo tests?

cc: @arahamad

ambiknai commented 4 months ago

Hi @kshithijiyer-px

I've noticed that the tests have been consistently passing for the last two days. We haven't made any modifications from the Jenkins side. If any changes have been made in the Torpedo repository to address the issue, could you please create a release? This way, our Jenkins configuration can always reference that specific release. Currently, it's pointing to the Torpedo master branch, and occasional commits could cause certain tests to fail.

kshithijiyer-px commented 4 months ago

Hey @ambiknai, this was mostly due to a code change at the framework level, which we reverted within a day.

ambiknai commented 4 months ago

Hi @kshithijiyer-px, I understand a fix was delivered. Could you please create a release at that commit so that Jenkins can reference it instead of master? That would keep our runs stable.
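
Until a release exists, one possible stopgap is to pin the Jenkins checkout to a full commit ID rather than a moving branch. The following is only a sketch: `is_commit_sha` and `checkout_pinned` are hypothetical helper names, the clone URL is inferred from the import path in the stack trace, and the commit to pin would have to be supplied by the Torpedo maintainers.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of the Jenkins job.

# is_commit_sha succeeds only for a full 40-character lowercase hex commit ID.
is_commit_sha() {
    case "$1" in
        *[!0-9a-f]*) return 1 ;;   # contains a non-hex character
    esac
    [ "${#1}" -eq 40 ]
}

# checkout_pinned clones Torpedo and checks out the given commit.
# It refuses branch names such as "master" so the build cannot drift.
checkout_pinned() {
    ref="$1"
    dest="${2:-torpedo}"
    is_commit_sha "$ref" || { echo "refusing unpinned ref: $ref" >&2; return 1; }
    # Repository URL assumed from the github.com/portworx/torpedo import path.
    git clone https://github.com/portworx/torpedo.git "$dest" &&
        git -C "$dest" checkout --detach "$ref"
}
```

A job would then call `checkout_pinned "$TORPEDO_COMMIT"`, where `TORPEDO_COMMIT` is an assumed Jenkins parameter holding the agreed commit ID.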

ambiknai commented 4 months ago

Hi @kshithijiyer-px, do you see any similar issue in your test runs?

+ cat px-install.json
{
    "apikey": "****",
    "namespace": "px-torpedo-ns",
    "resource_group": "vpc-e2e-test",
    "clusters": "vpc-jp-tok-mzkwmdu1",
    "storageClassName": "ibmc-vpc-block-5iops-tier"
}

++ pwd -P
+ PWD=/home/jenkins/workspace/Containers-Volumes/pxbackup-torpedo-e2e/src/github.com/IBM/ibm-csi-common
+ ibmcloud resource service-instance-create my-px-backup px-backup px-backup-enterprise jp-tok -p @/home/jenkins/workspace/Containers-Volumes/pxbackup-torpedo-e2e/src/github.com/IBM/ibm-csi-common/px-install.json
Creating service instance my-px-backup in resource group vpc-e2e-test of account IBM as contsto2@in.ibm.com...
OK
Service instance my-px-backup was created.

Name:             my-px-backup
ID:               crn:v1:bluemix:public:px-backup:jp-tok:a/e242f140687cd68a8e037b26680e0f04:aacb1b76-b3b9-4294-ab55-7832b2d61a68::
GUID:             aacb1b76-b3b9-4294-ab55-7832b2d61a68
Location:         jp-tok
State:            provisioning
Type:             service_instance
Sub Type:         
Allow Cleanup:    false
Locked:           false
Created at:       2024-02-28T08:58:28Z
Updated at:       2024-02-28T08:58:29Z
Last Operation:             
                  Status    create in progress
                  Message   Started create instance operation

+ kubectl get nodes
+ awk 'NR==2{print $1}'
+ xargs -I '{}' kubectl label node '{}' px/enabled=false
node/10.244.0.123 labeled
+ sleep 90
++ kubectl describe deployment px-backup -n px-torpedo-ns
++ grep Image:
Error from server (NotFound): namespaces "px-torpedo-ns" not found
+ version=

kshithijiyer-px commented 4 months ago

Hey @ambiknai, I haven't seen this issue in any of our runs. From the logs, what I see is that the namespace px-torpedo-ns doesn't exist. Can you share some more details and check the logs to see whether there is an issue with the parameters you are passing to the px-backup terraform script?
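
In the failing log the job continues with an empty `version` after the `kubectl describe` call fails. A minimal sketch of how the job could fail fast at that step is shown here; `extract_image` and `get_px_backup_version` are hypothetical helper names, and only the `kubectl describe deployment px-backup -n <namespace>` call is taken from the log.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of the Jenkins job.

# extract_image reads `kubectl describe deployment` output on stdin, prints the
# first image reference, and exits non-zero when no "Image:" line is present.
extract_image() {
    awk '$1 == "Image:" { print $2; found = 1; exit }
         END { if (!found) exit 1 }'
}

# get_px_backup_version wraps the kubectl call seen in the log.
# Its exit status is that of extract_image, so a missing namespace or
# deployment becomes a failure instead of an empty string.
get_px_backup_version() {
    kubectl describe deployment px-backup -n "$1" 2>/dev/null | extract_image
}
```

The job could then run `version="$(get_px_backup_version px-torpedo-ns)" || exit 1` so that a missing namespace stops the build before the Torpedo tests start.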

ambiknai commented 4 months ago

Nothing has changed in the Jenkins configuration. Pasting the logs of the last successful build:

+ cat px-install.json
{
    "apikey": "****",
    "namespace": "px-torpedo-ns",
    "resource_group": "vpc-e2e-test",
    "clusters": "vpc-jp-tok-mzm2mta3",
    "storageClassName": "ibmc-vpc-block-5iops-tier"
}

++ pwd -P
+ PWD=/home/jenkins/workspace/Containers-Volumes/pxbackup-torpedo-e2e/src/github.com/IBM/ibm-csi-common
+ ibmcloud resource service-instance-create my-px-backup px-backup px-backup-enterprise jp-tok -p @/home/jenkins/workspace/Containers-Volumes/pxbackup-torpedo-e2e/src/github.com/IBM/ibm-csi-common/px-install.json
Creating service instance my-px-backup in resource group vpc-e2e-test of account IBM as contsto2@in.ibm.com...
OK
Service instance my-px-backup was created.

Name:             my-px-backup
ID:               crn:v1:bluemix:public:px-backup:jp-tok:a/e242f140687cd68a8e037b26680e0f04:ba00898e-7fed-4548-8553-63a9acb86b8e::
GUID:             ba00898e-7fed-4548-8553-63a9acb86b8e
Location:         jp-tok
State:            provisioning
Type:             service_instance
Sub Type:         
Allow Cleanup:    false
Locked:           false
Created at:       2024-02-26T08:58:40Z
Updated at:       2024-02-26T08:58:41Z
Last Operation:             
                  Status    create in progress
                  Message   Started create instance operation

+ kubectl get nodes
+ awk 'NR==2{print $1}'
+ xargs -I '{}' kubectl label node '{}' px/enabled=false
node/10.244.0.107 labeled
+ sleep 90
++ kubectl describe deployment px-backup -n px-torpedo-ns
++ grep Image:
+ version='    Image:      icr.io/ext/portworx/px-backup:2.6.0'

ambiknai commented 4 months ago

The error we see is:

ibmcloud resource service-instance my-px-backup
Retrieving service instance my-px-backup in resource group vpc-e2e-test under account IBM as contsto2@in.ibm.com...
OK

Name:                  my-px-backup
ID:                    crn:v1:bluemix:public:px-backup:jp-tok:a/e242f140687cd68a8e037b26680e0f04:64804d75-5408-4c08-8cd0-c7a585248785::
GUID:                  64804d75-5408-4c08-8cd0-c7a585248785
Location:              jp-tok
Service Name:          px-backup
Service Plan Name:     px-backup-enterprise
Resource Group Name:   vpc-e2e-test
State:                 failed
Type:                  service_instance
Sub Type:              
Locked:                false
Created at:            2024-03-04T12:18:06Z
Created by:            contsto2@in.ibm.com
Updated at:            2024-03-04T12:18:31Z
Last Operation:                  
                       Status    create failed
                       Message   Cannot provision px-backup: Message: Cannot get cluster configuration, ID: crn:v1:bluemix:public:px-backup:jp-tok:a/e242f140687cd68a8e037b26680e0f04:64804d75-5408-4c08-8cd0-c7a585248785::, Err: Provision [ crn:v1:bluemix:public:px-backup:jp-tok:a/e242f140687cd68a8e037b26680e0f04:64804d75-5408-4c08-8cd0-c7a585248785:: ] : Cannot get cloud configuration: Err: Request failed with status code: 500, internal_error: , details: 
                                    ClusterName: vpc-jp-tok-mzc5mty4, Region: jp-tok, OrgID: , SpaceID , AccountID: e242f140687cd68a8e037b26680e0f04
                                    ServiceID: 3eb2d5a0-24ff-11eb-9437-631d7272fff5, PlanID: 8026f9e7-f908-404f-b2e6-8f61b197eb0f, ResourceGroup: crn:v1:bluemix:public:resource-controller::a/e242f140687cd68a8e037b26680e0f04::resource-group:1938876b2c8941e89a93a26e6103ef6b, ServiceName: px-backup, InstanceID: crn:v1:bluemix:public:px-backup:jp-tok:a/e242f140687cd68a8e037b26680e0f04:64804d75-5408-4c08-8cd0-c7a585248785::

@kshithijiyer-px, could you check this?

kshithijiyer-px commented 4 months ago

Update from the Webex call: px-backup is not installed, and the namespace itself was never created. From the error message, it looks like the catalog is unable to fetch the clusters. This could be caused by a problem with the API key that we pass to the IBM Catalog to install px-backup.

Action item: @ambiknai to check the API key and rerun the job with a new API key.
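
Separately from the API key, the job currently sleeps for a fixed 90 seconds after creating the service instance, so a `failed` provisioning state only shows up later as a missing namespace. A rough sketch of polling the instance state instead is shown here. `parse_state` and `wait_for_instance` are hypothetical names, the parsing relies on the `State:` line of the text output pasted above, and `active` as the success state is an assumption, since only `provisioning` and `failed` appear in this thread.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of the Jenkins job.

# parse_state reads `ibmcloud resource service-instance <name>` text output
# on stdin and prints the value of its "State:" field.
parse_state() {
    awk '$1 == "State:" { print $2; exit }'
}

# wait_for_instance NAME [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 when the instance is active (assumed success state),
# 1 when provisioning failed, and 2 when all attempts are used up.
wait_for_instance() {
    name="$1"
    attempts="${2:-30}"
    delay="${3:-10}"
    i=0
    state=""
    while [ "$i" -lt "$attempts" ]; do
        state="$(ibmcloud resource service-instance "$name" | parse_state)"
        case "$state" in
            active) return 0 ;;
            failed) echo "service instance $name failed to provision" >&2; return 1 ;;
        esac
        sleep "$delay"
        i=$((i + 1))
    done
    echo "timed out waiting for $name (last state: ${state:-unknown})" >&2
    return 2
}
```

Replacing `sleep 90` with `wait_for_instance my-px-backup || exit 1` would make the job stop at the provisioning step and print the reason, rather than failing later inside the Torpedo run.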