cnti-testcatalog / testsuite

📞📱☎️📡🌐 Cloud Native Telecom Initiative (CNTI) Test Catalog is a tool to check for and provide feedback on the use of K8s + cloud native best practices in networking applications and platforms
https://wiki.lfnetworking.org/display/LN/Test+Catalog
Apache License 2.0

[BUG] Stateful Set charts fail on rollback and rollback tests #1726

Closed daniel-wilmes closed 1 year ago

daniel-wilmes commented 1 year ago

Describe the bug

We have a StatefulSet Helm chart that fails the rollback test. Based on our research, Helm does not support rollback for StatefulSet charts.

To Reproduce

  1. Create a chart that contains a StatefulSet.
  2. Run the test suite with the rollback test enabled.
  3. The rollback test fails with an error: no deployment is found for the StatefulSet.

Expected behavior

  1. Skip the StatefulSet chart.


HashNuke commented 1 year ago

Found an issue on the Helm repo that mentions rollback/upgrade not working for StatefulSets.

https://github.com/helm/helm/issues/8386#issuecomment-808428378

daniel-wilmes commented 1 year ago

@HashNuke when we look through the logs we see this:

I, [2023-01-26 01:53:37 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Rollout.status command: kubectl rollout status deployment/snmp-exporter-ag1 --timeout=180s -n matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Rollout.undo command: kubectl rollout undo deployment/snmp-exporter-ag1 -n matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: activemq-1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: mtx-engine-opr-controller-manager-metrics-service namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: event-streamer-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: gateway-proxy-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: matrixxbct-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: notifier-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: payment-service-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: rsgateway-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: snmp-exporter-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: tra-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: StatefulSet resource_name: tra-dr-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Get.resource command: kubectl get StatefulSet tra-dr-ag1 -o json -n matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: kubectl get resource volumes: [{"emptyDir" => {"medium" => "Memory", "sizeLimit" => "1Gi"}, "name" => "dshm"}, {"emptyDir" => {}, "name" => "sideloader-sync-dir"}, {"name" => "shared-coredump-storage", "persistentVolumeClaim" => {"claimName" => "shared-coredump-storage"}}, {"name" => "shared-logging-storage", "persistentVolumeClaim" => {"claimName" => "shared-logging-storage"}}, {"configMap" => {"defaultMode" => 420, "name" => "tra-config-ag1"}, "name" => "tra-config"}, {"configMap" => {"defaultMode" => 420, "name" => "topology-config"}, "name" => "topology-config"}, {"downwardAPI" => {"defaultMode" => 420, "items" => [{"fieldRef" => {"apiVersion" => "v1", "fieldPath" => "metadata.name"}, "path" => "name"}]}, "name" => "podinfo"}]
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Get.resource command: kubectl get StatefulSet tra-dr-ag1 -o json -n matrixx
Please add the container name tra-1 and a corresponding rollback_from_tag into your cnf-testsuite.yml under container names
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: rollback version change successful? false
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Rollout.status command: kubectl rollout status deployment/tra-dr-ag1 --timeout=180s -n matrixx
Rollback failed on resource: tra-dr-ag1 and container: tra-1
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Rollout.status stderr: Error from server (NotFound): deployments.apps "tra-dr-ag1" not found
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Rollout.undo command: kubectl rollout undo deployment/tra-dr-ag1 -n matrixx
✖️ FAILED: CNF Rollback Failed

Because the StatefulSet does not have a deployment, the query fails to execute the rollback (at least that is what we believe).

kubectl rollout status deployment/tra-dr-ag1 --timeout=180s -n matrixx

There is no deployment whose name contains tra.
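The log above shows the suite fetching tra-dr-ag1 as a StatefulSet and then querying it as deployment/tra-dr-ag1. Since kubectl rollout also accepts statefulset/<name> and daemonset/<name>, one way to avoid the NotFound error is to derive the resource prefix from the kind that was actually fetched. The following is a minimal Python sketch of that idea only; rollout_command and ROLLOUT_KINDS are hypothetical names, and this is not the change that was made in the testsuite.

```python
# Hypothetical helper for illustration; not part of the cnf-testsuite codebase.

# Workload kinds that `kubectl rollout` can manage.
ROLLOUT_KINDS = {"deployment", "statefulset", "daemonset"}


def rollout_command(action, kind, name, namespace, timeout=None):
    """Build a `kubectl rollout` argument list using the resource's real kind."""
    kind = kind.lower()
    if kind not in ROLLOUT_KINDS:
        raise ValueError(f"kubectl rollout does not manage kind: {kind}")
    cmd = ["kubectl", "rollout", action, f"{kind}/{name}", "-n", namespace]
    # --timeout is passed here only for `rollout status`, as in the log above.
    if timeout is not None and action == "status":
        cmd.append(f"--timeout={timeout}")
    return cmd


status_cmd = rollout_command("status", "StatefulSet", "tra-dr-ag1", "matrixx", "180s")
undo_cmd = rollout_command("undo", "StatefulSet", "tra-dr-ag1", "matrixx")
print(" ".join(status_cmd))
print(" ".join(undo_cmd))
```

With the kind taken from the fetched resource, the status query targets statefulset/tra-dr-ag1 instead of a deployment that does not exist.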

HashNuke commented 1 year ago

@daniel-wilmes Thank you for sharing this. I'm making some changes to skip this test if the CNF has any statefulsets.

daniel-wilmes commented 1 year ago

@HashNuke will you skip the test only for the pods that belong to StatefulSets, or skip the whole test once you detect any StatefulSet pods?

HashNuke commented 1 year ago

@daniel-wilmes Thank you for sharing the logs. That helped identify one of the issues with statefulsets. I discussed the rollback test and the other rolling_* tests with the team.

1. Issue with image rollouts for statefulsets

2. Testing rollbacks for statefulsets that have database containers

If your statefulset does use a database, please let us know which database you use so that we can proceed with checking for the database image and skipping those containers from being tested.

HashNuke commented 1 year ago

Feedback from discussion on the CNF TestSuite Office Hours call on 07-Feb-2023:

@taylor: Use an allow list for skipping tests for containers in statefulsets. This will help test any CNF that uses databases that are proprietary or any other database that is unsupported by the checks we run.
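As a rough illustration of the allow-list idea, the sketch below filters out StatefulSet containers whose names appear on a skip list and leaves everything else to be tested. It is written in Python for brevity; the function name, the data shape, and the container names other than tra-1 are assumptions made for this example and do not reflect the testsuite's actual configuration keys or code.

```python
# Hypothetical names and data shapes; the real testsuite config may differ.

def containers_to_test(resources, skip_allow_list):
    """Return (resource_name, container_name) pairs that should be rollback-tested.

    A container is skipped only when it belongs to a StatefulSet AND its name
    is on the allow list; containers in other kinds are always tested.
    """
    selected = []
    for res in resources:
        in_statefulset = res["kind"] == "StatefulSet"
        for container in res["containers"]:
            if in_statefulset and container in skip_allow_list:
                continue
            selected.append((res["name"], container))
    return selected


# "snmp-exporter" and "db" are made-up container names for the example.
resources = [
    {"kind": "Deployment", "name": "snmp-exporter-ag1", "containers": ["snmp-exporter"]},
    {"kind": "StatefulSet", "name": "tra-dr-ag1", "containers": ["tra-1", "db"]},
]
selected = containers_to_test(resources, skip_allow_list={"db"})
print(selected)
```

Keeping the list explicit means a proprietary or unsupported database container can be excluded by name without the suite having to recognize its image.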

HashNuke commented 1 year ago

The changes for this test are in this PR: https://github.com/cncf/cnf-testsuite/pull/1737 (branch bug/1726). Using them requires running shards update kubectl_client and then compiling the testsuite again.

agentpoyo commented 1 year ago

Acceptance Criteria

agentpoyo commented 1 year ago

[Image: 1726_output]

lixuna commented 1 year ago

@HashNuke @agentpoyo what was the level of effort to resolve this issue (0,1,2,3,5,8)? Thank you

agentpoyo commented 1 year ago

3pts for me

daniel-wilmes commented 1 year ago

@HashNuke @agentpoyo we ran the pre-release binary against our charts and these all passed for us:

Getting chart directory
Running cnf-testsuite setup
Running cnf-testsuite cnf_setup
Running cnf-testsuite rolling_update
Running cnf-testsuite rolling_downgrade
Running cnf-testsuite rolling_version_change
Running cnf-testsuite rollback
Parsing Results
Score Breakdown - engineering-target-high
Test Name,Received Points,Max Points,Status,Category
rollback,5,5,passed,normal
rolling_update,5,5,passed,null
rolling_downgrade,5,5,passed,null
rolling_version_change,5,5,passed,null
Score Summary - engineering-target-high
Summary - engineering-target-high

When will the final release be available?

agentpoyo commented 1 year ago

Hello @daniel-wilmes

v0.41.0 was just published and should address this bug.

https://github.com/cncf/cnf-testsuite/releases/tag/v0.41.0