Closed daniel-wilmes closed 1 year ago
Found an issue on the helm repo that mentions rollback/upgrade not working for stateful sets.
https://github.com/helm/helm/issues/8386#issuecomment-808428378
@HashNuke when we look through the logs we see this:
I, [2023-01-26 01:53:37 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Rollout.status command: kubectl rollout status deployment/snmp-exporter-ag1 --timeout=180s -n matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Rollout.undo command: kubectl rollout undo deployment/snmp-exporter-ag1 -n matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: activemq-1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: mtx-engine-opr-controller-manager-metrics-service namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: event-streamer-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: gateway-proxy-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: matrixxbct-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: notifier-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: payment-service-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: rsgateway-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: snmp-exporter-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: Service resource_name: tra-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite-KubectlClient::Get.resource_volumes: StatefulSet resource_name: tra-dr-ag1 namespace: matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Get.resource command: kubectl get StatefulSet tra-dr-ag1 -o json -n matrixx
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: kubectl get resource volumes: [{"emptyDir" => {"medium" => "Memory", "sizeLimit" => "1Gi"}, "name" => "dshm"}, {"emptyDir" => {}, "name" => "sideloader-sync-dir"}, {"name" => "shared-coredump-storage", "persistentVolumeClaim" => {"claimName" => "shared-coredump-storage"}}, {"name" => "shared-logging-storage", "persistentVolumeClaim" => {"claimName" => "shared-logging-storage"}}, {"configMap" => {"defaultMode" => 420, "name" => "tra-config-ag1"}, "name" => "tra-config"}, {"configMap" => {"defaultMode" => 420, "name" => "topology-config"}, "name" => "topology-config"}, {"downwardAPI" => {"defaultMode" => 420, "items" => [{"fieldRef" => {"apiVersion" => "v1", "fieldPath" => "metadata.name"}, "path" => "name"}]}, "name" => "podinfo"}]
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Get.resource command: kubectl get StatefulSet tra-dr-ag1 -o json -n matrixx
Please add the container name tra-1 and a corresponding rollback_from_tag into your cnf-testsuite.yml under container names
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: rollback version change successful? false
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Rollout.status command: kubectl rollout status deployment/tra-dr-ag1 --timeout=180s -n matrixx
Rollback failed on resource: tra-dr-ag1 and container: tra-1
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Rollout.status stderr: Error from server (NotFound): deployments.apps "tra-dr-ag1" not found
I, [2023-01-26 01:53:38 +00:00 #644298] INFO -- cnf-testsuite: KubectlClient::Rollout.undo command: kubectl rollout undo deployment/tra-dr-ag1 -n matrixx
✖️ FAILED: CNF Rollback Failed
Because the stateful set does not have a deployment, the query fails to execute the rollback (at least that is what we believe).
kubectl rollout status deployment/tra-dr-ag1 --timeout=180s -n matrixx
There is nothing under deployments matching tra.
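To illustrate the mismatch seen in the log, here is a minimal Python sketch of building the rollout command from the resource's actual kind, so that a StatefulSet is addressed as statefulset/NAME rather than deployment/NAME. The helper name rollout_command and the ROLLOUT_KINDS set are hypothetical and are not taken from the testsuite's code (the testsuite is written in Crystal); only the kubectl command shapes come from the log above.

```python
from typing import List, Optional

# Kinds that `kubectl rollout status` and `kubectl rollout undo` accept.
ROLLOUT_KINDS = {"deployment", "statefulset", "daemonset"}


def rollout_command(action: str, kind: str, name: str, namespace: str,
                    timeout: Optional[str] = None) -> List[str]:
    """Build a kubectl rollout command that targets the resource's real kind.

    Hypothetical helper for illustration; not the testsuite's implementation.
    """
    kind = kind.lower()
    if kind not in ROLLOUT_KINDS:
        raise ValueError(f"kubectl rollout does not manage kind {kind!r}")
    cmd = ["kubectl", "rollout", action, f"{kind}/{name}"]
    if timeout is not None:
        cmd.append(f"--timeout={timeout}")
    cmd += ["-n", namespace]
    return cmd


# The resource from the failing log line, addressed by its real kind.
print(" ".join(rollout_command("status", "StatefulSet", "tra-dr-ag1", "matrixx", "180s")))
```

With the kind taken from the resource itself, the status check for tra-dr-ag1 would no longer look the name up under deployments.apps.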
@daniel-wilmes Thank you for sharing this. I'm making some changes to skip this test if the CNF has any statefulsets.
@HashNuke will you skip the test only on the pods with stateful sets, or the whole test once you detect there are stateful set pods?
@daniel-wilmes Thank you for sharing the logs. That helped identify one of the issues with statefulsets.
I discussed the rollback test and other rolling_* tests with the team.
rollback test and rolling_* test for statefulsets (at least in the upcoming changes). If your statefulset does use a database, please do let us know which database you use so that we can proceed with checking for the database image and skipping those containers from being tested.
Feedback from discussion on the CNF TestSuite Office Hours call on 07-Feb-2023:
@taylor: Use an allow list for skipping tests for containers in statefulsets. This will help test any CNF that uses databases that are proprietary or any other database that is unsupported by the checks we run.
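As a rough illustration of the allow-list idea, the Python sketch below filters the containers of a StatefulSet against a list of image names to skip. Everything here is an assumption made for illustration: the function names, the matching on the bare image name, and the sample resource are hypothetical and do not reflect how the testsuite or PR 1737 implements it.

```python
from typing import Dict, Iterable, List


def image_name(image: str) -> str:
    """Strip registry path, tag and digest: 'docker.io/library/mysql:8.0' -> 'mysql'."""
    last = image.split("@", 1)[0].rsplit("/", 1)[-1]
    return last.split(":", 1)[0]


def containers_to_test(resource: Dict, skip_images: Iterable[str]) -> List[str]:
    """Return container names to test, skipping allow-listed images in StatefulSets.

    Hypothetical helper; containers of other resource kinds are always tested.
    """
    skip = set(skip_images)
    containers = resource["spec"]["template"]["spec"]["containers"]
    if resource.get("kind") != "StatefulSet":
        return [c["name"] for c in containers]
    return [c["name"] for c in containers if image_name(c["image"]) not in skip]


# Made-up StatefulSet with one database container and one application container.
sample = {
    "kind": "StatefulSet",
    "spec": {"template": {"spec": {"containers": [
        {"name": "db", "image": "docker.io/library/mysql:8.0"},
        {"name": "app", "image": "nginx:1.21"},
    ]}}},
}
print(containers_to_test(sample, ["mysql"]))
```

Keeping the list user-supplied is what lets a CNF with a proprietary or otherwise unsupported database opt its containers out.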
The changes for this test are in this PR - https://github.com/cncf/cnf-testsuite/pull/1737 (branch bug/1726).
Requires running shards update kubectl_client and compiling the testsuite again.
bug/1726 to test and verify before pushing to main if you are the reviewer of the pull request
shards install to make sure any modules are up to date with the latest code
setup and then cnf_setup with the CNF to install for the test.
./cnf-testsuite cnf_setup cnf-path=sample-cnfs/sample-statefulset-nginx
rollback, rolling_update, rolling_downgrade and rolling_version_change
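The verification steps above could be scripted roughly as follows. This is a sketch under assumptions: it assumes the binary is invoked as ./cnf-testsuite from the repository root (as in the cnf_setup command above) and that each test is run by passing its name as the only argument. The branch checkout, the shards update kubectl_client step and the compile step are left out because the thread does not give their exact commands or order. The function names are hypothetical.

```python
import subprocess
from typing import List

CNF_PATH = "sample-cnfs/sample-statefulset-nginx"
TESTS = ["rollback", "rolling_update", "rolling_downgrade", "rolling_version_change"]


def verification_commands(cnf_path: str = CNF_PATH) -> List[List[str]]:
    """List the commands in the order the steps above describe them."""
    cmds = [
        ["shards", "install"],
        ["./cnf-testsuite", "setup"],
        ["./cnf-testsuite", "cnf_setup", f"cnf-path={cnf_path}"],
    ]
    cmds += [["./cnf-testsuite", test] for test in TESTS]
    return cmds


def run_all(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in verification_commands():
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run_all()  # dry run: only prints the commands
```

Defaulting to a dry run keeps the sketch safe to execute on a machine without a cluster or the testsuite binary.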
@HashNuke @agentpoyo what was the level of effort to resolve this issue (0,1,2,3,5,8)? Thank you
3pts for me
@HashNuke @agentpoyo we ran the pre-release binary against our charts and these all passed for us:
Getting chart directory
Running cnf-testsuite setup
Running cnf-testsuite cnf_setup
Running cnf-testsuite rolling_update
Running cnf-testsuite rolling_downgrade
Running cnf-testsuite rolling_version_change
Running cnf-testsuite rollback
Parsing Results
Score Breakdown - engineering-target-high
Test Name,Received Points,Max Points,Status,Category
rollback,5,5,passed,normal
rolling_update,5,5,passed,null
rolling_downgrade,5,5,passed,null
rolling_version_change,5,5,passed,null
Score Summary - engineering-target-high
Summary - engineering-target-high
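For anyone checking such a run automatically, the score breakdown is comma-separated and can be read with Python's csv module. The sketch below embeds the four result rows reported in this comment; the function name failed_tests is hypothetical, and the column names are taken from the header row of the output.

```python
import csv
import io

# Score breakdown rows as reported in the comment above.
BREAKDOWN = """\
Test Name,Received Points,Max Points,Status,Category
rollback,5,5,passed,normal
rolling_update,5,5,passed,null
rolling_downgrade,5,5,passed,null
rolling_version_change,5,5,passed,null
"""


def failed_tests(breakdown_csv: str) -> list:
    """Return the names of tests whose Status column is not 'passed'."""
    rows = csv.DictReader(io.StringIO(breakdown_csv))
    return [row["Test Name"] for row in rows if row["Status"] != "passed"]


print(failed_tests(BREAKDOWN))  # prints []
```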
When will the final release be available?
Hello @daniel-wilmes
v0.41.0 was just published and should address this bug.
Describe the bug
We have a stateful helm chart that is failing on the rollback test. Based on research, helm does not support rollback for stateful set charts.