Closed sysarch-repo closed 3 days ago
Good point. However, identifying the resources deployed by a Helm chart is probably a non-trivial task. Whether and how this is possible needs to be explored.
@martin-mat, below are additional insights that point to an issue with the label filtering on StatefulSets.
I am deploying two objects as AUT:
There are also other Deployments / StatefulSets deployed in the same namespace (without those, the AUT would not become ready and work properly).
For the Deployment, the test correctly logs: INFO -- cnf-testsuite-specialized_init_system: Pod count for resource Deployment/release-dns-sig in ns: 1
But for the StatefulSet, the test incorrectly determines: INFO -- cnf-testsuite-specialized_init_system: Pod count for resource StatefulSet/release-dns-prov in ns: 8
So instead of identifying the one replica of StatefulSet/release-dns-prov, the test checks 8 pods: the StatefulSet's own pod, the pod of the already checked Deployment/release-dns-sig, and 6 pods belonging to other objects in the namespace.
I have updated the title and the bug description accordingly and am staying tuned for any follow-up thoughts or support.
@martin-mat, is there anything else I can add to the ticket to move it forward? Currently, the issue greatly extends the duration of the tests, and I also believe that the sig_term_handled test fails because the same pod is involved in the test multiple times. Thanks a lot for your support.
@sysarch-repo sure you can; this is an open community project, and you are free and welcome to design a fix and go ahead with the implementation. If you would rather not dig into the code, you will need to wait until someone has time to work on it. The fix does not look trivial and will take some time. Meanwhile, you can think about a workaround depending on your use case, for example splitting the deployment into 2 different namespaces.
Other than that: can you get in contact with me via LFN Slack? https://github.com/cnti-testcatalog/testsuite?tab=readme-ov-file#communication-and-community-meetings
the issue might point here https://github.com/cnf-testsuite/kubectl_client/blob/v1.0.6/kubectl_client.cr#L1310
@martin-mat thanks a lot. Very helpful and much appreciated. IMO the simple fix would be adding an if statement for statefulset to use the labels in the pod selector - exactly as done for the deployment type.
def self.resource_spec_labels(kind : String, resource_name : String, namespace : String | Nil = nil) : JSON::Any
  Log.debug { "resource_labels kind: #{kind} resource_name: #{resource_name}" }
  if kind.downcase == "service"
    # A Service selects its pods directly through spec.selector.
    resp = resource(kind, resource_name, namespace: namespace).dig?("spec", "selector")
  elsif kind.downcase == "deployment"
    resp = resource(kind, resource_name, namespace: namespace).dig?("spec", "selector", "matchLabels")
  elsif kind.downcase == "statefulset"
    # Proposed addition: use the pod selector for StatefulSets, as for Deployments.
    resp = resource(kind, resource_name, namespace: namespace).dig?("spec", "selector", "matchLabels")
  else
    # Other kinds fall back to the labels of the pod template.
    resp = resource(kind, resource_name, namespace: namespace).dig?("spec", "template", "metadata", "labels")
  end
  Log.debug { "resource_labels: #{resp}" }
  # Return an empty JSON object when no labels were found.
  if resp
    resp
  else
    JSON.parse(%({}))
  end
end
I am not (yet) skilled with git merges or Crystal, but I am happy to join the team on Slack. Thanks again!
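To make the effect of the proposed branch easier to check without a cluster, here is a minimal Python sketch of the same branch logic. It is an illustration only, not the kubectl_client code: the `dig` helper and the choice to pass a manifest dict (instead of a kind and name that are looked up in the cluster) are assumptions made for this example. The StatefulSet labels are the ones listed in the bug description.

```python
from typing import Any, Dict, Optional


def dig(obj: Any, *keys: str) -> Optional[Any]:
    """Walk nested dicts and return None as soon as a key is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def resource_spec_labels(manifest: Dict[str, Any]) -> Dict[str, str]:
    """Pick the labels used to find a resource's pods (hypothetical port)."""
    kind = manifest.get("kind", "").lower()
    if kind == "service":
        labels = dig(manifest, "spec", "selector")
    elif kind in ("deployment", "statefulset"):
        # Deployments and StatefulSets both use the pod selector.
        labels = dig(manifest, "spec", "selector", "matchLabels")
    else:
        # Other kinds fall back to the pod template labels.
        labels = dig(manifest, "spec", "template", "metadata", "labels")
    return labels or {}


# StatefulSet labels copied from the bug description.
statefulset = {
    "kind": "StatefulSet",
    "metadata": {"name": "release-dns-prov"},
    "spec": {
        "selector": {"matchLabels": {"app/pod-group": "release-dns-prov"}},
        "template": {"metadata": {"labels": {
            "l1": "v4",
            "l2": "v5",
            "app/pod-group": "release-dns-prov",
            "l3": "v3",
        }}},
    },
}

print(resource_spec_labels(statefulset))
# {'app/pod-group': 'release-dns-prov'}
```

With the StatefulSet branch in place, only the selector label is returned, which matches what the Deployment log shows for release-dns-sig.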
@sysarch-repo can you please test this patch on your cnf? https://github.com/cnf-testsuite/kubectl_client/pull/14
@martin-mat I can only test by downloading a new CNTI testsuite tar.gz archive, i.e., here is how I install it on AWS EC2:
curl -sLO "https://github.com/cnti-testcatalog/testsuite/releases/download/v1.2.0/cnf-testsuite-v1.2.0.tar.gz"
I assume the test you are asking for requires a different procedure, but I have no skills or experience with that. Thoughts?
Thoughts?
How about joining the LFN Slack for quicker support?
The issue is addressed by cnf-testsuite/kubectl_client#14 and #2085 and released as a part of v1.3.0.
Describe the bug
If there are multiple pods in the namespace of the AUT, tests like specialized_init_system or sig_term_handled are triggered multiple times for the same pod, or are triggered for pods of other objects to which the test should not be applied. The problem seems to be with the label filtering on StatefulSets. The label filtering on Deployments works as expected.
To Reproduce
Steps to reproduce the behavior:
$ cnf-testsuite version
CNF TestSuite version: v1.2.0
StatefulSet release-dns-prov:
  selector.matchLabels.app/pod-group: release-dns-prov
  template.metadata.labels.l1: v4
  template.metadata.labels.l2: v5
  template.metadata.labels.app/pod-group: release-dns-prov
  template.metadata.labels.l3: v3
Deployment release-dns-sig (seems to be using selector.matchLabels in label filtering):
INFO -- cnf-testsuite: resource kind: Deployment
INFO -- cnf-testsuite: pods_by_resource name: release-dns-sig
DEBUG -- cnf-testsuite: resource_labels kind: Deployment resource_name: release-dns-sig
DEBUG -- cnf-testsuite: resource_labels: {"app/pod-group" => "release-dns-sig"}
INFO -- cnf-testsuite: pods_by_resource labels: {"app/pod-group" => "release-dns-sig"}
DEBUG -- cnf-testsuite: pods_by_label labels: {"app/pod-group" => "release-dns-sig"}
... pod selection
INFO -- cnf-testsuite-specialized_init_system: Pod count for resource Deployment/release-dns-sig in ns: 1 <----- OK!
StatefulSet release-dns-prov (seems to be using template.metadata.labels in label filtering):
INFO -- cnf-testsuite: resource kind: StatefulSet
INFO -- cnf-testsuite: pods_by_resource name: release-dns-prov
DEBUG -- cnf-testsuite: resource_labels kind: StatefulSet resource_name: release-dns-prov
DEBUG -- cnf-testsuite: resource_labels: {"l1" => "v4", "l2" => "v5", "app/pod-group" => "release-dns-prov", "l3" => "v3"}
INFO -- cnf-testsuite: pods_by_resource labels: {"l1" => "v4", "l2" => "v5", "app/pod-group" => "release-dns-prov", "l3" => "v3"}
DEBUG -- cnf-testsuite: pods_by_label labels: {"l1" => "v4", "l2" => "v5", "app/pod-group" => "release-dns-prov", "l3" => "v3"}
... pod selection
INFO -- cnf-testsuite-specialized_init_system: Pod count for resource StatefulSet/release-dns-prov in ns1: 2 <---- WRONG
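The logs show which labels are handed to the pod selection, but not how a larger label set ends up selecting more pods. One possible reading, which is an assumption and not confirmed by the logs, is that a pod is accepted when any one of the labels matches, so labels shared across workloads (such as l1, l2, l3) pull in unrelated pods. The Python sketch below contrasts that with the all-labels-must-match semantics that Kubernetes applies to matchLabels. The pod names and the labels of the non-StatefulSet pods are hypothetical.

```python
from typing import Callable, Dict, List

Labels = Dict[str, str]


def matches_all(pod_labels: Labels, wanted: Labels) -> bool:
    """Kubernetes matchLabels semantics: every wanted label must match."""
    return all(pod_labels.get(k) == v for k, v in wanted.items())


def matches_any(pod_labels: Labels, wanted: Labels) -> bool:
    """Assumed faulty behaviour: a single matching label is enough."""
    return any(pod_labels.get(k) == v for k, v in wanted.items())


def select(pods: Dict[str, Labels], wanted: Labels,
           match: Callable[[Labels, Labels], bool]) -> List[str]:
    """Return the sorted names of the pods accepted by the given matcher."""
    return sorted(name for name, labels in pods.items() if match(labels, wanted))


# Hypothetical pods in one namespace; only the first belongs to the StatefulSet.
pods = {
    "release-dns-prov-0": {"l1": "v4", "l2": "v5", "l3": "v3",
                           "app/pod-group": "release-dns-prov"},
    "release-dns-sig-abc": {"l1": "v4", "l2": "v5", "l3": "v3",
                            "app/pod-group": "release-dns-sig"},
    "other-0": {"l1": "v4", "app/pod-group": "other"},
}

selector_labels = {"app/pod-group": "release-dns-prov"}
template_labels = {"l1": "v4", "l2": "v5", "l3": "v3",
                   "app/pod-group": "release-dns-prov"}

print(select(pods, selector_labels, matches_all))
# ['release-dns-prov-0']
print(select(pods, template_labels, matches_any))
# ['other-0', 'release-dns-prov-0', 'release-dns-sig-abc']
```

Under this assumption, filtering on selector.matchLabels gives the expected single pod, while filtering on the template labels also picks up the Deployment's pod and other pods that share a common label.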