cnti-testcatalog / testsuite

📞📱☎️📡🌐 Cloud Native Telecom Initiative (CNTI) Test Catalog is a tool to check for and provide feedback on the use of K8s + cloud native best practices in networking applications and platforms
https://wiki.lfnetworking.org/display/LN/Test+Catalog
Apache License 2.0

Node drain test not starting due to unable to get chaos resources (ChaosExperiment.litmuschaos.io "node-drain" not found) #2022

Closed sysarch-repo closed 3 months ago

sysarch-repo commented 4 months ago

Node drain test not starting due to chaos resources not found

Steps to reproduce

Steps to reproduce the behavior:

$ cnf-testsuite version
CNF TestSuite version: v1.2.0

$ cnf-testsuite node_drain
🎬 Testing: [node_drain]
< not progressing>

$ kubectl get nodes

NAME                           STATUS                     ROLES    AGE   VERSION
ip-10-0-116-213.ec2.internal   Ready                      <none>   39m   v1.28.8-eks-ae9a62a
ip-10-0-76-157.ec2.internal    Ready,SchedulingDisabled   <none>   39m   v1.28.8-eks-ae9a62a

$ kubectl describe chaosengine -n cnti dns-dserver-1c6b0d96

Name:         dns-dserver-1c6b0d96
Namespace:    cnti
Labels:       <none>
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosEngine
Metadata:
  Creation Timestamp:  2024-05-11T19:52:17Z
  Finalizers:
    chaosengine.litmuschaos.io/finalizer
  Generation:        2
  Resource Version:  4544
  UID:               e4bd93c5-290f-48f2-ae0f-d968f55b0590
Spec:
  Appinfo:
    Appkind:              deployment
    Applabel:             app.nti/pod-group=dns-dserver
    Appns:                cnti
  Chaos Service Account:  node-drain-sa
  Components:
    Runner:
      Resources:
  Engine State:  active
  Experiments:
    Name:  node-drain
    Spec:
      Components:
        Env:
          Name:   TOTAL_CHAOS_DURATION
          Value:  90
          Name:   TARGET_NODE
          Value:  ip-10-0-76-157.ec2.internal
        Resources:
        Status Check Timeouts:
  Job Clean Up Policy:  delete
Status:
  Engine Status:  initialized
  Experiments:    <nil>
Events:
  Type     Reason                         Age                  From            Message
  ----     ------                         ----                 ----            -------
  Normal   ChaosEngineInitialized         23m                  chaos-operator  Identifying app under test & launching dns-dserver-1c6b0d96-runner
  Warning  ChaosResourcesOperationFailed  116s (x19 over 23m)  chaos-operator  (chaos start) Unable to get chaos resources

Chaos operator logs:

2024-05-11T20:03:22.491Z    ERROR   controller.chaosengine  Reconciler error    {"reconciler group": "litmuschaos.io", "reconciler kind": "ChaosEngine", "name": "dns-dserver-1c6b0d96", "namespace": "cnti", "error": "ChaosExperiment.litmuschaos.io \"node-drain\" not found"}

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2

    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:227

2024-05-11T20:03:22.491Z    DEBUG   events  Warning {"object": {"kind":"ChaosEngine","namespace":"cnti","name":"dns-dserver-1c6b0d96","uid":"e4bd93c5-290f-48f2-ae0f-d968f55b0590","apiVersion":"litmuschaos.io/v1alpha1","resourceVersion":"4544"}, "reason": "ChaosResourcesOperationFailed", "message": "(chaos start) Unable to get chaos resources"}
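For triage, the structured tail of the operator's "Reconciler error" line already names the missing resource. A minimal sketch of pulling it out, assuming the log format shown above (text prefix followed by one JSON object); `missing_resource` is a hypothetical helper, not part of the testsuite or of Litmus:

```python
import json
import re


def missing_resource(log_line):
    """Return (kind, name) if the log line reports a 'not found' error, else None.

    Assumes the line ends with a single JSON object carrying an "error" field,
    as in the chaos-operator output quoted in this issue.
    """
    start = log_line.find("{")
    if start == -1:
        return None
    try:
        fields = json.loads(log_line[start:])
    except json.JSONDecodeError:
        return None
    match = re.fullmatch(r'(\S+) "([^"]+)" not found', fields.get("error", ""))
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    line = (
        r'2024-05-11T20:03:22.491Z ERROR controller.chaosengine Reconciler error '
        r'{"name": "dns-dserver-1c6b0d96", "namespace": "cnti", '
        r'"error": "ChaosExperiment.litmuschaos.io \"node-drain\" not found"}'
    )
    print(missing_resource(line))
```

Here the result points at the `node-drain` ChaosExperiment, which matches the 404 on the experiment download further down.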

Expected behavior

The expectation is that the AUT runner is started and the test is executed. In cases like this (broken external link), the testsuite shall terminate with an error instead of looping endlessly. Release tests shall be enhanced to maintain the quality of the released software.
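To illustrate the "terminate with error" expectation: a bounded wait could look roughly like the sketch below. This is not how the testsuite (which is not written in Python) implements its wait; `wait_until`, `ChaosStartTimeout`, and the default timeout and interval values are all hypothetical.

```python
import time


class ChaosStartTimeout(Exception):
    """Raised when the awaited condition is not met before the deadline."""


def wait_until(check, timeout_s=120.0, interval_s=5.0):
    """Poll check() until it returns True, or raise after timeout_s seconds.

    check is any zero-argument callable, e.g. one that asks the cluster
    whether the chaos runner pod has been created.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise ChaosStartTimeout(f"condition not met within {timeout_s} s")
        time.sleep(interval_s)


if __name__ == "__main__":
    try:
        # A condition that never becomes true ends in an error, not a hang.
        wait_until(lambda: False, timeout_s=1.0, interval_s=0.2)
    except ChaosStartTimeout as exc:
        print(f"test failed: {exc}")
```

With a deadline like this, the missing ChaosExperiment would surface as a failed test rather than a run that never progresses.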

Device (please complete the following information):

$ uname -a
Linux ip-10-0-33-96 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux


NOTE: you can enable higher logging level output via the command line or env var. to help with debugging

# cmd line
./cnf-testsuite -l debug test
...

I, [2024-05-11 20:52:08 +00:00 #8052]  INFO -- cnf-testsuite: Cordoned node ip-10-0-76-157.ec2.internal successfully.

I, [2024-05-11 20:52:09 +00:00 #8052]  INFO -- cnf-testsuite: Workload Node Name: ip-10-0-76-157.ec2.internal

I, [2024-05-11 20:52:09 +00:00 #8052]  INFO -- cnf-testsuite: Litmus Node Name: ip-10-0-116-213.ec2.internal

I, [2024-05-11 20:52:09 +00:00 #8052]  INFO -- cnf-testsuite: download_template url, filename: https://raw.githubusercontent.com/litmuschaos/chaos-charts/3.6.0/charts/generic/node-drain/experiment.yaml, node_drain_experiment.yaml

I, [2024-05-11 20:52:09 +00:00 #8052]  INFO -- cnf-testsuite: chaos_manifests_path

I, [2024-05-11 20:52:09 +00:00 #8052]  INFO -- cnf-testsuite: filepath: /home/ubuntu/.cnf-testsuite/tools/chaos-experiments/node_drain_experiment.yaml

$ cat /home/ubuntu/.cnf-testsuite/tools/chaos-experiments/node_drain_experiment.yaml
404: Not Found

--> The URL https://raw.githubusercontent.com/litmuschaos/chaos-charts/3.6.0/charts/generic/node-drain/experiment.yaml does not exist
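The downloaded file holds the HTTP error body instead of a manifest, so a sanity check before the file is applied would catch this early. A rough sketch, assuming a valid experiment manifest contains a top-level `kind: ChaosExperiment` line; `looks_like_chaos_experiment` is a hypothetical helper and the check is deliberately naive (plain text matching, no YAML parsing):

```python
def looks_like_chaos_experiment(text):
    """Heuristically decide whether text is a ChaosExperiment manifest.

    Rejects empty content and bodies such as "404: Not Found"; accepts
    content that has a line reading exactly "kind: ChaosExperiment".
    """
    stripped = text.strip()
    if not stripped or stripped.startswith("404"):
        return False
    lines = [line.strip() for line in stripped.splitlines()]
    return "kind: ChaosExperiment" in lines


if __name__ == "__main__":
    print(looks_like_chaos_experiment("404: Not Found"))
    print(looks_like_chaos_experiment(
        "apiVersion: litmuschaos.io/v1alpha1\n"
        "kind: ChaosExperiment\n"
        "metadata:\n"
        "  name: node-drain\n"
    ))
```

Checking the HTTP status code at download time would be the more direct fix; this only shows that the bad file is easy to detect after the fact.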

kosstennbl commented 4 months ago

It seems that during this commit, node drain was forgotten. I'll prepare a quick PR

martin-mat commented 4 months ago

@kosstennbl @HashNuke @agentpoyo any idea why the issue was not detected by the spec tests in GitHub Actions?

lixuna commented 4 months ago

@martin-mat please create a new bug issue for the node drain spec test

daniel-wilmes commented 4 months ago

Results of running with compiled branch:

14:31:27.784 [pool-73-thread-1] INFO com.matrixx.konstruxx.tools.cnftestsuite.CnfTestsuiteResultParser - Test Score: 100
14:31:27.787 [pool-73-thread-1] INFO com.matrixx.konstruxx.Context - EVENT: SUCCESS 2024-05-14T14:16:23.603-0400 (15m) Executed Command #1 in testCHF: CNF TestSuite
Creating cnf-testsuite.yml
Getting chart directory
Running cnf-testsuite setup
Running cnf-testsuite cnf_setup
Running cnf-testsuite node_drain
Parsing Results
Score Breakdown - default
Test Name,Received Points,Max Points,Status,Category
node_drain,100,100,passed,essential

Score Summary - default Summary - default

Total Essential Tests Passed: 1
Total Essential Tests Failed: 0
Percentage for Essential: 100.0

Total Bonus Tests Passed: 0
Total Bonus Tests Failed: 0

Total Normal Tests Passed: 0
Total Normal Tests Failed: 0

14:31:27.787 [pool-73-thread-1] INFO com.matrixx.konstruxx.Konstruxx - Performing Captures...
14:31:27.832 [pool-73-thread-1] INFO com.matrixx.konstruxx.Context - EVENT: SUCCESS 2024-05-14T14:31:27.787-0400 (45ms) Completed Capture for testCHF in default
14:31:28.050 [pool-73-thread-1] INFO com.matrixx.konstruxx.Context - EVENT: SUCCESS 2024-05-14T14:31:27.832-0400 (218ms) Completed Capture for testCHF in cnf-testsuite
capturing info for cluster-tools-dhg57
capturing info for cluster-tools-whbhz

14:31:28.050 [pool-73-thread-1] INFO com.matrixx.konstruxx.Konstruxx - Blueprint created