cnti-testcatalog / testsuite

📞📱☎️📡🌐 Cloud Native Telecom Initiative (CNTI) Test Catalog is a tool to check for and provide feedback on the use of K8s + cloud native best practices in networking applications and platforms
https://wiki.lfnetworking.org/display/LN/Test+Catalog
Apache License 2.0

node-drain: Fix link to experiment #2024

Closed: kosstennbl closed this 1 month ago

kosstennbl commented 1 month ago

In 69dedb30dd686ab2d1edffb4178ddb3f2c94e7e4, the Litmus version was updated and all experiment links were changed accordingly, except for node_drain. This commit fixes that.

ref: #2022

taylor commented 1 month ago

Lgtm

martin-mat commented 1 month ago

lgtm

martin-mat commented 1 month ago

There is a good spec test for node_drain, https://github.com/cnti-testcatalog/testsuite/blob/main/spec/workload/resilience/node_drain_spec.cr, and it behaves correctly (verified).

The real reason the issue was not detected earlier in GitHub Actions is that the workflows use a single-node kind setup for testing. The node_drain test needs a multi-node setup, so the spec test "passes" only because the test is skipped. Example: https://github.com/cnti-testcatalog/testsuite/actions/runs/9025631332/job/24802869636

⏭️ 🏆SKIPPED: [node_drain] node_drain chaos test requires the cluster to have atleast two schedulable nodes 🗡️💀♻

The spec test accepts this skip as a success:

      if KubectlClient::Get.schedulable_nodes_list.size > 1
        (/(PASSED).*(node_drain chaos test passed)/ =~ result[:output]).should_not be_nil
      else
        (/(SKIPPED).*(node_drain chaos test requires the cluster to have atleast two)/ =~ result[:output]).should_not be_nil
      end

So I propose adapting the GitHub Actions workflows to run on a kind cluster with two schedulable nodes. Since this is a more generic change, I suggest handling it in a separate ticket.
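A minimal sketch of such a cluster definition, assuming the workflows create their cluster with `kind create cluster --config <file>`; the file name `kind-multinode.yaml` is hypothetical. It uses two worker nodes because kind normally keeps the NoSchedule taint on the control-plane node once worker nodes exist; whether `KubectlClient::Get.schedulable_nodes_list` takes taints into account is not confirmed here.

```yaml
# kind-multinode.yaml (hypothetical file name)
# Sketch: one control-plane node plus two workers, so that at least two
# nodes can run workloads even if the control-plane node stays tainted.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

The cluster could then be created with `kind create cluster --config kind-multinode.yaml`, and `kubectl get nodes` should list three nodes before the node_drain spec runs.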

2026

daniel-wilmes commented 1 month ago

@martin-mat I have verified that the node_drain fix works both as a single test and in the cert command. However, the cert command never finishes, which appears to be a separate issue from node_drain. It may relate to sig_term_handled, zombie_handled, or specialized_init_system; the logging doesn't indicate where it gets stuck. For this ticket, though, I think the node_drain fix should go in.

      ---
      name: cnf testsuite
      testsuite_version: node-drain-fix-2024-05-15-142132-3258c691
      status:
      command: /home/dwilmes/.mtx/konstruxx/working/tests/testCHF/cnf-testsuite cert essential
      points: 100
      exit_code: 0
      items:

taylor commented 1 month ago

@daniel-wilmes please open a new issue for this:

> However, I will say the cert command does not ever finish, which appears to be a separate issue from node drain. It may pertain to either sig_term_handled, zombie_handled, or specialized_init_system. The logging doesn't appear to indicate where we are stuck.

You can try disabling those with ~testname (e.g. ~sig_term_handled) to isolate which test needs to be investigated. Add that information to the new ticket you create.

cc: @Smitholi67

taylor commented 1 month ago

@martin-mat good catch on the single-node kind setup in GitHub Actions. https://github.com/cnti-testcatalog/testsuite/pull/2024#issuecomment-2112309884