cnti-testcatalog / testsuite

Cloud Native Telecom Initiative (CNTI) Test Catalog is a tool to check for and provide feedback on the use of K8s + cloud native best practices in networking applications and platforms
https://wiki.lfnetworking.org/display/LN/Test+Catalog
Apache License 2.0

[BUG] pod_memory_hog spec test failing in github actions when PR merged to `main` #1973

Closed agentpoyo closed 2 months ago

agentpoyo commented 7 months ago

Describe the bug: `pod_memory_hog` continues to fail in GitHub Actions:

I, [2024-04-15 16:45:39 +00:00 #10118]  INFO -- cnf-testsuite: upsert_task: task: pod_memory_hog has status: failed and is awarded: 0 points. Runtime: 96 seconds
I, [2024-04-15 16:45:39 +00:00 #10118]  INFO -- cnf-testsuite: results yaml: {"name" => "cnf testsuite", "testsuite_version" => "main-2024-04-15-164445-5a8da5dc", "status" => nil, "command" => "/home/runner/work/testsuite/testsuite/cnf-testsuite pod_memory_hog verbose", "points" => nil, "exit_code" => 0, "items" => [{"name" => "pod_memory_hog", "status" => "failed", "type" => "normal", "points" => 0}]}

  'pod_memory_hog' A 'Good' CNF should not crash when pod memory hog occurs

Failures:

  1) Resilience pod memory hog Chaos 'pod_memory_hog' A 'Good' CNF should not crash when pod memory hog occurs
     Failure/Error: (/PASSED: pod_memory_hog chaos test passed/ =~ response_s).should_not be_nil

       Expected: nil not to be nil

Error:      # spec/workload/resilience/pod_memory_hog_spec.cr:22

Finished in 2:21 minutes
1 examples, 1 failures, 0 errors, 0 pending

Failed examples:

crystal spec spec/workload/resilience/pod_memory_hog_spec.cr:15 # Resilience pod memory hog Chaos 'pod_memory_hog' A 'Good' CNF should not crash when pod memory hog occurs
Error: Process completed with exit code 1.

To Reproduce: Merging a PR to `main` reproduces the failure.

Expected behavior: The spec should pass.

taylor commented 7 months ago

A Litmus Chaos error is occurring. Before trying to debug it, we are upgrading Litmus, which requires some updates to the code that integrates with the tools and to the Litmus test cases.

This issue is blocking #1949

taylor commented 7 months ago

We need a new issue to handle memory constraints on pods: dynamically read the constraint and tell Litmus not to use more than about 95% of it.

If it uses 100%, Kubernetes will kill the pod, which causes a failure. That is a different type of test.
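A minimal sketch of that calculation, in Python for illustration only (the testsuite itself is written in Crystal). The function names `parse_memory_quantity` and `safe_hog_mebibytes` are hypothetical, the `170Mi` limit is a made-up example value, and the assumption that the chaos experiment takes its target as a whole number of mebibytes should be checked against the Litmus documentation:

```python
# Multipliers for the common Kubernetes memory quantity suffixes.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory_quantity(quantity: str) -> int:
    """Convert a Kubernetes memory quantity such as '170Mi' to bytes."""
    quantity = quantity.strip()
    # Try two-character suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = float(quantity[: -len(suffix)])
            return int(number * _SUFFIXES[suffix])
    # No suffix: the quantity is already a plain byte count.
    return int(quantity)


def safe_hog_mebibytes(limit: str, percent: int = 95) -> int:
    """Return `percent` of the pod's memory limit, rounded down to whole MiB.

    Staying below the limit keeps Kubernetes from killing the pod,
    which would turn the chaos test into a different kind of test.
    """
    limit_bytes = parse_memory_quantity(limit)
    # Integer arithmetic avoids floating-point rounding surprises.
    return limit_bytes * percent // 100 // 2**20


if __name__ == "__main__":
    # Hypothetical container with a 170Mi memory limit.
    print(safe_hog_mebibytes("170Mi"))      # 161
    print(safe_hog_mebibytes("170Mi", 80))  # 136
```

In a real implementation the limit would be read from the container's `resources.limits.memory` field rather than passed in as a literal, and a container without a limit would need separate handling.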

taylor commented 7 months ago

The spec bug can be fixed by hardcoding the value to lower than 80% of the coredns constraint. This is not ideal, though.

kosstennbl commented 2 months ago

The issue seems to be fixed by #1987. Continuation and further improvement of the "pod_memory_hog" test is described in #1989.