cnti-testcatalog / testsuite

📞📱☎️📡🌐 Cloud Native Telecom Initiative (CNTI) Test Catalog is a tool to check for and provide feedback on the use of K8s + cloud native best practices in networking applications and platforms
https://wiki.lfnetworking.org/display/LN/Test+Catalog
Apache License 2.0
173 stars 71 forks source link

[BUG] cnf_setup passes even when cluster_tools pods are not started successfully #1975

Closed martin-mat closed 5 months ago

martin-mat commented 5 months ago

cnf_setup task does not check result of cluster_tools creation and even if it fails, the task passes.

https://github.com/cnti-testcatalog/testsuite/blob/main/src/tasks/cnf_setup.cr#L35 https://github.com/cnf-testsuite/cluster_tools/blob/main/cluster_tools.cr#L58 https://github.com/cnf-testsuite/cluster_tools/blob/main/cluster_tools.cr#L236 https://github.com/cnf-testsuite/kubectl_client/blob/main/kubectl_client.cr#L1068

cluster_tools need to be fixed.

taylor commented 5 months ago

@martin-mat are you seeing this fail?

https://github.com/cnf-testsuite/cluster_tools/blob/main/cluster_tools.cr#L236 is called for the install method (with the wait_for_cluster_tools call).

In the kubectl client we have this

      until is_ready || second_count > wait_count
        Log.info { "KubectlClient::Get.resource_wait_for_install attempt: #{second_count}; is_ready: #{is_ready}" }
        sleep 1
        is_ready = resource_ready?(kind, namespace, resource_name, kubeconfig)
        second_count = second_count + 1
      end

It will fail after 180 seconds otherwise we move to

return is_ready

https://github.com/cnf-testsuite/kubectl_client/blob/main/kubectl_client.cr#L1060

taylor commented 5 months ago

@martin-mat

A if/else can be added to https://github.com/cnti-testcatalog/testsuite/blob/main/src/tasks/cnf_setup.cr#L35 where the install is called. The install command will return a boolean (which is from the is_ready deeper down).

The caution we have is the setup can take longer than the current 180 seconds in the code and by the time the tests run it is normally ready to go. So we could have false negatives unless we adjust the timeout here: https://github.com/cnf-testsuite/kubectl_client/blob/main/kubectl_client.cr#L1048

A larger delay can be added in ClusterTools as an additional argument to the resource_wait_for_install call: https://github.com/cnf-testsuite/cluster_tools/blob/main/cluster_tools.cr#L236

Example:

    KubectlClient::Get.resource_wait_for_install("Daemonset", "cluster-tools", namespace: self.namespace!, wait_count: 300)
martin-mat commented 5 months ago

Yes, I can see the issue. The 180s timeout probably needs to be increased substantially and report appropriately with an error message if the timeout is expired.

To indicate unsuccessful cluster_tools pod create, return value needs to be propagated/checked on those places in the code: https://github.com/cnti-testcatalog/testsuite/blob/main/src/tasks/cnf_setup.cr#L35 https://github.com/cnf-testsuite/cluster_tools/blob/main/cluster_tools.cr#L58 https://github.com/cnf-testsuite/cluster_tools/blob/main/cluster_tools.cr#L236