SovereignCloudStack / standards

SCS standards in a machine readable format
https://scs.community/
Creative Commons Attribution Share Alike 4.0 International

[Standard] CNCF conformance tests #709

Open tonifinger opened 4 months ago

tonifinger commented 4 months ago

This issue was created to provide a discussion ground for possible future standards. It is derived from SovereignCloudStack/issues#181 and one of the points not assigned any issue yet.

As a cloud service provider, I want my KaaS to comply with the Certified Kubernetes Software Conformance. Here we discuss how the "Certified Kubernetes Software Conformance" can be utilized.

Definition of Done:

tonifinger commented 4 months ago

Research:

Following the short guide, one can achieve CNCF certification by passing certain tests and submitting the test results to the CNCF conformance repo on GitHub (https://github.com/cncf/k8s-conformance; requirements: Certified_Kubernetes_Terms.md).

The tests themselves can be carried out according to these instructions: https://github.com/cncf/k8s-conformance/blob/master/instructions.md

Following these instructions, the tests are executed by sonobuoy, which in turn runs the Kubernetes e2e tests.

SCS itself already uses these tests and tools within the k8s-cluster-api-provider. The implementation of the tests is therefore already done when using the k8s-cluster-api-provider. In addition, I would suggest making it a standard requirement for the SCS KaaS to pass the CNCF conformance tests.

List of certified products:

The CNCF provides a list of products that have earned "Certified Kubernetes" status (see: https://www.cncf.io/training/certification/software-conformance/#logos).

This list is divided into the following sections:

Perhaps SCS could get listed in one of the product categories? Otherwise, we could write a decision dataset for users describing what they need to do to get their SCS-deployed KaaS infrastructure certified by the CNCF.


SCS Implementation(draft):

Useful links:

mbuechse commented 4 months ago

@tonifinger See v2 here: https://docs.scs.community/standards/scs-compatible-kaas The question is: how do we transfer the results of the e2e pipeline into the compliance check tool?

tonifinger commented 3 months ago

To transfer the results of job/k8s-cluster-api-provider-e2e-conformance to the Python script scs-compliance-check.py we need to accomplish the following actions:

As a first step, the result of the conformance check is generated by the following "Zuul job": k8s-cluster-api-provider-e2e-conformance. (The job executes the following playbook: playbooks/tasks/sonobouy.yaml)

Hint: The mode used in the Zuul job configuration is conformance (see: .zuul.yaml#L32). However, in order to submit a result, according to "How to submit conformance results", we must use the mode certified-conformance.

The generated data holding the results must be transferred to the "Zuul job" running the "scs-compliance-check.py" script: k8s-cluster-api-provider-scs-compliance-1.27. (The job executes the following playbook: playbooks/tasks/scs_compliance.yaml) To accomplish this, we could use "zuul return-values".

Finally, to check the results, the test script must be extended to analyze the test results generated by sonobuoy.

Hint: This goes hand in hand with the use of sonobuoy as a test tool for the "kaas scs compliance tests" (see kaas-sonobuoy-go-example-e2e-framework). We could use the same mechanism twice: first to analyze the "e2e conformance" test results, and second to analyze the "kaas scs compliance tests" results.
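To make the analysis step more concrete, here is a minimal sketch of how a script could evaluate sonobuoy's test results. It assumes that the result tarball contains a JUnit XML report for the e2e plugin and that the report has already been extracted into a string. The function names summarize_junit and verdict are hypothetical and not part of scs-compliance-check.py.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text):
    """Count passed, failed, and skipped test cases in a JUnit XML report."""
    root = ET.fromstring(xml_text)
    counts = {"passed": 0, "failed": 0, "skipped": 0}
    # iter() finds <testcase> elements at any depth, so this works whether
    # the root element is <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            counts["failed"] += 1
        elif case.find("skipped") is not None:
            counts["skipped"] += 1
        else:
            counts["passed"] += 1
    return counts


def verdict(counts):
    """Return "PASS" only if at least one test ran and none failed."""
    if counts["failed"] == 0 and counts["passed"] > 0:
        return "PASS"
    return "FAIL"
```

Since the same report format would be produced for both test suites, the same two functions could in principle serve both analyses mentioned in the hint.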


Question: Do we just want to test whether we would achieve a certificate of compliance, or do we also want to provide a mechanism to handle the process of obtaining a certificate?

mbuechse commented 1 month ago

@tonifinger I don't understand the question at the end, but I will try to explain what is desired here to the best of my abilities.

For every certificate scope (such as "SCS-compatible KaaS"), we specify which tests have to be passed in order for the certificate to be awarded. This process is documented in scs-0003-v1.

A straightforward approach would be to write a Python script that "just" runs the CNCF tests, waits until they are finished, and then outputs the result (either "PASS" or "FAIL") -- this script could be included into our certificate scope just like every other test script that we are using.
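As a rough sketch of such a script (not an existing SCS test): the function below shells out to the sonobuoy CLI, waits for the run to finish, and maps the summary to "PASS" or "FAIL". It assumes that sonobuoy is installed and that KUBECONFIG points at the cluster under test. It also assumes that the summary printed by `sonobuoy results` contains a "Status: passed" line for a passing run. The function name run_cncf_conformance and the injectable `run` parameter are hypothetical.

```python
import subprocess


def run_cncf_conformance(run=subprocess.run):
    """Run the conformance tests via sonobuoy and return "PASS" or "FAIL".

    `run` defaults to subprocess.run; it is a parameter only so that the
    logic can be exercised without a real cluster.
    """
    # --wait blocks until the test run has finished (this can take hours).
    run(["sonobuoy", "run", "--mode=certified-conformance", "--wait"], check=True)
    # `sonobuoy retrieve` downloads the results and prints the tarball name.
    retrieved = run(["sonobuoy", "retrieve"],
                    check=True, capture_output=True, text=True)
    tarball = retrieved.stdout.strip()
    # Restrict the summary to the e2e plugin so that other plugins
    # cannot mask a failed conformance run.
    summary = run(["sonobuoy", "results", tarball, "--plugin", "e2e"],
                  check=True, capture_output=True, text=True)
    # Assumption: a passing run is reported as "Status: passed".
    return "PASS" if "Status: passed" in summary.stdout else "FAIL"
```

A wrapper would print the returned value and exit with a non-zero status on "FAIL", in whatever form the compliance check tool expects.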

The downside of this straightforward approach is the following: the CNCF tests are really quite time-consuming. In fact, taking at least 2 hrs, they dwarf all the other tests we do for our certificate scope (and this might be the case even for tests that we might add in the future).

I see three options of dealing with this downside:

  1. Just live with it.
  2. Kind of like 1, but declare the lifetime of the test result for the CNCF test to be a whole week, and make sure that this test is only run weekly. The SCS infrastructure does provide an option for that.
  3. Have the CNCF tests run independently of the SCS tests. Then add a "dummy" test to the SCS suite that merely collects the test result of the CNCF tests somehow.

I think you are referring to this third option. It does have the distinct advantage that it would probably be easiest to implement with the SCS ClusterStacks implementation. AFAICT the e2e playbook is quite involved with creating a cluster etc., and this is something I wouldn't want to replicate (as would be necessary for Options 1 and 2).

We might be able to add a task to https://github.com/SovereignCloudStack/k8s-cluster-api-provider/blob/main/playbooks/tasks/sonobouy.yaml that takes the parsed results, writes them to a dummy file sono-result, and then executes scs-compliance-check.py with a certain parameter so that it would only run the above-mentioned dummy test (-t cncf-k8s-conformance, see here). The dummy test, in turn, would read the dummy file sono-result and report the result in the correct way. Then we could upload the results yaml (generated by scs-compliance-check.py) the same way we do it in this playbook -- lo and behold, the result would be registered with the SCS compliance monitor, independently of all the other compliance tests.
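A minimal sketch of what such a dummy test could look like. It assumes that sono-result contains a single status word ("passed" or "failed") and that the check tool treats a non-zero return value as a failure. Neither the file format nor the reporting convention is settled in this thread, and the function name check_sono_result is hypothetical.

```python
from pathlib import Path


def check_sono_result(path="sono-result"):
    """Return 0 if the recorded conformance result is a pass, 1 otherwise."""
    result_file = Path(path)
    if not result_file.exists():
        # A missing result counts as a failure rather than a silent pass.
        print("cncf-k8s-conformance: FAIL (no result file)")
        return 1
    # Assumption: the file holds one status word written by the e2e playbook.
    status = result_file.read_text().strip().lower()
    if status == "passed":
        print("cncf-k8s-conformance: PASS")
        return 0
    print(f"cncf-k8s-conformance: FAIL (recorded status: {status!r})")
    return 1
```

Treating a missing file as a failure means the dummy test cannot pass when the e2e job never ran.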

There is a big BUT here. The whole e2e business only works with the SCS reference implementation. So for any partner who uses something else, say, Gardener, we need some other way of running the CNCF tests, recording the results, and then uploading them to the compliance monitor... I think we urgently need to get our hands on more varied environments, see https://github.com/SovereignCloudStack/standards/issues/649 -- I'm afraid this has been stagnant for quite a while. Ideally, we would have an abstraction that allows us to create a cluster and run sonobuoy on it, and then create concrete instances, one for ClusterStacks, one for Gardener, and so on... This would be an issue of its own, and maybe we should wait for that? This is definitely something that ought to be discussed in Team Container. Specifically: What would be a good way to create a K8s cluster that works on any partner cloud, regardless of the implementation they use?

tonifinger commented 3 weeks ago

DRAFT: @mbuechse Thank you for your detailed answer!

First, I would like to explain my above question in more detail. I was referring to the final process of obtaining/announcing a certificate. AFAIU, this can only be achieved if you submit your results to the corresponding CNCF repository via a PR.

See: https://www.cncf.io/training/certification/software-conformance/#how. For example, Gardener uses a bot that automatically creates PRs for the CNCF k8s-conformance repository, see: https://github.com/cncf/k8s-conformance/pull/3315

So it should be possible to automatically create a PR to announce a certification. My question is whether we also want to include this process in our pipeline or whether we want to leave the final process of submitting the results to the CNCF repository to the end user of the SCS?


Secondly, about the different approaches. Thanks for the evaluation of the different ways to approach this problem! I'm currently figuring out how to reconcile this with https://github.com/SovereignCloudStack/standards/issues/649. I will write a more detailed response once I have a clear overview of this topic.

mbuechse commented 3 weeks ago

@tonifinger After today's meeting of SIG Std/Cert, I can tell you the following.

  1. We run the tests, but we don't archive the results on the k8s-conformance repo.
  2. Providers could show us their archived results if they have any, but even then we would probably re-check at least some of the time.
  3. We might include some kind of special case for the e2e pipeline so that it doesn't run the tests twice (once stand alone and once again as part of the SCS conformance tests). But that special case is not a priority.
  4. We can provision a cluster using a plugin for the test.
    • one plugin could just use a static cluster (with some provided Kubeconfig)
    • one plugin could use the Cluster API to provision and tear down a cluster
    • one plugin could use the "central API" to provision and tear down a cluster
    • finally, our partners could supply us with a plugin of their own, e.g., for Gardener

We still have to figure out the details of this plugin approach.
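To illustrate the idea only, a plugin interface could be sketched roughly as follows. Everything here is hypothetical, since the actual design is still open: the names ClusterPlugin, StaticClusterPlugin, and run_with_cluster are invented for this sketch, as is the convention that a plugin hands back a kubeconfig path.

```python
from abc import ABC, abstractmethod


class ClusterPlugin(ABC):
    """Hypothetical interface: provide a cluster for the tests, then clean up."""

    @abstractmethod
    def create_cluster(self, name: str) -> str:
        """Provision a cluster and return the path to its kubeconfig."""

    @abstractmethod
    def delete_cluster(self, name: str) -> None:
        """Tear the cluster down again."""


class StaticClusterPlugin(ClusterPlugin):
    """Uses an existing cluster identified by a provided kubeconfig."""

    def __init__(self, kubeconfig_path: str):
        self.kubeconfig_path = kubeconfig_path

    def create_cluster(self, name: str) -> str:
        # Nothing to provision; the cluster already exists.
        return self.kubeconfig_path

    def delete_cluster(self, name: str) -> None:
        # A static cluster is not owned by the test run, so it is left alone.
        pass


def run_with_cluster(plugin: ClusterPlugin, name: str, test):
    """Create a cluster, run `test(kubeconfig)` on it, and always clean up."""
    kubeconfig = plugin.create_cluster(name)
    try:
        return test(kubeconfig)
    finally:
        plugin.delete_cluster(name)
```

Plugins for the Cluster API, the "central API", or Gardener would then implement the same two methods, and the test runner would not need to know which one it is using.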

mbuechse commented 3 weeks ago

@tonifinger I created https://github.com/SovereignCloudStack/standards/issues/710 for the plugin topic.