Review and update test documentation in https://github.com/cnti-testcatalog/testsuite/blob/main/docs/TEST_DOCUMENTATION.md

The documentation below has been removed from the Certification repo in the interest of reducing duplication of efforts. The Test Suite is where we document tests in terms of what is the test covering, what is the rationale, what are the expected results of the test, and any potential remediation steps.

As the documentation below has been removed from the Certification repo, it is recommended to review the text below and incorporate missing information (of value) into the existing test documentation located in https://github.com/cnti-testcatalog/testsuite/blob/main/docs/TEST_DOCUMENTATION.md

Documentation removed from CNTi Certification Repo as of July 2024: List of Workload Tests

Compatibility, Installability, and Upgradability Category

Increase decrease capacity:

Added to CNF Certification in v1.0
Expectation: The number of replicas for a Pod increases, then the number of replicas for a Pod decreases

What's tested: The pod is increased and replicated to 3 for the CNF image or release being tested. After increase_capacity increases the replicas to 3, it decreases back to 1.

The increase and decrease capacity tests: HPA (horizonal pod autoscale) will autoscale replicas to accommodate when there is an increase of CPU, memory or other configured metrics to prevent disruption by allowing more requests by balancing out the utilisation across all of the pods.

Decreasing replicas works the same as increase but rather scale down the number of replicas when the traffic decreases to the number of pods that can handle the requests.

You can read more about horizonal pod autoscaling to create replicas here and in the K8s scaling cheatsheet.

Helm chart published

Added to CNF Certification in v1.0
Expectation: Helm chart is published

What's tested: Checks if a Helm chart is published

Helm chart valid

Added to CNF Certification in v1.0
Expectation: Helm chart is valid

What's tested: This runs helm lint against the helm chart being tested. You can read more about the helm lint command at helm.sh

Helm deploy

Added to CNF Certification in v1.0
Expectation: Helm deploy is successful

What's tested: This checks if the CNF was deployed using Helm

Rollback

Added to CNF Certification in v1.0
Expectation: CNF rollback is successful

What's tested: To check if a CNF version can be rolled back

CNI compatible

Added to CNF Certification in v1.0
Expectation: CNF should be compatible with multiple and different CNIs

What's tested: This installs temporary kind clusters and will test the CNF against both Calico and Cilium CNIs.

Microservice Category

Reasonable image size

Added to CNF Certification in v1.0
Expectation: CNF image size is under 5 gigs

What's tested: Checks the size of the image used.

Reasonable startup time

Added to CNF Certification in v1.0
Expectation: CNF starts up under 30 seconds

What's tested: This counts how many seconds it takes for the CNF to startup.

Single process type in one container

Added to CNF Certification in v1.0
Expectation: CNF container has one process type

What's tested: This verifies that there is only one process type within one container. This does not count against child processes. Example would be nginx or httpd could have a parent process and then 10 child processes but if both nginx and httpd were running, this test would fail.

Service discovery

Added to CNF Certification in v1.0
Expectation: CNFs should not expose their containers as a service

What's tested: This tests and checks if a container for the CNF has services exposed. Application access for microservices within a cluster should be exposed via a Service. Read more about K8s Service here.

Shared database

Added to CNF Certification in v1.0
Expectation: Multiple microservices should not share the same database.

What's tested: This tests if multiple CNFs are using the same database.

SIGTERM Handled

Added to CNTi Certification in v2.0-beta
Expectation: SIGTERM is handled by PID 1 process of containers.
ID: sig_term_handled

What's tested: This tests if the PID 1 process of containers handles SIGTERM.

Specialized Init System

Added to CNTi Certification in v2.0-beta
Expectation: Container images should use specialized init systems for containers.
ID: specialized_init_system

What's tested: This tests if containers in pods have dumb-init, tini or s6-overlay as init processes.

Zombie Handled

Added to CNTi Certification in v2.0-beta
Expectation: Zombie processes are handled/reaped by PID 1 process of containers.
ID: zombie_handled

What's tested: This tests if the PID 1 process of containers handles/reaps zombie processes.

State Category

Node drain

Added to CNF Certification in v1.0
Expectation: A node will be drained and rescheduled onto other available node(s).

What's tested: A node is drained and rescheduled to another node, passing with a liveness and readiness check. This will skip when the cluster only has a single node.

No local volume configuration

Added to CNF Certification in v1.0
Expectation: Local storage should not be used or configured.

What's tested: This tests if local volumes are being used for the CNF.

Elastic volumes

Added to CNF Certification in v1.0
Expectation: Elastic persistent volumes should be configured for statefulness.

What's tested: This checks for elastic persistent volumes in use by the CNF.

Reliability, Resilience and Availability Category

Pod network latency

Added to CNF Certification in v1.0
Expectation: The CNF should continue to function when network latency occurs

What's tested: This experiment causes network degradation without the pod being marked unhealthy/unworthy of traffic by kube-proxy (unless you have a liveness probe of sorts that measures latency and restarts/crashes the container). The idea of this experiment is to simulate issues within your pod network OR microservice communication across services in different availability zones/regions etc.

The applications may stall or get corrupted while they wait endlessly for a packet. The experiment limits the impact (blast radius) to only the traffic you want to test by specifying IP addresses or application information. This experiment will help to improve the resilience of your services over time.

Disk fill

Added to CNF Certification in v1.0
Expectation: The CNF should continue to function when disk fill occurs

What's tested: Stressing the disk with continuous and heavy IO for example can cause degradation in reads written by other microservices that use this shared disk for example modern storage solutions for Kubernetes to use the concept of storage pools out of which virtual volumes/devices are carved out. Another issue is the amount of scratch space eaten up on a node which leads to the lack of space for newer containers to get scheduled (Kubernetes too gives up by applying an "eviction" taint like "disk-pressure") and causes a wholesale movement of all pods to other nodes. Similarly with CPU chaos, by injecting a rogue process into a target container, we starve the main microservice process (typically PID 1) of the resources allocated to it (where limits are defined) causing slowness in application traffic or in other cases unrestrained use can cause the node to exhaust resources leading to the eviction of all pods. So this category of chaos experiment helps to build the immunity on the application undergoing any such stress scenario.

Pod delete

Added to CNF Certification in v1.0
Expectation: The CNF should continue to function when pod delete occurs

What's tested: This experiment helps to simulate such a scenario with forced/graceful pod failure on specific or random replicas of an application resource and checks the deployment sanity (replica availability & uninterrupted service) and recovery workflow of the application.

Memory hog

Added to CNF Certification in v1.0
Expectation: The CNF should continue to function when pod memory hog occurs

What's tested: The pod-memory hog experiment launches a stress process within the target container - which can cause either the primary process in the container to be resource constrained in cases where the limits are enforced OR eat up available system memory on the node in cases where the limits are not specified.

IO Stress

Added to CNF Certification in v1.0
Expectation: The CNF should continue to function when pod io stress occurs

What's tested: This test stresses the disk with with continuous and heavy IO to cause degradation in reads/ writes by other microservices that use this shared disk.

Network corruption

Added to CNF Certification in v1.0
Expectation: The CNF should continue to function when pod network corruption occurs

What's tested: This test uses the LitmusChaos pod_network_corruption experiment.

Network duplication

Added to CNF Certification in v1.0
Expectation: The CNF should continue to function when pod network duplication occurs

What's tested: This test uses the LitmusChaos pod_network_duplication experiment.

Helm chart liveness

Added to CNF Certification in v1.0
Expectation: A liveness probe should be found in the CNF cluster

What's tested: This test checks for livenessProbe in the resource and container

Helm chart readiness

Added to CNF Certification in v1.0
Expectation: A readiness probe should be found in the CNF cluster

What's tested: This test check for readinessProbe in the resource and container

Observability and Diagnostic Category

Use stdout/stderr for logs

Added to CNF Certification in v1.0
Expectation: Resource output logs should be sent to STDOUT/STDERR

What's tested: This checks and verifies that STDOUT/STDERR is configured for logging.

For example, running kubectl get logs returns useful information for diagnosing or troubleshooting issues.

Prometheus installed

Added to CNF Certification in v1.0
Expectation: Prometheus is being used for the cluster and CNF for metrics.

What's tested: Tests for the presence of Prometheus or if the CNF emit prometheus traffic.

Fluentd logs

Added to CNF Certification in v1.0
Expectation: Fluentd is capturing logs.

What's tested: Checks for fluentd presence and if logs are being captured for fluentd.

OpenMetrics compatible

Added to CNF Certification in v1.0
Expectation: CNF should emit OpenMetrics compatible traffic.

What's tested: Checks if OpenMetrics is being used and or compatible.

Jaeger tracing

Added to CNF Certification in v1.0
Expectation: The CNF should use tracing

What's tested: Checks if Jaeger is configured and tracing is being used.

Security Category

Container socket mounts

Added to CNF Certification in v1.0
Expectation: Container engine daemon sockets should not be mounted as volumes

What's tested This test uses the Kyverno policy called Disallow CRI socket mounts

[Sysctls test]

Added to CNF Certification in v1.0
Expectation: TBD

What's tested: TBD

External IPs

Added to CNF Certification in v1.0
Expectation: A CNF should not run services with external IPs

What's tested: Checks if the CNF has services with external IPs configured

Privilege escalation

Added to CNF Certification in v1.0
Expectation: Containers should not allow for privilege escalation

What's tested: TBD Privilege Escalation: Check that the allowPrivilegeEscalation field in securityContext of container is set to false.

See more at ARMO-C0016

Symlink file system

Added to CNF Certification in v1.0
Expectation: No containers allow a symlink attack

What's tested: This control checks the vulnerable versions and the actual usage of the subPath feature in all Pods in the cluster.

See more at ARMO-C0058

Application credentials

Added to CNF Certification in v1.0
Exepectation: Application credentials should not be found in configuration files

What's tested: Check if the pod has sensitive information in environment variables, by using list of known sensitive key names. Check if there are configmaps with sensitive information.

See more at ARMO-C0012

Host network

Added to CNF Certification in v1.0
Expectation: PODs should not have access to the host systems network.

What's tested: Checks if there is a host network attached to a pod. See more at ARMO-C0041

Service account mapping

Added to CNF Certification in v1.0
Expectation: The automatic mapping of service account tokens should be disabled.

What's tested: Check if service accounts are automatically mapped. See more at ARMO-C0034.

Ingress and Egress blocked

Added to CNF Certification in v1.0
Expectation: Ingress and Egress traffic should be blocked on Pods.

What's tested: Checks Ingress and Egress traffic policy

Privileged containers, Kubescape

Added to CNF Certification in v1.0
Expectation: Containers should not allow privilege escalation

What's tested: Check in POD spec if securityContext.privileged == true. Read more at ARMO-C0057

Insecure capabilities

Added to CNF Certification in v1.0
Expectation: Containers should not have insecure capabilities enabled

What's tested: Checks for insecure capabilities. See more at ARMO-C0046

This test checks against a blacklist of insecure capabilities.

Non-root containers

Added to CNF Certification in v1.0
Expectation: Containers should run with non-root user with non-root group membership

What's tested: Checks if containers are running with non-root user with non-root membership. Read more at ARMO-C0013

Host PID/IPC privileges

Added to CNF Certification in v1.0
Expectation: Containers should not have hostPID and hostIPC privileges

What's tested: Checks if containers are running with hostPID or hostIPC privileges. Read more at ARMO-C0038

[SELinux options]

Added to CNF Certification in v1.0
Expectation: SELinux options should not be used

What's tested: Checks if CNF resources use custom SELinux options that allow privilege escalation (selinux_options)

Linux hardening

Added to CNF Certification in v1.0
Expectation: Security services are being used to harden application

What's tested: Checks if security services are being used to harden the application. Read more at ARMO-C0055

CPU Limits

Added to CNF Certification in v2.0-beta
Expectation: Containers should have CPU limits defined

What's tested: Check for each container if there is a ‘limits.cpu’ field defined. Check for each limitrange/resourcequota if there is a max/hard field defined, respectively. Read more at ARMO-C0270.

Memory Limits

Added to CNF Certification in v2.0-beta
Expectation: Containers should have memory limits defined

What's tested: Check for each container if there is a ‘limits.memory’ field defined. Check for each limitrange/resourcequota if there is a max/hard field defined, respectively. Read more at ARMO-C0271.