📞📱☎️📡🌐 Cloud Native Telecom Initiative (CNTI) Test Catalog is a tool to check for and provide feedback on the use of K8s + cloud native best practices in networking applications and platforms
Increase decrease capacity:
Expectation: The number of replicas for a Pod increases, then the number of replicas for a Pod decreases
What's tested: The number of replicas is increased to 3 for the CNF image or release being tested. After increase_capacity increases the replicas to 3, they are decreased back to 1.
The increase and decrease capacity tests relate to the Horizontal Pod Autoscaler (HPA), which autoscales replicas when CPU, memory, or other configured metrics increase, preventing disruption by balancing utilisation across all of the pods so that more requests can be served.
Decreasing capacity works the same way in reverse: when traffic drops, the number of replicas is scaled down to the number of pods that can handle the remaining requests.
You can read more about horizontal pod autoscaling here and in the K8s scaling cheatsheet.
Single process type in one container
What's tested: This verifies that there is only one process type within a container. Child processes do not count against this: for example, nginx or httpd may have a parent process and 10 child processes, but if both nginx and httpd were running in the same container, this test would fail.
Service discovery
Expectation: CNFs should not expose their containers as a service
What's tested: This checks whether a container for the CNF has services exposed. Application access for microservices within a cluster should be exposed via a Service. Read more about the K8s Service here.
Node drain
Expectation: A node will be drained and its workloads rescheduled onto other available node(s).
What's tested: A node is drained and the CNF is rescheduled onto another node, passing liveness and readiness checks. This test is skipped when the cluster has only a single node.
Pod network latency
Expectation: The CNF should continue to function when network latency occurs
What's tested: This experiment causes network degradation without the pod being marked unhealthy/unworthy of traffic by kube-proxy (unless you have a liveness probe that measures latency and restarts/crashes the container). The idea of this experiment is to simulate issues within your pod network or in microservice communication across services in different availability zones/regions.
The applications may stall or get corrupted while they wait endlessly for a packet. The experiment limits the impact (blast radius) to only the traffic you want to test by specifying IP addresses or application information. This experiment will help to improve the resilience of your services over time.
Disk fill
Expectation: The CNF should continue to function when disk fill occurs
What's tested: Stressing the disk with continuous and heavy IO can degrade reads and writes for other microservices that use the same shared disk; modern storage solutions for Kubernetes use the concept of storage pools out of which virtual volumes/devices are carved. Another issue is the amount of scratch space eaten up on a node, which can leave no space for newer containers to be scheduled (Kubernetes eventually gives up by applying an "eviction" taint such as "disk-pressure") and causes a wholesale movement of all pods to other nodes. Similarly to CPU chaos, injecting a rogue process into a target container starves the main microservice process (typically PID 1) of the resources allocated to it (where limits are defined), causing slowness in application traffic; in other cases, unrestrained use can exhaust the node's resources, leading to the eviction of all pods. This category of chaos experiment helps to build immunity in the application for such stress scenarios.
Pod delete
Expectation: The CNF should continue to function when pod delete occurs
What's tested: This experiment simulates forced or graceful pod failure on specific or random replicas of an application resource, and checks the deployment's sanity (replica availability and uninterrupted service) and the application's recovery workflow.
Memory hog
Expectation: The CNF should continue to function when a pod memory hog occurs
What's tested: The pod-memory hog experiment launches a stress process within the target container, which can either cause the primary process in the container to become resource constrained (when limits are enforced) or eat up the available system memory on the node (when limits are not specified).
IO Stress
Expectation: The CNF should continue to function when pod IO stress occurs
What's tested: This test stresses the disk with continuous and heavy IO to cause degradation in reads/writes by other microservices that use this shared disk.
Application credentials
Expectation: Application credentials should not be found in configuration files
What's tested: Checks whether the pod has sensitive information in environment variables, using a list of known sensitive key names, and whether there are ConfigMaps containing sensitive information.
CPU Limits
Expectation: Containers should have CPU limits defined
What's tested: Checks each container for a 'limits.cpu' field, and checks each LimitRange/ResourceQuota for a max/hard field, respectively. Read more at ARMO-C0270.
Memory Limits
Expectation: Containers should have memory limits defined
What's tested: Checks each container for a 'limits.memory' field, and checks each LimitRange/ResourceQuota for a max/hard field, respectively. Read more at ARMO-C0271.
Review and update test documentation in https://github.com/cnti-testcatalog/testsuite/blob/main/docs/TEST_DOCUMENTATION.md
The documentation below has been removed from the Certification repo in the interest of reducing duplication of effort. The Test Suite is where we document tests in terms of what the test covers, the rationale, the expected results, and any potential remediation steps.
As the documentation below has been removed from the Certification repo, it is recommended to review the text below and incorporate any missing information of value into the existing test documentation at https://github.com/cnti-testcatalog/testsuite/blob/main/docs/TEST_DOCUMENTATION.md
Documentation removed from CNTi Certification Repo as of July 2024: List of Workload Tests
Compatibility, Installability, and Upgradability Category
Increase decrease capacity:
What's tested: The number of replicas is increased to 3 for the CNF image or release being tested. After increase_capacity increases the replicas to 3, they are decreased back to 1.
The increase and decrease capacity tests relate to the Horizontal Pod Autoscaler (HPA), which autoscales replicas when CPU, memory, or other configured metrics increase, preventing disruption by balancing utilisation across all of the pods so that more requests can be served.
Decreasing capacity works the same way in reverse: when traffic drops, the number of replicas is scaled down to the number of pods that can handle the remaining requests.
You can read more about horizontal pod autoscaling here and in the K8s scaling cheatsheet.
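For reference, a minimal sketch of a HorizontalPodAutoscaler is shown below; the Deployment name, replica bounds and CPU threshold are illustrative placeholders, not values used by the test.

```yaml
# Illustrative sketch: an autoscaling/v2 HorizontalPodAutoscaler that scales a
# hypothetical "example-cnf" Deployment between 1 and 3 replicas based on
# average CPU utilisation.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-cnf-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-cnf
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```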
Helm chart published
What's tested: Checks if a Helm chart is published
Helm chart valid
What's tested: This runs helm lint against the Helm chart being tested. You can read more about the helm lint command at helm.sh.
Helm deploy
What's tested: This checks if the CNF was deployed using Helm
Rollback
What's tested: Checks if a CNF version can be rolled back.
CNI compatible
What's tested: This installs temporary kind clusters and will test the CNF against both Calico and Cilium CNIs.
Microservice Category
Reasonable image size
What's tested: Checks the size of the image used.
Reasonable startup time
What's tested: This counts how many seconds it takes for the CNF to start up.
Single process type in one container
What's tested: This verifies that there is only one process type within a container. Child processes do not count against this: for example, nginx or httpd may have a parent process and 10 child processes, but if both nginx and httpd were running in the same container, this test would fail.
Service discovery
What's tested: This checks whether a container for the CNF has services exposed. Application access for microservices within a cluster should be exposed via a Service. Read more about the K8s Service here.
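As an illustration of the pattern this test looks for, here is a minimal sketch of exposing a hypothetical CNF workload through a Service (names and ports are placeholders):

```yaml
# Illustrative sketch: a ClusterIP Service so other workloads in the cluster
# reach the CNF by a stable name rather than by individual pod IPs.
apiVersion: v1
kind: Service
metadata:
  name: example-cnf
spec:
  selector:
    app: example-cnf
  ports:
    - name: http
      port: 80
      targetPort: 8080
```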
Shared database
What's tested: This tests if multiple CNFs are using the same database.
SIGTERM Handled
What's tested: This tests if the PID 1 process of containers handles SIGTERM.
Specialized Init System
What's tested: This tests if containers in pods have dumb-init, tini or s6-overlay as init processes.
Zombie Handled
What's tested: This tests if the PID 1 process of containers handles/reaps zombie processes.
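The SIGTERM, specialized init system and zombie handling checks are commonly addressed by running the application under a minimal init process. A sketch, assuming the container image ships tini at /sbin/tini (the image name and paths are placeholders):

```yaml
# Illustrative sketch: tini runs as PID 1, forwards signals such as SIGTERM to
# the application and reaps any zombie processes it leaves behind.
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
spec:
  containers:
    - name: app
      image: example-cnf:latest
      command: ["/sbin/tini", "--"]
      args: ["/usr/local/bin/example-app"]
```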
State Category
Node drain
What's tested: A node is drained and the CNF is rescheduled onto another node, passing liveness and readiness checks. This test is skipped when the cluster has only a single node.
No local volume configuration
What's tested: This tests if local volumes are being used for the CNF.
Elastic volumes
What's tested: This checks for elastic persistent volumes in use by the CNF.
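For contrast with a node-local volume, a sketch of a dynamically provisioned (elastic) PersistentVolumeClaim; the storage class name and size are placeholders for whatever the cluster actually provides:

```yaml
# Illustrative sketch: requesting storage from a dynamic provisioner instead of
# binding the CNF to a hostPath or other node-local volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-cnf-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi
```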
Reliability, Resilience and Availability Category
Pod network latency
What's tested: This experiment causes network degradation without the pod being marked unhealthy/unworthy of traffic by kube-proxy (unless you have a liveness probe that measures latency and restarts/crashes the container). The idea of this experiment is to simulate issues within your pod network or in microservice communication across services in different availability zones/regions.
The applications may stall or get corrupted while they wait endlessly for a packet. The experiment limits the impact (blast radius) to only the traffic you want to test by specifying IP addresses or application information. This experiment will help to improve the resilience of your services over time.
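As a rough sketch of how this class of experiment is typically driven (the chaos tests in this category use LitmusChaos), a ChaosEngine resource along these lines targets a hypothetical workload; the field values are illustrative and the exact schema depends on the LitmusChaos version in use:

```yaml
# Illustrative sketch only: a LitmusChaos ChaosEngine running the
# pod-network-latency experiment against a hypothetical "example-cnf" app.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: example-cnf-network-latency
spec:
  engineState: active
  appinfo:
    appns: default
    applabel: app=example-cnf
    appkind: deployment
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-network-latency
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "60"
            - name: NETWORK_LATENCY
              value: "2000"
```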
Disk fill
What's tested: Stressing the disk with continuous and heavy IO can degrade reads and writes for other microservices that use the same shared disk; modern storage solutions for Kubernetes use the concept of storage pools out of which virtual volumes/devices are carved. Another issue is the amount of scratch space eaten up on a node, which can leave no space for newer containers to be scheduled (Kubernetes eventually gives up by applying an "eviction" taint such as "disk-pressure") and causes a wholesale movement of all pods to other nodes. Similarly to CPU chaos, injecting a rogue process into a target container starves the main microservice process (typically PID 1) of the resources allocated to it (where limits are defined), causing slowness in application traffic; in other cases, unrestrained use can exhaust the node's resources, leading to the eviction of all pods. This category of chaos experiment helps to build immunity in the application for such stress scenarios.
Pod delete
What's tested: This experiment simulates forced or graceful pod failure on specific or random replicas of an application resource, and checks the deployment's sanity (replica availability and uninterrupted service) and the application's recovery workflow.
Memory hog
What's tested: The pod-memory hog experiment launches a stress process within the target container, which can either cause the primary process in the container to become resource constrained (when limits are enforced) or eat up the available system memory on the node (when limits are not specified).
IO Stress
What's tested: This test stresses the disk with continuous and heavy IO to cause degradation in reads/writes by other microservices that use this shared disk.
Network corruption
What's tested: This test uses the LitmusChaos pod_network_corruption experiment.
Network duplication
What's tested: This test uses the LitmusChaos pod_network_duplication experiment.
Helm chart liveness
What's tested: This test checks for a livenessProbe in the resource and container.
Helm chart readiness
What's tested: This test checks for a readinessProbe in the resource and container.
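To illustrate what these two checks look for, a container spec with both probes defined (port, paths and timings are placeholders):

```yaml
# Illustrative sketch: livenessProbe lets Kubernetes restart a hung container,
# readinessProbe withholds traffic until the container reports it is ready.
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
spec:
  containers:
    - name: app
      image: example-cnf:latest
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```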
Observability and Diagnostic Category
Use stdout/stderr for logs
What's tested: This checks and verifies that STDOUT/STDERR is configured for logging.
For example, running kubectl logs returns useful information for diagnosing or troubleshooting issues.
Prometheus installed
What's tested: Tests for the presence of Prometheus or whether the CNF emits Prometheus traffic.
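One common (convention-based, not mandatory) way a CNF advertises Prometheus metrics is via scrape annotations on the pod; a sketch with placeholder values:

```yaml
# Illustrative sketch: the prometheus.io annotations honoured by many
# Prometheus scrape configurations, plus a named metrics port.
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: /metrics
spec:
  containers:
    - name: app
      image: example-cnf:latest
      ports:
        - name: metrics
          containerPort: 9090
```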
Fluentd logs
What's tested: Checks for the presence of Fluentd and whether logs are being captured by Fluentd.
OpenMetrics compatible
What's tested: Checks if OpenMetrics is being used and/or whether the CNF's metrics are OpenMetrics compatible.
Jaeger tracing
What's tested: Checks if Jaeger is configured and tracing is being used.
Security Category
Container socket mounts
What's tested: This test uses the Kyverno policy called Disallow CRI socket mounts.
Sysctls test
What's tested: TBD
External IPs
What's tested: Checks if the CNF has services with external IPs configured
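For clarity, this is the pattern the test flags; the Service name and address are placeholders (the address is from the documentation range):

```yaml
# Illustrative sketch: a Service with the externalIPs field set, which this
# test reports because it routes traffic outside normal ingress/LB control.
apiVersion: v1
kind: Service
metadata:
  name: example-cnf
spec:
  selector:
    app: example-cnf
  ports:
    - port: 80
  externalIPs:
    - 203.0.113.10
```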
Privilege escalation
What's tested: Checks that the allowPrivilegeEscalation field in the container's securityContext is set to false.
See more at ARMO-C0016
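Several checks in this category (privilege escalation, privileged containers, insecure capabilities, non-root containers, immutable file systems) inspect the container securityContext; a hardened sketch with illustrative values that would satisfy them:

```yaml
# Illustrative sketch: a container securityContext hardened along the lines
# the security checks in this category look for.
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
spec:
  containers:
    - name: app
      image: example-cnf:latest
      securityContext:
        allowPrivilegeEscalation: false
        privileged: false
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        runAsUser: 10001
        capabilities:
          drop:
            - ALL
```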
Symlink file system
What's tested: This control checks for vulnerable versions and the actual usage of the subPath feature in all Pods in the cluster.
See more at ARMO-C0058
Application credentials
What's tested: Checks if the pod has sensitive information in environment variables, using a list of known sensitive key names, and checks if there are ConfigMaps with sensitive information.
See more at ARMO-C0012
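A sketch of the preferred pattern, with placeholder names: the credential lives in a Secret and is referenced from the pod rather than written into an environment variable or ConfigMap in clear text:

```yaml
# Illustrative sketch: pulling a credential from a Secret instead of embedding
# it in the pod spec or a ConfigMap.
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
spec:
  containers:
    - name: app
      image: example-cnf:latest
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: example-cnf-credentials
              key: db-password
```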
Host network
What's tested: Checks if there is a host network attached to a pod. See more at ARMO-C0041
Service account mapping
What's tested: Checks if service account tokens are automatically mounted into pods. See more at ARMO-C0034.
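A sketch of how a pod that does not need the Kubernetes API opts out of automatic token mounting (the pod name is a placeholder):

```yaml
# Illustrative sketch: disabling automatic mounting of the service account
# token for a workload that never calls the Kubernetes API.
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
spec:
  automountServiceAccountToken: false
  containers:
    - name: app
      image: example-cnf:latest
```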
Ingress and Egress blocked
What's tested: Checks Ingress and Egress traffic policy
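As an example of the kind of policy this check relates to, a default-deny NetworkPolicy sketch that blocks all ingress and egress for pods in its namespace (specific allow rules would be layered on top):

```yaml
# Illustrative sketch: a namespace-wide default deny for both directions;
# traffic the CNF actually needs would be opened by additional policies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```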
Privileged containers, Kubescape
What's tested: Checks in the pod spec whether securityContext.privileged == true. Read more at ARMO-C0057
Insecure capabilities
What's tested: Checks for insecure capabilities. See more at ARMO-C0046
This test checks against a blacklist of insecure capabilities.
Non-root containers
What's tested: Checks if containers are running as a non-root user with non-root group membership. Read more at ARMO-C0013
Host PID/IPC privileges
What's tested: Checks if containers are running with hostPID or hostIPC privileges. Read more at ARMO-C0038
SELinux options
What's tested: Checks if CNF resources use custom SELinux options that allow privilege escalation (selinux_options)
Linux hardening
What's tested: Checks if security services are being used to harden the application. Read more at ARMO-C0055
CPU Limits
What's tested: Checks each container for a 'limits.cpu' field, and checks each LimitRange/ResourceQuota for a max/hard field, respectively. Read more at ARMO-C0270.
Memory Limits
What's tested: Checks each container for a 'limits.memory' field, and checks each LimitRange/ResourceQuota for a max/hard field, respectively. Read more at ARMO-C0271.
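A sketch of what both limit checks look for on a container (the request and limit values are placeholders):

```yaml
# Illustrative sketch: a container with explicit CPU and memory limits (and
# matching requests), which the CPU Limits and Memory Limits checks verify.
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
spec:
  containers:
    - name: app
      image: example-cnf:latest
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi
```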
Immutable File Systems
What's tested: Checks whether the readOnlyRootFilesystem field in the SecurityContext is set to true. Read more at ARMO-C0017
HostPath Mounts
What's tested: TBD. Read more at ARMO-C0045
Default namespaces
What's tested: TBD
Configuration Category
Latest tag
What's tested: TBD
Require labels
What's tested: TBD
nodePort not used
What's tested: TBD
hostPort not used
What's tested: TBD
Hardcoded IP addresses in K8s runtime configuration
What's tested: TBD
Secrets used
What's tested: TBD
Immutable configmap
What's tested: TBD