cnti-testcatalog / testsuite

📞📱☎️📡🌐 Cloud Native Telecom Initiative (CNTI) Test Catalog is a tool to check for and provide feedback on the use of K8s + cloud native best practices in networking applications and platforms
https://wiki.lfnetworking.org/display/LN/Test+Catalog

Test Documentation Updates #2118

Open · Smitholi67 opened this issue 3 months ago

Smitholi67 commented 3 months ago

Review and update test documentation in https://github.com/cnti-testcatalog/testsuite/blob/main/docs/TEST_DOCUMENTATION.md

The documentation below has been removed from the Certification repo in the interest of reducing duplication of effort. The Test Suite is where we document tests in terms of what the test covers, what the rationale is, what the expected results of the test are, and any potential remediation steps.

As the documentation below has been removed from the Certification repo, it is recommended to review the text below and incorporate missing information (of value) into the existing test documentation located in https://github.com/cnti-testcatalog/testsuite/blob/main/docs/TEST_DOCUMENTATION.md

Documentation removed from CNTi Certification Repo as of July 2024:

List of Workload Tests

Compatibility, Installability, and Upgradability Category

Increase decrease capacity:

What's tested: The replica count of the pod for the CNF image or release being tested is scaled up and back down. increase_capacity increases the replicas to 3, and the count is then decreased back to 1.

The increase and decrease capacity tests relate to the Horizontal Pod Autoscaler (HPA): HPA automatically scales the number of replicas when CPU, memory, or other configured metrics increase, preventing disruption by accommodating more requests and balancing utilization across all of the pods.

Decreasing replicas works the same way in reverse: when traffic decreases, the number of replicas is scaled down to the number of pods that can handle the remaining requests.

You can read more about horizontal pod autoscaling to create replicas here and in the K8s scaling cheatsheet.
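
As a rough illustration of the increase/decrease flow (not the test suite's own implementation), the sketch below scales a hypothetical deployment named example-cnf up to 3 replicas and back down to 1 with kubectl, waiting for each rollout to settle:

```python
import subprocess

# Hypothetical deployment/namespace; the real test derives these from the
# CNF installed by the test suite.
DEPLOYMENT, NAMESPACE = "example-cnf", "default"

def scale(replicas: int) -> None:
    # Set the desired replica count...
    subprocess.run(
        ["kubectl", "scale", f"deployment/{DEPLOYMENT}",
         "-n", NAMESPACE, f"--replicas={replicas}"],
        check=True,
    )
    # ...and wait until the rollout reports the new replicas as ready.
    subprocess.run(
        ["kubectl", "rollout", "status", f"deployment/{DEPLOYMENT}",
         "-n", NAMESPACE, "--timeout=120s"],
        check=True,
    )

scale(3)  # increase_capacity: 1 -> 3 replicas
scale(1)  # decrease_capacity: 3 -> 1 replica
```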

Helm chart published

What's tested: Checks if a Helm chart is published

Helm chart valid

What's tested: This runs helm lint against the helm chart being tested. You can read more about the helm lint command at helm.sh
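
A minimal sketch of what this amounts to, assuming a hypothetical chart directory ./example-cnf-chart:

```python
import subprocess

# Hypothetical path to the chart under test.
CHART_PATH = "./example-cnf-chart"

# helm lint exits non-zero when the chart fails linting.
result = subprocess.run(["helm", "lint", CHART_PATH],
                        capture_output=True, text=True)
print(result.stdout)
print("chart valid" if result.returncode == 0 else "chart invalid")
```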

Helm deploy

What's tested: This checks if the CNF was deployed using Helm

Rollback

What's tested: Checks whether a CNF version can be rolled back.
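
As an illustrative sketch only (the test suite's exact mechanics may differ), a rollback can be exercised by rolling a hypothetical deployment forward to a new image tag and then undoing the rollout:

```python
import subprocess

# Hypothetical deployment, container, and image names for illustration.
DEPLOYMENT, CONTAINER, NAMESPACE = "example-cnf", "app", "default"

def rollout_status() -> None:
    subprocess.run(["kubectl", "rollout", "status",
                    f"deployment/{DEPLOYMENT}", "-n", NAMESPACE], check=True)

# Roll forward to a newer image tag...
subprocess.run(["kubectl", "set", "image", f"deployment/{DEPLOYMENT}",
                f"{CONTAINER}=example/app:2.0.0", "-n", NAMESPACE], check=True)
rollout_status()

# ...then roll back to the previous revision and confirm it completes.
subprocess.run(["kubectl", "rollout", "undo", f"deployment/{DEPLOYMENT}",
                "-n", NAMESPACE], check=True)
rollout_status()
```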

CNI compatible

What's tested: This installs temporary kind clusters and will test the CNF against both Calico and Cilium CNIs.

Microservice Category

Reasonable image size

What's tested: Checks the size of the image used.

Reasonable startup time

What's tested: This counts how many seconds it takes for the CNF to start up.

Single process type in one container

What's tested: This verifies that there is only one process type within one container. This does not count against child processes: for example, nginx or httpd could have a parent process and then 10 child processes, but if both nginx and httpd were running in the same container, this test would fail.
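
A rough sketch of such a check, assuming a hypothetical pod/container and an image whose ps supports the -o comm= option (procps; busybox-based images may need a different invocation):

```python
import subprocess

# Hypothetical names; the real test inspects every container of the CNF.
POD, CONTAINER, NAMESPACE = "example-cnf-pod", "app", "default"

# List the command names of all processes running in the container.
out = subprocess.run(
    ["kubectl", "exec", "-n", NAMESPACE, POD, "-c", CONTAINER,
     "--", "ps", "-e", "-o", "comm="],
    capture_output=True, text=True, check=True,
).stdout

# Parent + child processes of the same command count as one process type;
# the check only fails when more than one distinct type is present.
process_types = {line.strip() for line in out.splitlines() if line.strip()}
print("distinct process types:", process_types)
print("PASS" if len(process_types) <= 1 else "FAIL")
```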

Service discovery

What's tested: This checks whether a container for the CNF has its services exposed. Application access for microservices within a cluster should be exposed via a Service. Read more about the K8s Service here.

Shared database

What's tested: This tests if multiple CNFs are using the same database.

SIGTERM Handled

What's tested: This tests if the PID 1 process of containers handles SIGTERM.
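
For context, a PID 1 process that handles SIGTERM cleanly might look like the following sketch (illustrative only): Kubernetes sends SIGTERM on pod termination and escalates to SIGKILL after the grace period if the process ignores it.

```python
import signal
import sys
import time

def handle_sigterm(signum, frame):
    # Flush buffers / close connections here, then exit promptly so the
    # kubelet never has to escalate to SIGKILL.
    print("SIGTERM received, shutting down", flush=True)
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

while True:
    time.sleep(1)  # stand-in for the container's real work loop
```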

Specialized Init System

What's tested: This tests if containers in pods have dumb-init, tini or s6-overlay as init processes.

Zombie Handled

What's tested: This tests if the PID 1 process of containers handles/reaps zombie processes.
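
Illustratively, a PID 1 process that reaps its exited children (so they never linger as zombies) could install a SIGCHLD handler like this sketch:

```python
import os
import signal
import subprocess
import sys
import time

def reap(signum, frame):
    # Collect the exit status of any finished children so they do not
    # remain as <defunct> (zombie) entries in the process table.
    while True:
        try:
            pid, _ = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return
        if pid == 0:
            return

signal.signal(signal.SIGCHLD, reap)

# Hypothetical short-lived child; without the handler above it would sit
# in the process table as a zombie until reaped.
subprocess.Popen([sys.executable, "-c", "pass"])
time.sleep(2)
```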

State Category

Node drain

What's tested: A node is drained and the CNF's workload is rescheduled onto another node, passing if liveness and readiness checks still succeed. This test is skipped when the cluster has only a single node.
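
Roughly, the flow resembles the sketch below (hypothetical node name and app label; the real test discovers the node currently hosting the CNF):

```python
import subprocess

NODE, APP_LABEL, NAMESPACE = "worker-1", "app=example-cnf", "default"

# Evict everything from the node so the CNF must reschedule elsewhere
# (older kubectl releases use --delete-local-data instead).
subprocess.run(["kubectl", "drain", NODE, "--ignore-daemonsets",
                "--delete-emptydir-data", "--timeout=180s"], check=True)

# Pass only if the rescheduled pods become Ready on another node.
subprocess.run(["kubectl", "wait", "--for=condition=Ready", "pod",
                "-l", APP_LABEL, "-n", NAMESPACE, "--timeout=180s"],
               check=True)

# Return the node to service afterwards.
subprocess.run(["kubectl", "uncordon", NODE], check=True)
```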

No local volume configuration

What's tested: This tests if local volumes are being used for the CNF.

Elastic volumes

What's tested: This checks for elastic persistent volumes in use by the CNF.

Reliability, Resilience and Availability Category

Pod network latency

What's tested: This experiment causes network degradation without the pod being marked unhealthy/unworthy of traffic by kube-proxy (unless you have a liveness probe of sorts that measures latency and restarts/crashes the container). The idea of this experiment is to simulate issues within your pod network OR microservice communication across services in different availability zones/regions etc.

The applications may stall or get corrupted while they wait endlessly for a packet. The experiment limits the impact (blast radius) to only the traffic you want to test by specifying IP addresses or application information. This experiment will help to improve the resilience of your services over time.

Disk fill

What's tested: Stressing the disk with continuous and heavy IO can degrade reads and writes by other microservices that use the same shared disk; modern storage solutions for Kubernetes, for example, use the concept of storage pools out of which virtual volumes/devices are carved. Another issue is the amount of scratch space eaten up on a node, which leads to a lack of space for newer containers to get scheduled (Kubernetes eventually gives up by applying an "eviction" taint such as "disk-pressure") and causes a wholesale movement of all pods to other nodes. Similarly to CPU chaos, injecting a rogue process into a target container starves the main microservice process (typically PID 1) of the resources allocated to it (where limits are defined), causing slowness in application traffic; in other cases, unrestrained use can cause the node to exhaust resources, leading to the eviction of all pods. This category of chaos experiment helps to build immunity in applications undergoing any such stress scenario.

Pod delete

What's tested: This experiment simulates a pod failure scenario with forced/graceful deletion of specific or random replicas of an application resource, and checks the deployment sanity (replica availability and uninterrupted service) and the recovery workflow of the application.
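
Outside of LitmusChaos, the same idea can be sketched with kubectl (hypothetical label selector; deletion is graceful by default, add --grace-period=0 --force for a forced failure):

```python
import subprocess

APP_LABEL, NAMESPACE = "app=example-cnf", "default"

# Pick one replica of the CNF and delete it.
pods = subprocess.run(
    ["kubectl", "get", "pods", "-l", APP_LABEL, "-n", NAMESPACE, "-o", "name"],
    capture_output=True, text=True, check=True,
).stdout.split()
subprocess.run(["kubectl", "delete", pods[0], "-n", NAMESPACE], check=True)

# Deployment sanity: the replacement pod must come back Ready.
subprocess.run(["kubectl", "wait", "--for=condition=Ready", "pod",
                "-l", APP_LABEL, "-n", NAMESPACE, "--timeout=120s"],
               check=True)
```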

Memory hog

What's tested: The pod-memory hog experiment launches a stress process within the target container - which can cause either the primary process in the container to be resource constrained in cases where the limits are enforced OR eat up available system memory on the node in cases where the limits are not specified.

IO Stress

What's tested: This test stresses the disk with continuous and heavy IO to cause degradation in reads/writes by other microservices that use this shared disk.

Network corruption

What's tested: This test uses the LitmusChaos pod_network_corruption experiment.

Network duplication

What's tested: This test uses the LitmusChaos pod_network_duplication experiment.

Helm chart liveness

What's tested: This test checks for a livenessProbe in the resource and container.

Helm chart readiness

What's tested: This test checks for a readinessProbe in the resource and container.
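
Both probe checks boil down to inspecting the pod template of each workload resource; a minimal sketch using the Kubernetes Python client against a hypothetical deployment:

```python
from kubernetes import client, config

# Hypothetical deployment; the test suite walks every workload resource
# installed by the CNF's Helm chart.
NAME, NAMESPACE = "example-cnf", "default"

config.load_kube_config()
deployment = client.AppsV1Api().read_namespaced_deployment(NAME, NAMESPACE)

for container in deployment.spec.template.spec.containers:
    has_liveness = container.liveness_probe is not None
    has_readiness = container.readiness_probe is not None
    print(f"{container.name}: liveness={has_liveness} readiness={has_readiness}")
```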

Observability and Diagnostic Category

Use stdout/stderr for logs

What's tested: This checks and verifies that STDOUT/STDERR is configured for logging.

For example, running kubectl logs returns useful information for diagnosing or troubleshooting issues.
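
Illustratively (hypothetical pod name), a CNF that writes to stdout/stderr needs no extra log shipper inside the container for this to work:

```python
import subprocess

POD, NAMESPACE = "example-cnf-pod", "default"  # hypothetical names

# The kubelet captures anything the container writes to stdout/stderr,
# so this returns recent application output directly.
logs = subprocess.run(
    ["kubectl", "logs", POD, "-n", NAMESPACE, "--tail=20"],
    capture_output=True, text=True, check=True,
).stdout
print(logs or "no output captured on stdout/stderr")
```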

Prometheus installed

What's tested: Tests for the presence of Prometheus or whether the CNF emits Prometheus traffic.

Fluentd logs

What's tested: Checks for the presence of Fluentd and whether logs are being captured by Fluentd.

OpenMetrics compatible

What's tested: Checks whether OpenMetrics is being used and/or whether the CNF's metrics are OpenMetrics compatible.

Jaeger tracing

What's tested: Checks if Jaeger is configured and tracing is being used.

Security Category

Container socket mounts

What's tested: This test uses the Kyverno policy called Disallow CRI socket mounts.

Sysctls test

What's tested: TBD

External IPs

What's tested: Checks if the CNF has services with external IPs configured

Privilege escalation

What's tested: Privilege Escalation: Check that the allowPrivilegeEscalation field in the securityContext of the container is set to false.

See more at ARMO-C0016
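
The reference above points at a Kubescape/ARMO control; the underlying idea can be sketched directly (hypothetical pod name): every container should explicitly set allowPrivilegeEscalation to false.

```python
import json
import subprocess

POD, NAMESPACE = "example-cnf-pod", "default"  # hypothetical names

pod = json.loads(subprocess.run(
    ["kubectl", "get", "pod", POD, "-n", NAMESPACE, "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout)

for container in pod["spec"]["containers"]:
    sc = container.get("securityContext") or {}
    # The field must be explicitly false; a missing value does not count.
    ok = sc.get("allowPrivilegeEscalation") is False
    print(f'{container["name"]}: allowPrivilegeEscalation=false: {ok}')
```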

Symlink file system

What's tested: This control checks the vulnerable versions and the actual usage of the subPath feature in all Pods in the cluster.

See more at ARMO-C0058

Application credentials

What's tested: Check if the pod has sensitive information in environment variables, by using a list of known sensitive key names. Also check if there are ConfigMaps with sensitive information.

See more at ARMO-C0012
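
Illustratively, the environment-variable part of this check can be sketched as below; the keyword list and pod name are made up here, and the real control (ARMO-C0012) uses its own rules and also scans ConfigMaps:

```python
import json
import subprocess

# Illustrative list of key names treated as sensitive.
SENSITIVE = ("password", "passwd", "secret", "token", "apikey", "api_key")
POD, NAMESPACE = "example-cnf-pod", "default"  # hypothetical names

pod = json.loads(subprocess.run(
    ["kubectl", "get", "pod", POD, "-n", NAMESPACE, "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout)

for container in pod["spec"]["containers"]:
    for env in container.get("env", []):
        if any(word in env["name"].lower() for word in SENSITIVE):
            # A literal value (rather than a secretKeyRef) is what gets flagged.
            if "value" in env:
                print(f'{container["name"]}: suspicious env var {env["name"]}')
```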

Host network

What's tested: Checks if there is a host network attached to a pod. See more at ARMO-C0041

Service account mapping

What's tested: Check if service accounts are automatically mapped. See more at ARMO-C0034.

Ingress and Egress blocked

What's tested: Checks Ingress and Egress traffic policy

Privileged containers, Kubescape

What's tested: Check in the Pod spec if securityContext.privileged == true. Read more at ARMO-C0057

Insecure capabilities

What's tested: Checks for insecure capabilities. See more at ARMO-C0046

This test checks against a blacklist of insecure capabilities.

Non-root containers

What's tested: Checks if containers are running as a non-root user with non-root group membership. Read more at ARMO-C0013

Host PID/IPC privileges

What's tested: Checks if containers are running with hostPID or hostIPC privileges. Read more at ARMO-C0038

SELinux options

What's tested: Checks if CNF resources use custom SELinux options that allow privilege escalation (selinux_options)

Linux hardening

What's tested: Checks if security services are being used to harden the application. Read more at ARMO-C0055

CPU Limits

What's tested: Check for each container if there is a ‘limits.cpu’ field defined. Check for each limitrange/resourcequota if there is a max/hard field defined, respectively. Read more at ARMO-C0270.

Memory Limits

What's tested: Check for each container if there is a ‘limits.memory’ field defined. Check for each limitrange/resourcequota if there is a max/hard field defined, respectively. Read more at ARMO-C0271.
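
Both limit checks amount to looking at each container's resources.limits; a minimal sketch with the Kubernetes Python client (hypothetical deployment name, LimitRange/ResourceQuota inspection omitted):

```python
from kubernetes import client, config

NAME, NAMESPACE = "example-cnf", "default"  # hypothetical deployment

config.load_kube_config()
deployment = client.AppsV1Api().read_namespaced_deployment(NAME, NAMESPACE)

for container in deployment.spec.template.spec.containers:
    limits = (container.resources.limits or {}) if container.resources else {}
    # Both 'cpu' and 'memory' should be defined for the tests to pass.
    print(f"{container.name}: cpu limit={limits.get('cpu')}, "
          f"memory limit={limits.get('memory')}")
```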

Immutable File Systems

What's tested: Checks whether the readOnlyRootFilesystem field in the SecurityContext is set to true. Read more at ARMO-C0017

HostPath Mounts

What's tested: TBD. Read more at ARMO-C0045

Default namespaces

What's tested: TBD

Configuration Category

Latest tag

What's tested: TBD

Require labels

What's tested: TBD

nodePort not used

What's tested: TBD

hostPort not used

What's tested: TBD

Hardcoded IP addresses in K8s runtime configuration

What's tested: TBD

Secrets used

What's tested: TBD

Immutable configmap

What's tested: TBD