I want to implement cloud-native change management on Kubernetes (K8S). This involves watching Deployment updates, verifying new pod versions, and using cloud-native monitoring components such as Prometheus to collect pod metrics and time-series data. Our own intelligent anomaly detection algorithm then analyzes these metrics, so that anomalies in pod metrics are detected and changes are rolled back promptly. The approach is similar to Argo Rollouts, but instead of building directly on ReplicaSet-level control, we want to develop a workload change-control framework that extends change management across the cloud-native domain. We have defined two core CRDs, “ChangeWorkload” and “ChangePod”. We have also built our own Operator to sense changes in the cloud-native environment, and a Spring Boot control-side application that performs all verification logic and returns the results to the Operator.
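To illustrate the metrics side, the Go sketch below decodes a Prometheus `/api/v1/query_range` response (result type "matrix", where each sample is a `[timestamp, "value"]` pair) into per-pod series. It then compares a new pod against the old ones with a simple z-score. The z-score is only a placeholder for your own anomaly detection algorithm. The sketch also assumes that each series carries a `pod` label, which depends on the query you run.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"strconv"
)

// promMatrix mirrors the "matrix" result of Prometheus' /api/v1/query_range API.
type promMatrix struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string    `json:"metric"`
			Values [][2]json.RawMessage `json:"values"` // [unixTimestamp, "stringValue"]
		} `json:"result"`
	} `json:"data"`
}

// seriesByPod decodes a query_range body into pod name -> sample values.
// Assumption: every series has a "pod" label.
func seriesByPod(body []byte) (map[string][]float64, error) {
	var m promMatrix
	if err := json.Unmarshal(body, &m); err != nil {
		return nil, err
	}
	if m.Status != "success" || m.Data.ResultType != "matrix" {
		return nil, fmt.Errorf("unexpected response: status=%q type=%q", m.Status, m.Data.ResultType)
	}
	out := make(map[string][]float64)
	for _, r := range m.Data.Result {
		for _, pair := range r.Values {
			var s string // Prometheus encodes sample values as strings
			if err := json.Unmarshal(pair[1], &s); err != nil {
				return nil, err
			}
			v, err := strconv.ParseFloat(s, 64)
			if err != nil {
				return nil, err
			}
			out[r.Metric["pod"]] = append(out[r.Metric["pod"]], v)
		}
	}
	return out, nil
}

func meanStd(xs []float64) (float64, float64) {
	var sum, sq float64
	for _, x := range xs {
		sum += x
	}
	mean := sum / float64(len(xs))
	for _, x := range xs {
		sq += (x - mean) * (x - mean)
	}
	return mean, math.Sqrt(sq / float64(len(xs)))
}

// isAnomalous is a placeholder z-score check, NOT the authors' algorithm:
// it flags the new pod when its mean deviates from the baseline mean by
// more than threshold baseline standard deviations.
func isAnomalous(baseline, canary []float64, threshold float64) bool {
	if len(baseline) == 0 || len(canary) == 0 {
		return false
	}
	bMean, bStd := meanStd(baseline)
	cMean, _ := meanStd(canary)
	if bStd == 0 {
		return cMean != bMean
	}
	return math.Abs(cMean-bMean)/bStd > threshold
}

func main() {
	body := []byte(`{"status":"success","data":{"resultType":"matrix","result":[
	 {"metric":{"pod":"web-old"},"values":[[1700000000,"0.10"],[1700000015,"0.12"],[1700000030,"0.11"]]},
	 {"metric":{"pod":"web-new"},"values":[[1700000000,"0.45"],[1700000015,"0.50"]]}]}}`)
	series, err := seriesByPod(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(isAnomalous(series["web-old"], series["web-new"], 3)) // prints true
}
```

In practice, the baseline would come from pods of the old ReplicaSet over the same time range. This keeps the comparison robust to traffic patterns that affect every pod equally.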
Regarding the process, we plan to start with Deployments and divide change management into the following stages:
Change awareness: When the podTemplate of a Deployment changes, we treat it as a new version going online and process it.
Pre-change verification: Perform admission checks through a webhook, with configurable rules such as time-window restrictions (for example, no changes allowed at midnight).
Change execution blocking: When the control-side application detects a pod anomaly, it blocks the change directly through the API server by pausing the Deployment (equivalent to kubectl rollout pause deployment).
Post-change verification: Support custom verification after the change completes.
Change self-healing: When a version anomaly is detected, roll back the Deployment directly through the API server.
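For change awareness, one way to separate a real version change from other updates is to compare the old and new `spec.template` in each update event. `metadata.generation` alone is not enough, because scaling `spec.replicas` also increments it. The sketch below hashes the template's JSON using only the standard library. The trimmed `deployment` type is hypothetical; a client-go based Operator would more likely compare the typed `PodTemplateSpec` values directly, for example with `equality.Semantic.DeepEqual` from `k8s.io/apimachinery/pkg/api/equality`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// deployment is a hypothetical, trimmed view of an apps/v1 Deployment.
type deployment struct {
	Generation int64
	Replicas   int32
	Template   map[string]any // stands in for spec.template
}

// templateHash returns a stable digest of spec.template. encoding/json sorts
// map keys at every level, so equal templates produce identical bytes.
func templateHash(tpl map[string]any) (string, error) {
	b, err := json.Marshal(tpl)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// isNewVersion reports whether an update event changed the pod template,
// ignoring changes such as scaling that only bump generation.
func isNewVersion(oldD, newD deployment) (bool, error) {
	oldH, err := templateHash(oldD.Template)
	if err != nil {
		return false, err
	}
	newH, err := templateHash(newD.Template)
	if err != nil {
		return false, err
	}
	return oldH != newH, nil
}

func main() {
	tpl := func(image string) map[string]any {
		return map[string]any{"spec": map[string]any{
			"containers": []any{map[string]any{"name": "web", "image": image}},
		}}
	}
	base := deployment{Generation: 1, Replicas: 2, Template: tpl("web:v1")}
	scaled := deployment{Generation: 2, Replicas: 5, Template: tpl("web:v1")}   // scale only
	upgraded := deployment{Generation: 3, Replicas: 5, Template: tpl("web:v2")} // new image
	a, _ := isNewVersion(base, scaled)
	b, _ := isNewVersion(scaled, upgraded)
	fmt.Println(a, b) // prints false true
}
```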
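For self-healing, note that the apps/v1 Deployment API has no rollback endpoint; the rollbackTo mechanism existed only in older beta API versions. "Rolling back through the API server" therefore means patching the Deployment yourself, which is what kubectl rollout undo does on the client side. It finds the owned ReplicaSet for the previous revision using the `deployment.kubernetes.io/revision` annotation, and strips the controller-added `pod-template-hash` label from that ReplicaSet's template. It then replaces `spec.template` with a JSON Patch (`application/json-patch+json`). The sketch below reproduces that logic on hypothetical, trimmed types. It leaves out listing the ReplicaSets and sending the patch. kubectl also restores the Deployment's annotations in the same patch, which the sketch omits.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
)

const revisionAnnotation = "deployment.kubernetes.io/revision"

// replicaSet is a hypothetical, trimmed view of a ReplicaSet owned by the Deployment.
type replicaSet struct {
	Annotations map[string]string
	Template    map[string]any // stands in for spec.template
}

// previousTemplate picks the ReplicaSet with the second-highest revision,
// i.e. the version before the current one.
func previousTemplate(rss []replicaSet) (map[string]any, error) {
	maxRev, prevRev := int64(-1), int64(-1)
	var maxRS, prevRS *replicaSet
	for i := range rss {
		rev, err := strconv.ParseInt(rss[i].Annotations[revisionAnnotation], 10, 64)
		if err != nil {
			continue // skip ReplicaSets without a valid revision annotation
		}
		switch {
		case rev > maxRev:
			prevRev, prevRS = maxRev, maxRS
			maxRev, maxRS = rev, &rss[i]
		case rev > prevRev:
			prevRev, prevRS = rev, &rss[i]
		}
	}
	if prevRS == nil {
		return nil, errors.New("no previous revision to roll back to")
	}
	return prevRS.Template, nil
}

// rollbackPatch builds a JSON Patch that replaces the whole spec.template.
// "replace" is used rather than a merge patch so that fields added by the
// bad version are dropped instead of merged. The input is not mutated.
func rollbackPatch(tpl map[string]any) ([]byte, error) {
	clean := make(map[string]any, len(tpl))
	for k, v := range tpl {
		clean[k] = v
	}
	if meta, ok := tpl["metadata"].(map[string]any); ok {
		m := make(map[string]any, len(meta))
		for k, v := range meta {
			m[k] = v
		}
		if labels, ok := meta["labels"].(map[string]any); ok {
			l := make(map[string]any, len(labels))
			for k, v := range labels {
				if k != "pod-template-hash" { // added by the controller, not the user
					l[k] = v
				}
			}
			m["labels"] = l
		}
		clean["metadata"] = m
	}
	return json.Marshal([]map[string]any{{"op": "replace", "path": "/spec/template", "value": clean}})
}

func main() {
	mk := func(rev, image string) replicaSet {
		return replicaSet{
			Annotations: map[string]string{revisionAnnotation: rev},
			Template: map[string]any{
				"metadata": map[string]any{"labels": map[string]any{"app": "web", "pod-template-hash": "h" + rev}},
				"spec":     map[string]any{"containers": []any{map[string]any{"name": "web", "image": image}}},
			},
		}
	}
	tpl, err := previousTemplate([]replicaSet{mk("1", "web:v1"), mk("3", "web:v3"), mk("2", "web:v2")})
	if err != nil {
		panic(err)
	}
	patch, _ := rollbackPatch(tpl)
	fmt.Println(string(patch))
}
```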
Do you have any better ideas for technical solutions to change management in a cloud-native environment? Any suggestions or feedback are welcome.