Closed miklezzzz closed 2 weeks ago
an example of a grouped (parallel) run:
Queue 'main': length 31, status: 'run first task'
1. GroupedModuleRun:main:Grouped run for cloud-data-crd, metallb-crd, operator-prometheus-crd, prometheus-crd, snapshot-controller-crd, user-authn-crd, vertical-pod-autoscaler-crd:OperatorStartup
2. ModuleRun:main:flow-schema:doStartup:OperatorStartup
3. ModuleRun:main:admission-policy-engine:doStartup:OperatorStartup
4. ModuleRun:main:cloud-provider-openstack:doStartup:OperatorStartup
5. ModuleRun:main:local-path-provisioner:doStartup:OperatorStartup
6. ModuleRun:main:cni-flannel:doStartup:OperatorStartup
7. ModuleRun:main:kube-proxy:doStartup:OperatorStartup
8. ModuleRun:main:registry-packages-proxy:doStartup:OperatorStartup
9. GroupedModuleRun:main:Grouped run for control-plane-manager, node-manager, terraform-manager:OperatorStartup
10. ModuleRun:main:kube-dns:doStartup:OperatorStartup
11. ModuleRun:main:snapshot-controller:doStartup:OperatorStartup
12. ModuleRun:main:cert-manager:doStartup:OperatorStartup
13. ModuleRun:main:user-authz:doStartup:OperatorStartup
14. ModuleRun:main:user-authn:doStartup:OperatorStartup
15. ModuleRun:main:operator-prometheus:doStartup:OperatorStartup
16. ModuleRun:main:prometheus:doStartup:OperatorStartup
17. ModuleRun:main:prometheus-metrics-adapter:doStartup:OperatorStartup
18. ModuleRun:main:vertical-pod-autoscaler:doStartup:OperatorStartup
19. GroupedModuleRun:main:Grouped run for extended-monitoring, monitoring-applications, monitoring-custom, monitoring-deckhouse, monitoring-kubernetes, monitoring-kubernetes-control-plane, monitoring-ping:OperatorStartup
20. ModuleRun:main:node-local-dns:doStartup:OperatorStartup
21. ModuleRun:main:ingress-nginx:doStartup:OperatorStartup
22. ModuleRun:main:log-shipper:doStartup:OperatorStartup
23. ModuleRun:main:pod-reloader:doStartup:OperatorStartup
24. ModuleRun:main:chrony:doStartup:OperatorStartup
25. GroupedModuleRun:main:Grouped run for dashboard, operator-trivy, upmeter:OperatorStartup
26. GroupedModuleRun:main:Grouped run for namespace-configurator, secret-copier:OperatorStartup
27. ModuleRun:main:deckhouse-tools:doStartup:OperatorStartup
28. ModuleRun:main:documentation:doStartup:OperatorStartup
29. GroupedModuleRun:main:Grouped run for echo, mcplay:OperatorStartup
30. ConvergeModules:main:::Operator-Startup
31. ModuleHookRun:main:kubernetes:002-deckhouse/hooks/change_host_ip.go:pod:Kubernetes
Queue 'group_queue_0': length 1, status: 'waiting for task 20s'
1. ModuleRun:group_queue_0:cloud-data-crd:doStartup:OperatorStartup
Queue 'group_queue_1': length 1, status: 'waiting for task 20s'
1. ModuleRun:group_queue_1:metallb-crd:doStartup:OperatorStartup
Queue 'group_queue_2': length 1, status: 'waiting for task 20s'
1. ModuleRun:group_queue_2:operator-prometheus-crd:doStartup:OperatorStartup
Queue 'group_queue_3': length 1, status: 'waiting for task 20s'
1. ModuleRun:group_queue_3:prometheus-crd:doStartup:OperatorStartup
Queue 'group_queue_4': length 1, status: 'waiting for task 20s'
1. ModuleRun:group_queue_4:snapshot-controller-crd:doStartup:OperatorStartup
Queue 'group_queue_5': length 1, status: 'waiting for task 20s'
1. ModuleRun:group_queue_5:user-authn-crd:doStartup:OperatorStartup
Queue 'group_queue_6': length 1, status: 'waiting for task 20s'
1. ModuleRun:group_queue_6:vertical-pod-autoscaler-crd:doStartup:OperatorStartup
Summary:
- 'main' queue: 31 tasks.
- 14 other queues (7 active, 7 empty): 7 tasks.
- total 38 tasks to handle.
a failed task in a grouped run
Queue 'main': length 8, status: 'run first task'
1. GroupedModuleRun:main:Grouped run for mcplay:OperatorStartup:failures 1:
Errors:
- mcplay: helm upgrade failed: cannot patch "mcplay" with kind Deployment: Deployment.apps "mcplay" is invalid: spec.template.spec.containers: Required value
2. ConvergeModules:main:::Operator-Startup
3. ModuleHookRun:main:kubernetes:002-deckhouse/hooks/change_host_ip.go:pod:Kubernetes
4. ModuleRun:main:node-manager:Kubernetes-Change-ModuleValues
5. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:namespaces:Kubernetes
6. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:statefulsets:Kubernetes
7. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:statefulsets:Kubernetes
8. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:namespaces:Kubernetes
Queue 'group_queue_1': length 1, status: 'run first task'
1. ModuleRun:group_queue_1:mcplay:doStartup:OperatorStartup:failures 1:helm upgrade failed: cannot patch "mcplay" with kind Deployment: Deployment.apps "mcplay" is invalid: spec.template.spec.containers: Required value
Summary:
- 'main' queue: 8 tasks.
- 99 other queues (1 active, 98 empty): 1 task.
- total 9 tasks to handle.
yet another example:
Queue 'main': length 12, status: 'run first task'
1. GroupedModuleRun:main:Grouped run for echo, mcplay:OperatorStartup:failures 11:
Errors:
- echo: helm upgrade failed: cannot patch "echo-server" with kind Deployment: Deployment.apps "echo-server" is invalid: spec.template.spec.containers: Required value
- mcplay: helm upgrade failed: cannot patch "mcplay" with kind Deployment: Deployment.apps "mcplay" is invalid: spec.template.spec.containers: Required value
2. ConvergeModules:main:::Operator-Startup
3. ModuleHookRun:main:kubernetes:002-deckhouse/hooks/change_host_ip.go:pod:Kubernetes
4. ModuleRun:main:node-manager:Kubernetes-Change-ModuleValues
5. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:namespaces:Kubernetes
6. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:statefulsets:Kubernetes
7. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:statefulsets:Kubernetes
8. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:namespaces:Kubernetes
9. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:namespaces:Kubernetes
10. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:statefulsets:Kubernetes
11. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:statefulsets:Kubernetes
12. ModuleHookRun:main:kubernetes:340-extended-monitoring/hooks/alert_old_annotation.go:namespaces:Kubernetes
Queue 'group_queue_0': length 1, status: 'sleep after fail for 21.4s (1s left of 21s delay)'
1. ModuleRun:group_queue_0:echo:doStartup:OperatorStartup:failures 6:helm upgrade failed: cannot patch "echo-server" with kind Deployment: Deployment.apps "echo-server" is invalid: spec.template.spec.containers: Required value
Queue 'group_queue_1': length 1, status: 'sleep after fail for 13.3s (3s left of 13s delay)'
1. ModuleRun:group_queue_1:mcplay:doStartup:OperatorStartup:failures 5:helm upgrade failed: cannot patch "mcplay" with kind Deployment: Deployment.apps "mcplay" is invalid: spec.template.spec.containers: Required value
Summary:
- 'main' queue: 12 tasks.
- 99 other queues (2 active, 97 empty): 2 tasks.
- total 14 tasks to handle.
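In the dump above, the parent task reports `failures 11` while its two subordinate tasks report 6 and 5, and each module's last error is listed under `Errors:`. The Go sketch below shows one way such a status could be assembled. The `childResult` type and the `summarize` function are hypothetical names, and summing the child counters is an assumption drawn only from this output, not from the operator's code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// childResult is a hypothetical record of one subordinate ModuleRun task.
type childResult struct {
	Module   string
	Failures int
	Err      string // last error message; empty if the task succeeded
}

// summarize renders a parent-task status from its child results.
// Assumption: the parent failure counter is the sum of the child counters.
func summarize(desc string, results []childResult) string {
	// Work on a copy so the caller's slice order is left untouched.
	sorted := append([]childResult(nil), results...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Module < sorted[j].Module })

	total := 0
	var errs []string
	for _, r := range sorted {
		total += r.Failures
		if r.Err != "" {
			errs = append(errs, fmt.Sprintf("- %s: %s", r.Module, r.Err))
		}
	}

	var b strings.Builder
	fmt.Fprintf(&b, "%s:failures %d:", desc, total)
	if len(errs) > 0 {
		b.WriteString("\nErrors:\n" + strings.Join(errs, "\n"))
	}
	return b.String()
}

func main() {
	fmt.Println(summarize("Grouped run for echo, mcplay", []childResult{
		{Module: "mcplay", Failures: 5, Err: "helm upgrade failed"},
		{Module: "echo", Failures: 6, Err: "helm upgrade failed"},
	}))
}
```

Sorting by module name keeps the error list stable between refreshes of the queue listing, whichever subordinate queue fails first.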
`Group` makes it feel like a logical group of modules, e.g. "group of monitoring modules", "group of cni modules". Why not name it according to the PR description: `ParallelModuleRun`?
Also, there is a `group` parameter in kubernetes subscriptions.
makes sense
[deckhouse] deckhouse@dev-master-0 /deckhouse $ deckhouse-controller queue list
Queue 'main': length 33, status: 'run first task'
1. ParallelModuleRun:main:Parallel run for cloud-data-crd, metallb-crd, operator-prometheus-crd, prometheus-crd, snapshot-controller-crd, user-authn-crd, vertical-pod-autoscaler-crd:OperatorStartup
2. ModuleRun:main:flow-schema:doStartup:OperatorStartup
3. ModuleRun:main:admission-policy-engine:doStartup:OperatorStartup
4. ModuleRun:main:cloud-provider-openstack:doStartup:OperatorStartup
5. ModuleRun:main:local-path-provisioner:doStartup:OperatorStartup
6. ModuleRun:main:cni-flannel:doStartup:OperatorStartup
7. ModuleRun:main:kube-proxy:doStartup:OperatorStartup
8. ModuleRun:main:registry-packages-proxy:doStartup:OperatorStartup
9. ParallelModuleRun:main:Parallel run for control-plane-manager, node-manager, terraform-manager:OperatorStartup
10. ModuleRun:main:kube-dns:doStartup:OperatorStartup
11. ModuleRun:main:snapshot-controller:doStartup:OperatorStartup
12. ModuleRun:main:cert-manager:doStartup:OperatorStartup
13. ModuleRun:main:user-authz:doStartup:OperatorStartup
14. ModuleRun:main:user-authn:doStartup:OperatorStartup
15. ModuleRun:main:operator-prometheus:doStartup:OperatorStartup
16. ModuleRun:main:prometheus:doStartup:OperatorStartup
17. ModuleRun:main:prometheus-metrics-adapter:doStartup:OperatorStartup
18. ModuleRun:main:vertical-pod-autoscaler:doStartup:OperatorStartup
19. ParallelModuleRun:main:Parallel run for extended-monitoring, monitoring-applications, monitoring-custom, monitoring-deckhouse, monitoring-kubernetes, monitoring-kubernetes-control-plane, monitoring-ping:OperatorStartup
20. ModuleRun:main:node-local-dns:doStartup:OperatorStartup
21. ModuleRun:main:metallb:doStartup:OperatorStartup
22. ModuleRun:main:l2-load-balancer:doStartup:OperatorStartup
23. ModuleRun:main:ingress-nginx:doStartup:OperatorStartup
24. ModuleRun:main:log-shipper:doStartup:OperatorStartup
25. ModuleRun:main:pod-reloader:doStartup:OperatorStartup
26. ModuleRun:main:chrony:doStartup:OperatorStartup
27. ParallelModuleRun:main:Parallel run for dashboard, operator-trivy, upmeter:OperatorStartup
28. ParallelModuleRun:main:Parallel run for namespace-configurator, secret-copier:OperatorStartup
29. ModuleRun:main:deckhouse-tools:doStartup:OperatorStartup
30. ModuleRun:main:documentation:doStartup:OperatorStartup
31. ParallelModuleRun:main:Parallel run for echo, mcplay:OperatorStartup
32. ConvergeModules:main:::Operator-Startup
33. ModuleHookRun:main:kubernetes:002-deckhouse/hooks/change_host_ip.go:pod:Kubernetes
Queue 'parallel_queue_0': length 1, status: 'run first task'
1. ModuleRun:parallel_queue_0:snapshot-controller-crd:doStartup:OperatorStartup
Queue 'parallel_queue_1': length 1, status: 'run first task'
1. ModuleRun:parallel_queue_1:user-authn-crd:doStartup:OperatorStartup
Queue 'parallel_queue_2': length 1, status: 'run first task'
1. ModuleRun:parallel_queue_2:vertical-pod-autoscaler-crd:doStartup:OperatorStartup
Queue 'parallel_queue_3': length 1, status: 'run first task'
1. ModuleRun:parallel_queue_3:cloud-data-crd:doStartup:OperatorStartup
Queue 'parallel_queue_4': length 1, status: 'run first task'
1. ModuleRun:parallel_queue_4:metallb-crd:doStartup:OperatorStartup
Queue 'parallel_queue_5': length 1, status: 'run first task'
1. ModuleRun:parallel_queue_5:operator-prometheus-crd:doStartup:OperatorStartup
Queue 'parallel_queue_6': length 1, status: 'run first task'
1. ModuleRun:parallel_queue_6:prometheus-crd:doStartup:OperatorStartup
Summary:
- 'main' queue: 33 tasks.
- 14 other queues (7 active, 7 empty): 7 tasks.
- total 40 tasks to handle.
Overview
`ModuleRun` tasks and the corresponding `ModuleHookRun` tasks for modules of the same order (weight) are executed in parallel in `parallel` queues. There are 10 `parallel` queues by default in the operator's queue set.

What this PR does / why we need it
This PR adds a new type of task, `ParallelModuleRun`. A task of this type represents a group of smaller tasks of the `ModuleRun` and `ModuleHookRun` types that share the same order/weight. These subordinate tasks are executed in parallel in pre-created queues named `parallel_queue_x`, and all results and errors are propagated back to the corresponding `ParallelModuleRun` task, which updates its status accordingly.

Special notes for your reviewer