Open msftcoderdjw opened 6 months ago
Several thoughts: 1) We may want to consider an Actor framework (such as Akka or Orleans) so that each instance/target can drive towards its own desired state in parallel. 2) For job manager parallelism, we should rely on the platform to scale out, for example by putting the job vendor in a separate pod and scaling that pod to multiple instances, which essentially implements the competing consumer pattern. 3) Also note that the solution manager is used on the agent side for reconciliation as well.
I agree. I think we can divide the fixes into several phases.
Assigning to @FireDefend for the short-term fix.
https://github.com/eclipse-symphony/symphony/blob/17392d8c1b1decc110ca3cef28a8675e4dd45c9f/api/pkg/apis/v1alpha1/managers/solution/solution-manager.go#L220-L222
Currently, a global lock is used in the reconcile function.