To change the resource requests and limits, the only option is to tweak the Subscription: https://github.com/dapr-sandbox/dapr-kubernetes-operator/issues/77#issuecomment-1856067695
Unfortunately the memory cannot be made configurable, but I will dig into the memory consumption.
Do you have a way to reproduce it? I have never experienced such behavior.
All we did was execute the steps above, and that reproduced it. I don't think the dapr-control-plane would be affected by existing pods that have Dapr annotations for sidecar injection, but maybe you can correct me if I am wrong. We did have a number of pods with those annotations running during the initialization of the DaprInstance.
Do you have an example of how we could use the Subscription to tweak the requests and limits in the context of the dapr-control-plane? Or am I mistaken about what you mean?
> All we did was execute the steps above, and that reproduced it. I don't think the dapr-control-plane would be affected by existing pods that have Dapr annotations for sidecar injection, but maybe you can correct me if I am wrong. We did have a number of pods with those annotations running during the initialization of the DaprInstance.
It should not, as what is affected is the dapr-operator and the other managed resources; the dapr-control-plane only generates the manifests. Maybe the watcher watches too many objects. I'll have a look.
> Do you have an example of how we could use the Subscription to tweak the requests and limits in the context of the dapr-control-plane? Or am I mistaken about what you mean?
No, I don't, but there are a number of examples in the documentation mentioned in the linked comment.
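For example, the Subscription's `config` section accepts a `resources` stanza (a minimal sketch based on the OLM documentation; the values here are illustrative only):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: dapr-kubernetes-operator
  namespace: openshift-operators
spec:
  # channel, source, etc. as usual for the operator
  config:
    resources:        # applied by OLM to the operator deployment
      limits:
        memory: 512Mi
      requests:
        memory: 256Mi
```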
I've tried to reproduce the issue, but I failed. What I did was deploy a DaprInstance on my environment.
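For reference, a minimal DaprInstance for such a test might look roughly like this (the apiVersion and spec layout are assumptions about the operator's CRDs, not taken from this thread):

```yaml
# Hypothetical sketch -- verify apiVersion/kind/fields against the CRDs
# installed by your version of the operator.
apiVersion: operator.dapr.io/v1alpha1
kind: DaprInstance
metadata:
  name: dapr-instance
  namespace: dapr-system
spec:
  values: {}   # assumed passthrough for Dapr chart values
```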
But the operator works as expected and does not get OOMKilled:
```
➜ k get pods -l control-plane=dapr-control-plane -w
NAME                                  READY   STATUS    RESTARTS   AGE
dapr-control-plane-7796c9ff85-htk4g   1/1     Running   0          2m49s

➜ k top pod dapr-control-plane-7796c9ff85-htk4g
NAME                                  CPU(cores)   MEMORY(bytes)
dapr-control-plane-7796c9ff85-htk4g   7m           68Mi
```
I don't have any Dapr application running, so it is not 100% the same test, but as far as the dapr-kubernetes-operator is concerned, it should not matter.
OK, we are going to look into OLM and see if we can adjust the resources of the dapr-control-plane. While we are doing that, I am curious to know whether the dapr-control-plane being killed will cause any issues. In our case, so far we do see the components in place and the CRDs were deployed (the permission issue still exists, see #136), and we are using the Dapr components without problems so far. What are your thoughts on this?
Also, I was finally able to capture a screenshot of this crash (it goes OOMKilled and then immediately into CrashLoopBackOff, so it is hard to capture as well).
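If the pod restarts too quickly to screenshot, the previous container's termination reason can also be read directly (using the label shown in the kubectl output above):

```shell
# Reason the last container instance was terminated (e.g. OOMKilled)
kubectl get pod -l control-plane=dapr-control-plane \
  -o jsonpath='{.items[0].status.containerStatuses[0].lastState.terminated.reason}'

# Full state transitions and restart count
kubectl describe pod -l control-plane=dapr-control-plane
```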
Some logs from OpenShift as well
> OK, we are going to look into OLM and see if we can adjust the resources of the dapr-control-plane. While we are doing that, I am curious to know whether the dapr-control-plane being killed will cause any issues.
It should not cause any issues, as the role of the operator is just to set up Dapr and make sure the setup stays in sync with the DaprInstance spec.
> Some logs from OpenShift as well
Are you able to provide a reproducer? Deploying a DaprInstance similar to yours does not trigger the OOM killer on my environment, so I need something closer to your setup to dig into it further.
Hi @lburgazzoli. Using Subscriptions in OLM, we were able to stabilize the dapr-control-plane pod. Here is the Subscription we used, for future reference if others run into this issue:
```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/dapr-kubernetes-operator.openshift-operators: ""
  name: dapr-kubernetes-operator
  namespace: openshift-operators
spec:
  channel: alpha
  config:
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 250m
        memory: 256Mi
  installPlanApproval: Manual
  name: dapr-kubernetes-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  startingCSV: dapr-kubernetes-operator.v0.0.8
```
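Note that because installPlanApproval is set to Manual, the resulting InstallPlan has to be approved before the change takes effect; something like the following (the InstallPlan name below is a placeholder):

```shell
# List InstallPlans in the operator namespace to find the pending one
kubectl get installplan -n openshift-operators

# Approve it (replace install-xxxxx with the actual name)
kubectl patch installplan install-xxxxx -n openshift-operators \
  --type merge -p '{"spec":{"approved":true}}'
```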
As a side note, this did not resolve the propagation of the roles. We still need an admin to manually create roles for us to use these CRDs.
@ryorke1 I would really love to be able to reproduce this so I can fix the underlying problem (which may just be a matter of increasing the memory), so if at any point you come up with a reproducer of some sort, please let me know.
Expected Behavior
The dapr-control-plane pod should remain stable and have configurable resource limits and requests.
Current Behavior
The dapr-control-plane pod is continuously OOMKilled as long as a DaprInstance exists. If we remove the DaprInstance, the pod stabilizes. The dapr-control-plane pod does survive long enough to deploy the DaprInstance pods and CRDs, but it takes a few OOMKills to complete. The pod continues to crash, but this does not seem to affect the Dapr components.
Possible Solution
Steps to Reproduce
Environment
OpenShift: Red Hat OpenShift Container Platform 4.12
Dapr Operator: 0.0.8 with 1.13.2 Dapr components