canonical / operator

Pure Python framework for writing Juju charms
Apache License 2.0
245 stars 119 forks source link

Setting model.app.status causes the deployment to hang #409

Closed camille-rodriguez closed 2 years ago

camille-rodriguez commented 4 years ago

In a test, I forced a failure of an api call inside my charm, to set the unit to a blocked status. I realised this doesn't set the app status to blocked, so I changed the model.unit.status to model.app.status, and it caused the charm to hang.

When using app.status:

response = utils.create_pod_security_policy_with_api(
            namespace=self._stored.namespace,
        )
        if response != 0:
            print('log log')
            self.model.app.status = \
                BlockedStatus("An error occured during init. Please check the logs.")
            print('log here too')
            return response

application-metallb-controller: 12:23:32 ERROR unit.metallb-controller/18.juju-log Exception when calling PolicyV1beta1Api->create_pod_security_policy: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Fri, 18 Sep 2020 17:23:32 GMT', 'Content-Length': '426'})
HTTP response body: {
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "podsecuritypolicies.policy is forbidden: User \"system:serviceaccount:metallb:metallb-controller-operator\" cannot create resource \"podsecuritypolicies\" in API group \"policy\" at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "group": "policy",
    "kind": "podsecuritypolicies"
  },
  "code": 403
}

application-metallb-controller: 12:23:33 DEBUG unit.metallb-controller/18.start log log
application-metallb-controller: 12:23:33 DEBUG unit.metallb-controller/18.start log here too
application-metallb-controller: 12:23:34 DEBUG unit.metallb-controller/18.juju-log Operator Framework 0.9.0 up and running.
application-metallb-controller: 12:23:34 DEBUG unit.metallb-controller/18.juju-log Legacy hooks/config-changed does not exist.
application-metallb-controller: 12:23:35 DEBUG unit.metallb-controller/18.juju-log Using controller storage: JUJU_VERSION=2.8.3
application-metallb-controller: 12:23:35 DEBUG unit.metallb-controller/18.juju-log Emitting Juju event config_changed.
application-metallb-speaker: 12:23:38 DEBUG unit.metallb-speaker/0.juju-log Operator Framework 0.9.0 up and running.
application-metallb-speaker: 12:23:38 DEBUG unit.metallb-speaker/0.juju-log Legacy hooks/update-status does not exist.
application-metallb-speaker: 12:23:38 DEBUG unit.metallb-speaker/0.juju-log Using controller storage: JUJU_VERSION=2.8.3
application-metallb-speaker: 12:23:39 DEBUG unit.metallb-speaker/0.juju-log Emitting Juju event update_status.

Model    Controller  Cloud/Region        Version  SLA          Timestamp
metallb  microk8s    microk8s/localhost  2.8.3    unsupported  12:25:48-05:00

App                 Version            Status  Scale  Charm               Store  Rev  OS          Address         Notes
metallb-controller  controller:v0.9.3  active      1  metallb-controller  local   19  kubernetes  10.152.183.35   
metallb-speaker     speaker:v0.9.3     active      1  metallb-speaker     local    0  kubernetes  10.152.183.211  

Unit                    Workload     Agent  Address      Ports     Message
metallb-controller/18*  maintenance  idle   10.1.142.41  7472/TCP  Configuring pod
metallb-speaker/0*      active       idle   10.1.142.10  7472/TCP  Ready

$ juju show-status-log metallb-controller/18
Time                        Type       Status       Message
18 Sep 2020 12:23:00-05:00  juju-unit  allocating   
18 Sep 2020 12:23:00-05:00  workload   waiting      agent initializing
18 Sep 2020 12:23:19-05:00  workload   maintenance  installing charm software
18 Sep 2020 12:23:19-05:00  juju-unit  executing    running install hook
18 Sep 2020 12:23:24-05:00  juju-unit  executing    running leader-elected hook
18 Sep 2020 12:23:27-05:00  juju-unit  executing    running config-changed hook
18 Sep 2020 12:23:29-05:00  juju-unit  executing    running start hook
18 Sep 2020 12:23:33-05:00  juju-unit  idle         
18 Sep 2020 12:23:34-05:00  juju-unit  executing    running config-changed hook
18 Sep 2020 12:23:35-05:00  juju-unit  allocating   Container image "jujusolutions/jujud-operator:2.8.3.152" already present on machine
18 Sep 2020 12:23:35-05:00  juju-unit  allocating   Created container juju-pod-init
18 Sep 2020 12:23:36-05:00  juju-unit  allocating   Started container juju-pod-init
18 Sep 2020 12:23:42-05:00  juju-unit  idle         
18 Sep 2020 12:23:42-05:00  workload   maintenance  Configuring pod

When using the same code for everything, but only switch the app.status to unit.status, it works as expected :

if response != 0:
            print('log log')
            self.model.unit.status = \
                BlockedStatus("An error occured during init. Please check the logs.")
            print('log here too')
            return response

application-metallb-controller: 12:29:44 ERROR unit.metallb-controller/19.juju-log Exception when calling PolicyV1beta1Api->create_pod_security_policy: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Fri, 18 Sep 2020 17:29:44 GMT', 'Content-Length': '426'})
HTTP response body: {
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "podsecuritypolicies.policy is forbidden: User \"system:serviceaccount:metallb:metallb-controller-operator\" cannot create resource \"podsecuritypolicies\" in API group \"policy\" at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "group": "policy",
    "kind": "podsecuritypolicies"
  },
  "code": 403
}

application-metallb-controller: 12:29:45 DEBUG unit.metallb-controller/19.start log log
application-metallb-controller: 12:29:45 DEBUG unit.metallb-controller/19.start log here too
application-metallb-controller: 12:29:46 DEBUG unit.metallb-controller/19.juju-log Operator Framework 0.9.0 up and running.
application-metallb-controller: 12:29:46 DEBUG unit.metallb-controller/19.juju-log Legacy hooks/config-changed does not exist.
application-metallb-controller: 12:29:46 DEBUG unit.metallb-controller/19.juju-log Using controller storage: JUJU_VERSION=2.8.3
application-metallb-controller: 12:29:47 DEBUG unit.metallb-controller/19.juju-log Emitting Juju event config_changed.

crodriguez@camille-ubuntu:~/Projects/metallb-bundle$ juju status
Model    Controller  Cloud/Region        Version  SLA          Timestamp
metallb  microk8s    microk8s/localhost  2.8.3    unsupported  12:30:30-05:00

App                 Version            Status  Scale  Charm               Store  Rev  OS          Address         Notes
metallb-controller  controller:v0.9.3  active      1  metallb-controller  local   20  kubernetes  10.152.183.111  
metallb-speaker     speaker:v0.9.3     active      1  metallb-speaker     local    0  kubernetes  10.152.183.211  

Unit                    Workload  Agent  Address      Ports     Message
metallb-controller/19*  blocked   idle   10.1.142.43  7472/TCP  An error occured during init. Please check the logs.
metallb-speaker/0*      active    idle   10.1.142.10  7472/TCP  Ready

crodriguez@camille-ubuntu:~/Projects/metallb-bundle$ juju show-status-log metallb-controller/19
Time                        Type       Status       Message
18 Sep 2020 12:29:19-05:00  juju-unit  allocating   
18 Sep 2020 12:29:19-05:00  workload   waiting      agent initializing
18 Sep 2020 12:29:37-05:00  workload   maintenance  installing charm software
18 Sep 2020 12:29:37-05:00  juju-unit  executing    running install hook
18 Sep 2020 12:29:39-05:00  juju-unit  executing    running leader-elected hook
18 Sep 2020 12:29:41-05:00  juju-unit  executing    running config-changed hook
18 Sep 2020 12:29:42-05:00  juju-unit  executing    running start hook
18 Sep 2020 12:29:44-05:00  workload   maintenance  Configuring pod
18 Sep 2020 12:29:45-05:00  juju-unit  idle         
18 Sep 2020 12:29:46-05:00  juju-unit  executing    running config-changed hook
18 Sep 2020 12:29:47-05:00  juju-unit  allocating   Created container juju-pod-init
18 Sep 2020 12:29:47-05:00  juju-unit  allocating   Started container juju-pod-init
18 Sep 2020 12:29:53-05:00  juju-unit  idle         
18 Sep 2020 12:29:53-05:00  workload   blocked      An error occured during init. Please check the logs.
jameinel commented 4 years ago

app status can only be set by leader. So you should do an is_leader check first. I don't know if that is relevant to why it would fail/hang or if you were already checking that elsewhere.

John =:->

On Fri, Sep 18, 2020, 12:34 Camille Rodriguez notifications@github.com wrote:

In a test, I forced a failure of an api call inside my charm, to set the unit to a blocked status. I realised this doesn't set the app status to blocked, so I changed the model.unit.status to model.app.status, and it caused the charm to hang.

When using app.status:

response = utils.create_pod_security_policy_with_api( namespace=self._stored.namespace, ) if response != 0: print('log log') self.model.app.status = \ BlockedStatus("An error occured during init. Please check the logs.") print('log here too') return response

application-metallb-controller: 12:23:32 ERROR unit.metallb-controller/18.juju-log Exception when calling PolicyV1beta1Api->create_pod_security_policy: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Fri, 18 Sep 2020 17:23:32 GMT', 'Content-Length': '426'}) HTTP response body: { "kind": "Status", "apiVersion": "v1", "metadata": {

}, "status": "Failure", "message": "podsecuritypolicies.policy is forbidden: User \"system:serviceaccount:metallb:metallb-controller-operator\" cannot create resource \"podsecuritypolicies\" in API group \"policy\" at the cluster scope", "reason": "Forbidden", "details": { "group": "policy", "kind": "podsecuritypolicies" }, "code": 403 }

application-metallb-controller: 12:23:33 DEBUG unit.metallb-controller/18.start log log application-metallb-controller: 12:23:33 DEBUG unit.metallb-controller/18.start log here too application-metallb-controller: 12:23:34 DEBUG unit.metallb-controller/18.juju-log Operator Framework 0.9.0 up and running. application-metallb-controller: 12:23:34 DEBUG unit.metallb-controller/18.juju-log Legacy hooks/config-changed does not exist. application-metallb-controller: 12:23:35 DEBUG unit.metallb-controller/18.juju-log Using controller storage: JUJU_VERSION=2.8.3 application-metallb-controller: 12:23:35 DEBUG unit.metallb-controller/18.juju-log Emitting Juju event config_changed. application-metallb-speaker: 12:23:38 DEBUG unit.metallb-speaker/0.juju-log Operator Framework 0.9.0 up and running. application-metallb-speaker: 12:23:38 DEBUG unit.metallb-speaker/0.juju-log Legacy hooks/update-status does not exist. application-metallb-speaker: 12:23:38 DEBUG unit.metallb-speaker/0.juju-log Using controller storage: JUJU_VERSION=2.8.3 application-metallb-speaker: 12:23:39 DEBUG unit.metallb-speaker/0.juju-log Emitting Juju event update_status.

Model Controller Cloud/Region Version SLA Timestamp metallb microk8s microk8s/localhost 2.8.3 unsupported 12:25:48-05:00

App Version Status Scale Charm Store Rev OS Address Notes metallb-controller controller:v0.9.3 active 1 metallb-controller local 19 kubernetes 10.152.183.35 metallb-speaker speaker:v0.9.3 active 1 metallb-speaker local 0 kubernetes 10.152.183.211

Unit Workload Agent Address Ports Message metallb-controller/18 maintenance idle 10.1.142.41 7472/TCP Configuring pod metallb-speaker/0 active idle 10.1.142.10 7472/TCP Ready

$ juju show-status-log metallb-controller/18 Time Type Status Message 18 Sep 2020 12:23:00-05:00 juju-unit allocating 18 Sep 2020 12:23:00-05:00 workload waiting agent initializing 18 Sep 2020 12:23:19-05:00 workload maintenance installing charm software 18 Sep 2020 12:23:19-05:00 juju-unit executing running install hook 18 Sep 2020 12:23:24-05:00 juju-unit executing running leader-elected hook 18 Sep 2020 12:23:27-05:00 juju-unit executing running config-changed hook 18 Sep 2020 12:23:29-05:00 juju-unit executing running start hook 18 Sep 2020 12:23:33-05:00 juju-unit idle 18 Sep 2020 12:23:34-05:00 juju-unit executing running config-changed hook 18 Sep 2020 12:23:35-05:00 juju-unit allocating Container image "jujusolutions/jujud-operator:2.8.3.152" already present on machine 18 Sep 2020 12:23:35-05:00 juju-unit allocating Created container juju-pod-init 18 Sep 2020 12:23:36-05:00 juju-unit allocating Started container juju-pod-init 18 Sep 2020 12:23:42-05:00 juju-unit idle 18 Sep 2020 12:23:42-05:00 workload maintenance Configuring pod

When using the same code for everything, but only switch the app.status to unit.status, it works as expected :

if response != 0: print('log log') self.model.unit.status = \ BlockedStatus("An error occured during init. Please check the logs.") print('log here too') return response

application-metallb-controller: 12:29:44 ERROR unit.metallb-controller/19.juju-log Exception when calling PolicyV1beta1Api->create_pod_security_policy: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Fri, 18 Sep 2020 17:29:44 GMT', 'Content-Length': '426'}) HTTP response body: { "kind": "Status", "apiVersion": "v1", "metadata": {

}, "status": "Failure", "message": "podsecuritypolicies.policy is forbidden: User \"system:serviceaccount:metallb:metallb-controller-operator\" cannot create resource \"podsecuritypolicies\" in API group \"policy\" at the cluster scope", "reason": "Forbidden", "details": { "group": "policy", "kind": "podsecuritypolicies" }, "code": 403 }

application-metallb-controller: 12:29:45 DEBUG unit.metallb-controller/19.start log log application-metallb-controller: 12:29:45 DEBUG unit.metallb-controller/19.start log here too application-metallb-controller: 12:29:46 DEBUG unit.metallb-controller/19.juju-log Operator Framework 0.9.0 up and running. application-metallb-controller: 12:29:46 DEBUG unit.metallb-controller/19.juju-log Legacy hooks/config-changed does not exist. application-metallb-controller: 12:29:46 DEBUG unit.metallb-controller/19.juju-log Using controller storage: JUJU_VERSION=2.8.3 application-metallb-controller: 12:29:47 DEBUG unit.metallb-controller/19.juju-log Emitting Juju event config_changed.

crodriguez@camille-ubuntu:~/Projects/metallb-bundle$ juju status Model Controller Cloud/Region Version SLA Timestamp metallb microk8s microk8s/localhost 2.8.3 unsupported 12:30:30-05:00

App Version Status Scale Charm Store Rev OS Address Notes metallb-controller controller:v0.9.3 active 1 metallb-controller local 20 kubernetes 10.152.183.111 metallb-speaker speaker:v0.9.3 active 1 metallb-speaker local 0 kubernetes 10.152.183.211

Unit Workload Agent Address Ports Message metallb-controller/19 blocked idle 10.1.142.43 7472/TCP An error occured during init. Please check the logs. metallb-speaker/0 active idle 10.1.142.10 7472/TCP Ready

crodriguez@camille-ubuntu:~/Projects/metallb-bundle$ juju show-status-log metallb-controller/19 Time Type Status Message 18 Sep 2020 12:29:19-05:00 juju-unit allocating 18 Sep 2020 12:29:19-05:00 workload waiting agent initializing 18 Sep 2020 12:29:37-05:00 workload maintenance installing charm software 18 Sep 2020 12:29:37-05:00 juju-unit executing running install hook 18 Sep 2020 12:29:39-05:00 juju-unit executing running leader-elected hook 18 Sep 2020 12:29:41-05:00 juju-unit executing running config-changed hook 18 Sep 2020 12:29:42-05:00 juju-unit executing running start hook 18 Sep 2020 12:29:44-05:00 workload maintenance Configuring pod 18 Sep 2020 12:29:45-05:00 juju-unit idle 18 Sep 2020 12:29:46-05:00 juju-unit executing running config-changed hook 18 Sep 2020 12:29:47-05:00 juju-unit allocating Created container juju-pod-init 18 Sep 2020 12:29:47-05:00 juju-unit allocating Started container juju-pod-init 18 Sep 2020 12:29:53-05:00 juju-unit idle 18 Sep 2020 12:29:53-05:00 workload blocked An error occured during init. Please check the logs.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/canonical/operator/issues/409, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABRQ7KWTIEYQCLT6U4M3FLSGOK4BANCNFSM4RSJM6GA .

camille-rodriguez commented 4 years ago

app status can only be set by leader. So you should do an is_leader check first. I don't know if that is relevant to why it would fail/hang or if you were already checking that elsewhere. John =:->

I check the leader status in the _init__ of my charm class, so all the hooks only run in the leader unit, so it should not be the cause of this issue in this case

jnsgruk commented 2 years ago

I have been experimenting with this issue this morning trying to reproduce.

I suspect the issue was in fact that the leadership check in the charm wasn't quite correctly placed. This check would in fact bail early from the charm's constructor if the unit is not a leader, but would not prevent any hook trying to access self.app.status from firing.

Closing this for now after a brief conversation with @camille-rodriguez.

jameinel commented 2 years ago

So it doesn't prevent a hook from trying to access self.app.status, except it does explicitly stop observing hooks, which means things like _on_start shouldn't be getting called (because it isn't registered as a handler for the install event).

That said, if you happened to defer an event at some point in the past when you were the leader, but then an event happened and you are no longer the leader, that deferred event and the associated handler probably is still registered.

On Thu, Sep 30, 2021 at 9:33 AM Jon Seager @.***> wrote:

I have been experimenting with this issue this morning trying to reproduce.

I suspect the issue was in fact that the leadership check in the charm wasn't quite correctly placed https://github.com/charmed-kubernetes/metallb-operator/blob/ffd28a76c2584dad632fb5b7dc0e086a5b9b8a45/charms/metallb-controller/src/charm.py#L34. This check would in fact bail early from the charm's constructor if the unit is not a leader, but would not prevent any hook trying to access self.app.status from firing.

Closing this for now after a brief conversation with @camille-rodriguez https://github.com/camille-rodriguez.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/canonical/operator/issues/409#issuecomment-931327134, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABRQ7POL2DI2TYGTAJ2XSDUERRJVANCNFSM4RSJM6GA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.