canonical / charmed-kubeflow-chisme

Shared Utilities used across Charmed Kubeflow
Apache License 2.0

`ErrorWithStatus` may not be setting the unit status correctly #34

Open DnPlas opened 1 year ago

DnPlas commented 1 year ago

When deploying the kubeflow bundle, at some point I got the following error in the juju debug-log:

```
unit-training-operator-0: 17:29:52 ERROR unit.training-operator/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 207, in <module>
    main(TrainingOperatorCharm)
  File "/var/lib/juju/agents/unit-training-operator-0/charm/venv/ops/main.py", line 436, in main
    framework.reemit()
  File "/var/lib/juju/agents/unit-training-operator-0/charm/venv/ops/framework.py", line 866, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-training-operator-0/charm/venv/ops/framework.py", line 931, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 189, in _on_install
    self._check_container_connection()
  File "./src/charm.py", line 135, in _check_container_connection
    raise ErrorWithStatus("Pod startup is not complete", MaintenanceStatus)
charmed_kubeflow_chisme.exceptions._with_status.ErrorWithStatus: Pod startup is not complete
```

The exception is raised with MaintenanceStatus, which suggests the unit should be in maintenance status, but it went into error status instead. Although this does not prevent the unit from eventually reaching active and idle, it is not the behaviour we expect.
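
For reference, the behaviour I expected can be sketched roughly as below. This is a minimal, self-contained illustration and not the actual training-operator or chisme code: `MaintenanceStatus` and `ErrorWithStatus` are simplified stand-ins for the real classes, and `FakeUnit`, `check_container_connection` and `on_install` are hypothetical names used only for the example.

```python
class MaintenanceStatus:
    """Stand-in for ops.model.MaintenanceStatus (assumption: simplified)."""

    def __init__(self, message=""):
        self.message = message


class ErrorWithStatus(Exception):
    """Simplified stand-in for chisme's ErrorWithStatus (assumption)."""

    def __init__(self, msg, status_type):
        super().__init__(str(msg))
        self.msg = str(msg)
        self.status_type = status_type

    @property
    def status(self):
        # Build a status object of the requested type carrying the message.
        return self.status_type(self.msg)


class FakeUnit:
    """Hypothetical unit object that only stores a status."""

    status = None


def check_container_connection(can_connect):
    # Mirrors the raise shown in the traceback above.
    if not can_connect:
        raise ErrorWithStatus("Pod startup is not complete", MaintenanceStatus)


def on_install(unit, can_connect):
    try:
        check_container_connection(can_connect)
    except ErrorWithStatus as err:
        # Expected path: the exception is caught and turned into a unit status,
        # so nothing escapes the handler.
        unit.status = err.status
        return


unit = FakeUnit()
on_install(unit, can_connect=False)
print(type(unit.status).__name__, "-", unit.status.message)
```

In the log above the exception escapes the handler instead, which is why Juju reports the hook as failed and the unit shows error.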

Steps to reproduce

  1. Deploy this bundle
  2. Watch the logs for training-operator
  3. Watch the status of training-operator
  4. For a brief moment, the unit is in error status rather than maintenance

ca-scribner commented 1 year ago

I don't know if this is a chisme bug or something to do with how training-operator imported things?

The message comes from here. The `charmed_kubeflow_chisme...` part of the error is just the type of the exception being raised. But then I'd have expected the exception to be caught here. I don't recall seeing this happen in other charms, but it would be interesting to dig into. Maybe there's something wrong with how chisme lets people import these exceptions?
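
To illustrate what I mean by the import theory (this is only a generic Python sketch, not a diagnosis of this issue): if the same exception class ends up loaded under two different module objects, an `except` clause written against one copy will not catch an instance of the other. The module names `exceptions_copy_a` and `exceptions_copy_b` and the helper functions below are hypothetical, and the class body is a bare stand-in rather than chisme's real implementation.

```python
import types

# Source for a bare stand-in exception class (assumption: not chisme's code).
SOURCE = """
class ErrorWithStatus(Exception):
    pass
"""


def load_module(name):
    """Execute SOURCE in a fresh module, yielding a distinct class object."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module


copy_a = load_module("exceptions_copy_a")
copy_b = load_module("exceptions_copy_b")


def caught_by(handler_cls, raised_cls):
    """Return True if `except handler_cls` catches an instance of raised_cls."""
    try:
        raise raised_cls("Pod startup is not complete")
    except handler_cls:
        return True
    except Exception:
        return False


same_copy = caught_by(copy_a.ErrorWithStatus, copy_a.ErrorWithStatus)
other_copy = caught_by(copy_a.ErrorWithStatus, copy_b.ErrorWithStatus)
print(same_copy, other_copy)
```

Both classes have the same name and the same source, but they are different objects, so only the first case is caught. Whether anything like this is happening in training-operator is something I haven't checked.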