canonical / operator

Pure Python framework for writing Juju charms
Apache License 2.0
244 stars 119 forks source link

Actions: errors are captured to debug-log and are not in stderr #1078

Closed SimonRichardson closed 10 months ago

SimonRichardson commented 11 months ago

Setting stdout, stderr in event.set_results, always ends up with a failed action, but it never explains why it failed.

Consider the following charm:

import logging

import ops

# Log messages can be retrieved using juju debug-log
logger = logging.getLogger(__name__)

VALID_LOG_LEVELS = ["info", "debug", "warning", "error", "critical"]

class ActionTestCharm(ops.CharmBase):
    """Charm the service."""

    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.fail_action_action, self._on_fail_action_action)
        self.framework.observe(self.on.success_action_action, self._on_success_action_action)

    def _on_fail_action_action(self, event):
        event.fail("Fail action failed")

    def _on_success_action_action(self, event):
        event.log("Success action completed successfully")
        event.set_results({"stdout":"fail me"})

if __name__ == "__main__":  # pragma: nocover
    ops.main(ActionTestCharm)  # type: ignore

This will always fail (against juju 3.1).

juju run action-test/leader success-action
Running operation 69 with 1 task
  - task 70 on unit-action-test-0

Waiting for task 70...
08:01:18 Success action completed successfully

ERROR the following task failed:
 - id "70" with return code 1

use 'juju show-task' to inspect the failure

If we run the same with bash as our invocation hook rather than python.

#!/bin/bash
action-set stdout="something something something"

echo Hello
exit 0

We get the following:

juju run bash-test/0 hello
Running operation 103 with 1 task
  - task 104 on unit-bash-test-0

Waiting for task 104...
08:25:21 Hello

Goodbye cruel world
ERROR cannot set reserved action key "stdout"

I'd expect it to be the same as bash, but clearly, something isn't quite right.

Addendum:

Setting the following in bash does reveal the same results as the framework. I guess the problem is framework doesn't expose why it failed, just that it did.

#!/bin/bash
set -eux
action-set stdout="something something something"

echo Hello
exit 0
tonyandrewmeyer commented 11 months ago

juju debug-log should show something like this:

unit-testing-actions-0: 22:49:56 ERROR unit.testing-actions/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 2838, in _run
    result = subprocess.run(args, **kwargs)  # type: ignore
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-testing-actions-0/action-set', 'stdout=hellow')' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 102, in <module>
    ops.main(ActionsTestingCharm)  # type: ignore
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 456, in __call__
    return main(charm_class, use_juju_for_storage=use_juju_for_storage)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 342, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 839, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 928, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 44, in _on_simple_action
    event.set_results({"stdout": "hellow"})
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/charm.py", line 173, in set_results
    self.framework.model._backend.action_set(results)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 3081, in action_set
    self._run('action-set', *[f"{k}={v}" for k, v in flat_results.items()])
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 2840, in _run
    raise ModelError(e.stderr)
ops.model.ModelError: ERROR cannot set reserved action key "stdout"

Is your suggestion that ops should make it so that juju show-task {id} shows some of that? Or that it's actually in the output from the juju run command? What does Juju allow for here?

I guess if we did not capture the output (and redirect it to the Juju debug log) then Juju would capture it, put it in the reserved action result fields and then the Juju client would show it. Something like:

def except_hook(etype: typing.Type[BaseException],
                    value: BaseException,
                    tb: types.TracebackType):
        logger.error(
            "Uncaught exception while in charm code:",
            exc_info=(etype, value, tb))
        if this_is_inside_an_action_hook:
            print(value, file=sys.stderr)
SimonRichardson commented 11 months ago

Yeah, I'm fine with debug-log.

jameinel commented 11 months ago

Ops seems to just be doing: flat_results = _format_action_result_dict(results) self._run('action-set', *[f"{k}={v}" for k, v in flat_results.items()])

where 'format_action" is just translating:

{'a': {'b': 1, "c": 2}} into {'a.b': 1, 'a.c': 2}

So there doesn't seem to be any reason that it would be doing something weird with action-set. Other than

action-set "stdout=something something something" instead of action-set stdout="something something something"

On Thu, Nov 23, 2023 at 3:31 AM Simon Richardson @.***> wrote:

Setting stdout, stderr in event.set_results, always ends up with a failed action no matter what.

Consider the following charm:

import logging import ops

Log messages can be retrieved using juju debug-loglogger = logging.getLogger(name)

VALID_LOG_LEVELS = ["info", "debug", "warning", "error", "critical"]

class ActionTestCharm(ops.CharmBase): """Charm the service."""

def __init__(self, *args):
    super().__init__(*args)
    self.framework.observe(self.on.fail_action_action, self._on_fail_action_action)
    self.framework.observe(self.on.success_action_action, self._on_success_action_action)

def _on_fail_action_action(self, event):
    event.fail("Fail action failed")

def _on_success_action_action(self, event):
    event.log("Success action completed successfully")
    event.set_results({"stdout":"fail me"})

if name == "main": # pragma: nocover ops.main(ActionTestCharm) # type: ignore

This will always fail (against juju 3.1).

juju run action-test/leader success-action Running operation 69 with 1 task

  • task 70 on unit-action-test-0

Waiting for task 70... 08:01:18 Success action completed successfully

ERROR the following task failed:

  • id "70" with return code 1

use 'juju show-task' to inspect the failure

If we run the same with bash as our invocation hook rather than python.

!/bin/bash

action-set stdout="something something something" echo Helloexit 0

We get the following:

juju run bash-test/0 hello Running operation 103 with 1 task

  • task 104 on unit-bash-test-0

Waiting for task 104... 08:25:21 Hello

Goodbye cruel world ERROR cannot set reserved action key "stdout"

I'd expect it to be the same as bash, but clearly something isn't quite right.

— Reply to this email directly, view it on GitHub https://github.com/canonical/operator/issues/1078, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABRQ7NHJNNQJKDAEKKVFI3YF4CXPAVCNFSM6AAAAAA7XMKDD2VHI2DSMVQWIX3LMV43ASLTON2WKOZSGAYDONRZHE4TANA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

tonyandrewmeyer commented 11 months ago

@benhoyt and I had a chat with @wallyworld about this in/after the Juju APAC standup today - we feel that it's worth opening.

The difference with actions vs. other events is that there's a direct communication channel with a user (particularly if the action is not run in the background, as is the default now). This means that we have somewhere to provide output other than debug-log.

There were three issues raised here:

tonyandrewmeyer commented 11 months ago

@benhoyt I think we agree that it's best for the traceback to always go to debug-log, and for actions to go to stderr as well. We can use os.environ.get("JUJU_ACTION_NAME) where this has this_is_inside_an_action_hook:

def except_hook(etype: typing.Type[BaseException],
                    value: BaseException,
                    tb: types.TracebackType):
        logger.error(
            "Uncaught exception while in charm code:",
            exc_info=(etype, value, tb))
        if this_is_inside_an_action_hook:
            print(value, file=sys.stderr)

What do you think the stderr output should be? The above is just the value, e.g. 1/0 would only have:

division by zero

We could have it be a little more detailed, e.g.:

ZeroDivisionError: division by zero

Or the full traceback (maybe chained), e.g.:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 102, in <module>
    ops.main(ActionsTestingCharm)  # type: ignore
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 456, in __call__
    return main(charm_class, use_juju_for_storage=use_juju_for_storage)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 342, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 839, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 928, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 44, in _on_simple_action
    event.set_results(self.model._backend._run('relation-ids', "loki", return_output=True, use_json=True))
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/charm.py", line 173, in set_results
    self.framework.model._backend.action_set(results)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 3080, in action_set
    flat_results = _format_action_result_dict(results)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 2766, in _format_action_result_dict
    for key, value in input.items():
ZeroDivisionError: division by zero

Or we could has something in between that points people towards the debug-log, e.g.:

Uncaught exception while in charm code: ZeroDivisionError: division by zero

Use `juju debug-log` to see more details.
benhoyt commented 11 months ago

Thanks @tonyandrewmeyer. My first thought is we should print the full traceback, as that's what normally goes to stderr when an unhandled exception occurs. However, I quite like the bottom version too. Can you show what the two variants would look like as juju run output?

benhoyt commented 11 months ago

For the environment variable parsing, we may want to consider something like Leon's proposal here so that environment variable parsing happens in one place, rather than scattered throughout main.py (and sometimes other places).

tonyandrewmeyer commented 10 months ago

Full traceback:

tameyer@tam-canoncial-1:~/code/testing-actions$ juju run testing-actions/0 simple
Running operation 263 with 1 task
  - task 264 on unit-testing-actions-0

Waiting for task 264...
Uncaught exception while in charm code:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 2946, in _run
    result = subprocess.run(args, **kwargs)  # type: ignore
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-testing-actions-0/action-set', "stdout=can't do this")' returned non-zero exit status 2.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 115, in <module>
    ops.main(ActionsTestingCharm)  # type: ignore
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 451, in __call__
    return main(charm_class, use_juju_for_storage=use_juju_for_storage)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 436, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 340, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 842, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 931, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 56, in _on_simple_action
    event.set_results({"stdout": "can't do this"})
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/charm.py", line 195, in set_results
    self.framework.model._backend.action_set(results)
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 3189, in action_set
    self._run('action-set', *[f"{k}={v}" for k, v in flat_results.items()])
  File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 2948, in _run
    raise ModelError(e.stderr) from e
ops.model.ModelError: ERROR cannot set reserved action key "stdout"

ERROR the following task failed:
 - id "264" with return code 1

use 'juju show-task' to inspect the failure

Minimal:

tameyer@tam-canoncial-1:~/code/testing-actions$ juju run testing-actions/0 simple
Running operation 269 with 1 task
  - task 270 on unit-testing-actions-0

Waiting for task 270...
Uncaught exception while in charm code:
ModelError: ERROR cannot set reserved action key "stdout"

Use `juju debug-log` to see the full traceback.
ERROR the following task failed:
 - id "270" with return code 1

use 'juju show-task' to inspect the failure

Note that in both examples the ERROR in the message comes from the Juju response. Obviously the exact formatting of either could be tweaked (except it seems like Juju trims any leading and trailing whitespace in the default format).

Other formats:

tameyer@tam-canoncial-1:~/code/testing-actions$ juju run --format=yaml testing-actions/0 simple
Running operation 271 with 1 task
  - task 272 on unit-testing-actions-0

Waiting for task 272...
testing-actions/0: 
  id: "272"
  message: exit status 1
  results: 
    return-code: 1
    stderr: |2+
      Uncaught exception while in charm code:
      ModelError: ERROR cannot set reserved action key "stdout"
      Use `juju debug-log` to see the full traceback.
  status: failed
  timing: 
    completed: 2023-11-28 22:26:14 +1300 NZDT
    enqueued: 2023-11-28 22:26:13 +1300 NZDT
    started: 2023-11-28 22:26:13 +1300 NZDT
  unit: testing-actions/0
ERROR the following task failed:
 - id "272" with return code 1

use 'juju show-task' to inspect the failure
tameyer@tam-canoncial-1:~/code/testing-actions$ juju show-task 272
Uncaught exception while in charm code:
ModelError: ERROR cannot set reserved action key "stdout"

Use `juju debug-log` to see the full traceback.
tameyer@tam-canoncial-1:~/code/testing-actions$ juju show-task 272 --format=json
{"id":"272","message":"exit status 1","results":{"return-code":1,"stderr":"\n\nUncaught exception while in charm code:\nModelError: ERROR cannot set reserved action key \"stdout\"\n\nUse `juju debug-log` to see the full traceback.\n\n\n"},"status":"failed","timing":{"completed":"2023-11-28 22:26:14 +1300 NZDT","enqueued":"2023-11-28 22:26:13 +1300 NZDT","started":"2023-11-28 22:26:13 +1300 NZDT"},"unit":"testing-actions/0"}

I think I prefer the shorter one that directs you to debug-log if you want all the details (but tweaking the layout a little). If the error is more "something unexpected went wrong" than "there's a bug in the charm code" then it feels like it probably gives enough information without overloading the user (typically not the charm author, presumably) with a traceback. However, I'm not super strong on this, and I have feeling people might argue that they want it to behave exactly as a non-ops hook or as if you were running the code locally.

tonyandrewmeyer commented 10 months ago

For the environment variable parsing, we may want to consider something like Leon's proposal here so that environment variable parsing happens in one place, rather than scattered throughout main.py (and sometimes other places).

Sure. So setup_root_logging should probably take a Boolean argument (to_stderr=False? quiet_stderr=True? tee_stderr=False? copy_to_stderr=False?). And main.py can do the environment lookup now and be a lookup on some sort of loaded context if there is a change like that.