Closed SimonRichardson closed 10 months ago
juju debug-log
should show something like this:
unit-testing-actions-0: 22:49:56 ERROR unit.testing-actions/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 2838, in _run
result = subprocess.run(args, **kwargs) # type: ignore
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-testing-actions-0/action-set', 'stdout=hellow')' returned non-zero exit status 2.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 102, in <module>
ops.main(ActionsTestingCharm) # type: ignore
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 456, in __call__
return main(charm_class, use_juju_for_storage=use_juju_for_storage)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 441, in main
_emit_charm_event(charm, dispatcher.event_name)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
event_to_emit.emit(*args, **kwargs)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 342, in emit
framework._emit(event)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 839, in _emit
self._reemit(event_path)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 928, in _reemit
custom_handler(event)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 44, in _on_simple_action
event.set_results({"stdout": "hellow"})
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/charm.py", line 173, in set_results
self.framework.model._backend.action_set(results)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 3081, in action_set
self._run('action-set', *[f"{k}={v}" for k, v in flat_results.items()])
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 2840, in _run
raise ModelError(e.stderr)
ops.model.ModelError: ERROR cannot set reserved action key "stdout"
Is your suggestion that ops should make it so that juju show-task {id}
shows some of that? Or that it's actually in the output from the juju run
command? What does Juju allow for here?
I guess if we did not capture the output (and redirect it to the Juju debug log) then Juju would capture it, put it in the reserved action result fields and then the Juju client would show it. Something like:
def except_hook(etype: typing.Type[BaseException],
value: BaseException,
tb: types.TracebackType):
logger.error(
"Uncaught exception while in charm code:",
exc_info=(etype, value, tb))
if this_is_inside_an_action_hook:
print(value, file=sys.stderr)
Yeah, I'm fine with debug-log.
Ops seems to just be doing: flat_results = _format_action_result_dict(results) self._run('action-set', *[f"{k}={v}" for k, v in flat_results.items()])
where 'format_action" is just translating:
{'a': {'b': 1, "c": 2}} into {'a.b': 1, 'a.c': 2}
So there doesn't seem to be any reason that it would be doing something weird with action-set. Other than
action-set "stdout=something something something" instead of action-set stdout="something something something"
On Thu, Nov 23, 2023 at 3:31 AM Simon Richardson @.***> wrote:
Setting stdout, stderr in event.set_results, always ends up with a failed action no matter what.
Consider the following charm:
import logging import ops
Log messages can be retrieved using juju debug-loglogger = logging.getLogger(name)
VALID_LOG_LEVELS = ["info", "debug", "warning", "error", "critical"]
class ActionTestCharm(ops.CharmBase): """Charm the service."""
def __init__(self, *args): super().__init__(*args) self.framework.observe(self.on.fail_action_action, self._on_fail_action_action) self.framework.observe(self.on.success_action_action, self._on_success_action_action) def _on_fail_action_action(self, event): event.fail("Fail action failed") def _on_success_action_action(self, event): event.log("Success action completed successfully") event.set_results({"stdout":"fail me"})
if name == "main": # pragma: nocover ops.main(ActionTestCharm) # type: ignore
This will always fail (against juju 3.1).
juju run action-test/leader success-action Running operation 69 with 1 task
- task 70 on unit-action-test-0
Waiting for task 70... 08:01:18 Success action completed successfully
ERROR the following task failed:
- id "70" with return code 1
use 'juju show-task' to inspect the failure
If we run the same with bash as our invocation hook rather than python.
!/bin/bash
action-set stdout="something something something" echo Helloexit 0
We get the following:
juju run bash-test/0 hello Running operation 103 with 1 task
- task 104 on unit-bash-test-0
Waiting for task 104... 08:25:21 Hello
Goodbye cruel world ERROR cannot set reserved action key "stdout"
I'd expect it to be the same as bash, but clearly something isn't quite right.
— Reply to this email directly, view it on GitHub https://github.com/canonical/operator/issues/1078, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABRQ7NHJNNQJKDAEKKVFI3YF4CXPAVCNFSM6AAAAAA7XMKDD2VHI2DSMVQWIX3LMV43ASLTON2WKOZSGAYDONRZHE4TANA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
@benhoyt and I had a chat with @wallyworld about this in/after the Juju APAC standup today - we feel that it's worth opening.
The difference with actions vs. other events is that there's a direct communication channel with a user (particularly if the action is not run in the background, as is the default now). This means that we have somewhere to provide output other than debug-log.
There were three issues raised here:
ops
allows event.set_results({"stdout": value})
(and stderr
, etc). This is somewhat true - ops
allows it but juju
doesn't, so ops
captures the error from juju
and raises that back to the user (as a ModelError
). The charm can't actually set a result in any of the forbidden keys. We could duplicate this restriction in ops
(harness
already has it) but that's not the general pattern and it would mean we need to keep the restrictions in sync. So we don't feel this should change.event.fail()
which is something else) then that error should be reflected back to Juju so it can tell the user. This is already the case - the Python code will exit (if the Charm code has a bug, for example) with 1. So this already works. (The original case was not error checking in bash, but the description has already been updated to reflect that).stdout
as a key is just one example), it's not clear to users where to find more information (when the Charm uses ops
). The output says it failed and that you can get info from juju show-task
but there's nothing there. You need to do juju debug-log
, which is where you'll find the exception. This happens because ops
has a global exception hook that replaces writing the exception traceback to stderr with sending it to juju's debug-log. We feel that it is worth improving this.@benhoyt I think we agree that it's best for the traceback to always go to debug-log, and for actions to go to stderr as well. We can use os.environ.get("JUJU_ACTION_NAME)
where this has this_is_inside_an_action_hook
:
def except_hook(etype: typing.Type[BaseException], value: BaseException, tb: types.TracebackType): logger.error( "Uncaught exception while in charm code:", exc_info=(etype, value, tb)) if this_is_inside_an_action_hook: print(value, file=sys.stderr)
What do you think the stderr output should be? The above is just the value, e.g. 1/0
would only have:
division by zero
We could have it be a little more detailed, e.g.:
ZeroDivisionError: division by zero
Or the full traceback (maybe chained), e.g.:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 102, in <module>
ops.main(ActionsTestingCharm) # type: ignore
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 456, in __call__
return main(charm_class, use_juju_for_storage=use_juju_for_storage)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 441, in main
_emit_charm_event(charm, dispatcher.event_name)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
event_to_emit.emit(*args, **kwargs)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 342, in emit
framework._emit(event)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 839, in _emit
self._reemit(event_path)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 928, in _reemit
custom_handler(event)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 44, in _on_simple_action
event.set_results(self.model._backend._run('relation-ids', "loki", return_output=True, use_json=True))
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/charm.py", line 173, in set_results
self.framework.model._backend.action_set(results)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 3080, in action_set
flat_results = _format_action_result_dict(results)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 2766, in _format_action_result_dict
for key, value in input.items():
ZeroDivisionError: division by zero
Or we could has something in between that points people towards the debug-log, e.g.:
Uncaught exception while in charm code: ZeroDivisionError: division by zero
Use `juju debug-log` to see more details.
Thanks @tonyandrewmeyer. My first thought is we should print the full traceback, as that's what normally goes to stderr when an unhandled exception occurs. However, I quite like the bottom version too. Can you show what the two variants would look like as juju run
output?
For the environment variable parsing, we may want to consider something like Leon's proposal here so that environment variable parsing happens in one place, rather than scattered throughout main.py (and sometimes other places).
Full traceback:
tameyer@tam-canoncial-1:~/code/testing-actions$ juju run testing-actions/0 simple
Running operation 263 with 1 task
- task 264 on unit-testing-actions-0
Waiting for task 264...
Uncaught exception while in charm code:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 2946, in _run
result = subprocess.run(args, **kwargs) # type: ignore
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-testing-actions-0/action-set', "stdout=can't do this")' returned non-zero exit status 2.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 115, in <module>
ops.main(ActionsTestingCharm) # type: ignore
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 451, in __call__
return main(charm_class, use_juju_for_storage=use_juju_for_storage)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 436, in main
_emit_charm_event(charm, dispatcher.event_name)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/main.py", line 144, in _emit_charm_event
event_to_emit.emit(*args, **kwargs)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 340, in emit
framework._emit(event)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 842, in _emit
self._reemit(event_path)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/framework.py", line 931, in _reemit
custom_handler(event)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/./src/charm.py", line 56, in _on_simple_action
event.set_results({"stdout": "can't do this"})
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/charm.py", line 195, in set_results
self.framework.model._backend.action_set(results)
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 3189, in action_set
self._run('action-set', *[f"{k}={v}" for k, v in flat_results.items()])
File "/var/lib/juju/agents/unit-testing-actions-0/charm/venv/ops/model.py", line 2948, in _run
raise ModelError(e.stderr) from e
ops.model.ModelError: ERROR cannot set reserved action key "stdout"
ERROR the following task failed:
- id "264" with return code 1
use 'juju show-task' to inspect the failure
Minimal:
tameyer@tam-canoncial-1:~/code/testing-actions$ juju run testing-actions/0 simple
Running operation 269 with 1 task
- task 270 on unit-testing-actions-0
Waiting for task 270...
Uncaught exception while in charm code:
ModelError: ERROR cannot set reserved action key "stdout"
Use `juju debug-log` to see the full traceback.
ERROR the following task failed:
- id "270" with return code 1
use 'juju show-task' to inspect the failure
Note that in both examples the ERROR
in the message comes from the Juju response. Obviously the exact formatting of either could be tweaked (except it seems like Juju trims any leading and trailing whitespace in the default format).
Other formats:
tameyer@tam-canoncial-1:~/code/testing-actions$ juju run --format=yaml testing-actions/0 simple
Running operation 271 with 1 task
- task 272 on unit-testing-actions-0
Waiting for task 272...
testing-actions/0:
id: "272"
message: exit status 1
results:
return-code: 1
stderr: |2+
Uncaught exception while in charm code:
ModelError: ERROR cannot set reserved action key "stdout"
Use `juju debug-log` to see the full traceback.
status: failed
timing:
completed: 2023-11-28 22:26:14 +1300 NZDT
enqueued: 2023-11-28 22:26:13 +1300 NZDT
started: 2023-11-28 22:26:13 +1300 NZDT
unit: testing-actions/0
ERROR the following task failed:
- id "272" with return code 1
use 'juju show-task' to inspect the failure
tameyer@tam-canoncial-1:~/code/testing-actions$ juju show-task 272
Uncaught exception while in charm code:
ModelError: ERROR cannot set reserved action key "stdout"
Use `juju debug-log` to see the full traceback.
tameyer@tam-canoncial-1:~/code/testing-actions$ juju show-task 272 --format=json
{"id":"272","message":"exit status 1","results":{"return-code":1,"stderr":"\n\nUncaught exception while in charm code:\nModelError: ERROR cannot set reserved action key \"stdout\"\n\nUse `juju debug-log` to see the full traceback.\n\n\n"},"status":"failed","timing":{"completed":"2023-11-28 22:26:14 +1300 NZDT","enqueued":"2023-11-28 22:26:13 +1300 NZDT","started":"2023-11-28 22:26:13 +1300 NZDT"},"unit":"testing-actions/0"}
I think I prefer the shorter one that directs you to debug-log if you want all the details (but tweaking the layout a little). If the error is more "something unexpected went wrong" than "there's a bug in the charm code" then it feels like it probably gives enough information without overloading the user (typically not the charm author, presumably) with a traceback. However, I'm not super strong on this, and I have feeling people might argue that they want it to behave exactly as a non-ops hook or as if you were running the code locally.
For the environment variable parsing, we may want to consider something like Leon's proposal here so that environment variable parsing happens in one place, rather than scattered throughout main.py (and sometimes other places).
Sure. So setup_root_logging
should probably take a Boolean argument (to_stderr=False
? quiet_stderr=True
? tee_stderr=False
? copy_to_stderr=False
?). And main.py
can do the environment lookup now and be a lookup on some sort of loaded context if there is a change like that.
Setting stdout, stderr in event.set_results, always ends up with a failed action, but it never explains why it failed.
Consider the following charm:
This will always fail (against juju 3.1).
If we run the same with bash as our invocation hook rather than python.
We get the following:
I'd expect it to be the same as bash, but clearly, something isn't quite right.Addendum:
Setting the following in bash does reveal the same results as the framework. I guess the problem is framework doesn't expose why it failed, just that it did.