canonical / operator

Pure Python framework for writing Juju charms
Apache License 2.0
246 stars 119 forks source link

juju_log - potential problem with long log messages #531

Closed mkalcok closed 3 years ago

mkalcok commented 3 years ago

I encountered problem where _ModelBackend.juju_log() threw an exception while logging message from underlying python modules. This happened to me when I was using kubernetes python client to create CRDs from yaml.

unit-cert-manager-0: 15:27:30 ERROR unit.cert-manager/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 113, in <module>
    main(CertManagerCharm)
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/ops/main.py", line 404, in main
    framework.reemit()
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/ops/framework.py", line 732, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/ops/framework.py", line 767, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 44, in _on_install
    create_from_yaml(k8s_client, crd_file, namespace=self.model.name)
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/kubernetes/utils/create_from_yaml.py", line 69, in create_from_yaml
    created = create_from_dict(k8s_client, yml_document, verbose,
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/kubernetes/utils/create_from_yaml.py", line 129, in create_from_dict
    created = create_from_yaml_single_item(
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/kubernetes/utils/create_from_yaml.py", line 171, in create_from_yaml_single_item
    resp = getattr(k8s_api, "create_{0}".format(kind))(
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/kubernetes/client/api/apiextensions_v1_api.py", line 65, in create_custom_resource_definition
    return self.create_custom_resource_definition_with_http_info(body, **kwargs)  # noqa: E501
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/kubernetes/client/api/apiextensions_v1_api.py", line 152, in create_custom_resource_definition_with_http_info
    return self.api_client.call_api(
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/kubernetes/client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/kubernetes/client/rest.py", line 274, in POST
    return self.request("POST", url,
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/kubernetes/client/rest.py", line 230, in request
    logger.debug("response body: %s", r.data)
  File "/usr/lib/python3.8/logging/__init__.py", line 1422, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/usr/lib/python3.8/logging/__init__.py", line 1577, in _log
    self.handle(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1587, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1649, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 950, in handle
    self.emit(record)
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/ops/log.py", line 34, in emit
    self.model_backend.juju_log(record.levelname, self.format(record))
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/ops/model.py", line 1494, in juju_log
    self._run('juju-log', '--log-level', level, "--", message)
  File "/var/lib/juju/agents/unit-cert-manager-0/charm/venv/ops/model.py", line 1284, in _run
    result = run(args, **kwargs)
  File "/usr/lib/python3.8/subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/var/lib/juju/tools/unit-cert-manager-0/juju-log'
unit-cert-manager-0: 15:27:30 ERROR juju.worker.uniter.operation hook "install" (via hook dispatching script: dispatch) failed: exit status 1

For reference: this is the CRD file I was deploying. After setting loglevel for kubernetes module to INFO, to avoid that logger.debug() call, everything started to work again.

stub42 commented 3 years ago

OS command line length limits are being violated, per https://bugs.launchpad.net/bugs/1274460

pengale commented 3 years ago

It looks like the Juju core bug is pretty old, and has been deprioritized.

Should we just truncate overly long lines in the operator framework? That might make the code a bit less pretty (we'll have to separate the construction of the logging string from the actual call out to the system), but I think that it might be the right thing to do.

pengale commented 3 years ago

Researching this a bit, it looks like the problem may be less with the maximum length of a line in bash (getconfig ARG_MAX), and more with the max length of a single argument passed to bash, which is often 131071. (Painstaking discussion here: https://www.in-ulm.de/~mascheck/various/argmax/)

I'm a little bit tempted to make a change like this:

--- a/ops/model.py
+++ b/ops/model.py
@@ -38,6 +38,8 @@ from ops.jujuversion import JujuVersion

 logger = logging.getLogger(__name__)

+MAX_STR_LEN = 131071
+

 class Model:
     """Represents the Juju Model as seen from this unit.
@@ -1589,7 +1591,7 @@ class _ModelBackend:
         self._run('application-version-set', '--', version)

     def juju_log(self, level, message):
-        self._run('juju-log', '--log-level', level, "--", message)
+        self._run('juju-log', '--log-level', level, "--", message[:MAX_STR_LEN])

     def network_get(self, binding_name, relation_id=None):
         """Return network info provided by network-get for a given binding.

Basically, we just truncate any log messages that are longer than 131071. (And hope that the addition of quotes or whatever doesn't break things -- we could lop off an additional ten characters or so if we were paranoid.)

What say the rest of you? Sound like a decent plan?

mkalcok commented 3 years ago

How about splitting the message into multiple MAX_STR_LEN-sized strings and calling juju-log multiple times? Maybe prefacing this split output with one more message (with same log-level?) indicating that following message was too long and was split to multiple parts?

pengale commented 3 years ago

@mkalcok Splitting into multiple lines makes sense. It's strangely less satisfying -- I guess that I just like tossing extra stuff away, when I can get away with it. But much more friendly to the person who might need the data from a big log dump :-)

mkalcok commented 3 years ago

@petevg yup..gigantic logs are obviously not ideal but sometimes there's no other way. Especially if the logs come from underlying library (like it was in my case) and charm author has no control over them but needs their output to debug/solve the issue.

jameinel commented 3 years ago

I thought the actual param size was closer to 32kb which is why people trip over it. (especially with env vars, etc). But maybe that is bash and not exec. Arguably Juju shouldn't allow you to be creating log entries that are 100s of KB in size in the first place. (stack traces come to mind as something that is fairly unbounded and can easily cause a problem). It wouldn't be hard on the juju side to support "echo foo | juju-log -l INFO", though you then run into compatibility questions (I'm using juju 2.8 vs 2.9.X that happens to support stdin.)

On Wed, Sep 29, 2021, 17:49 Pete Vander Giessen @.***> wrote:

Researching this a bit, it looks like the problem may be less with the maximum length of a line in bash (getconfig ARG_MAX), and more with the max length of a single argument passed to bash, which is often 131071. (Painstaking discussion here: https://www.in-ulm.de/~mascheck/various/argmax/)

I'm a little bit tempted to make a change like this:

--- a/ops/model.py +++ b/ops/model.py @@ -38,6 +38,8 @@ from ops.jujuversion import JujuVersion

logger = logging.getLogger(name)

+MAX_STR_LEN = 131071 +

class Model: """Represents the Juju Model as seen from this unit. @@ -1589,7 +1591,7 @@ class _ModelBackend: self._run('application-version-set', '--', version)

 def juju_log(self, level, message):
  • self._run('juju-log', '--log-level', level, "--", message)
  • self._run('juju-log', '--log-level', level, "--", message[:MAX_STR_LEN])

    def network_get(self, binding_name, relation_id=None): """Return network info provided by network-get for a given binding.

Basically, we just truncate any log messages that are longer than 131071. (And hope that the addition of quotes or whatever doesn't break things -- we could lop off an additional ten characters or so if we were paranoid.)

What say the rest of you? Sound like a decent plan?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/canonical/operator/issues/531#issuecomment-930570203, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABRQ7N2SOQNZN6FWBINIHLUEOCWBANCNFSM442CMBYQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.