anxstj opened this issue 4 years ago
I'm betting this is due to our use of a local domain socket and the maximum packet size for SOCK_DGRAM. You've probably got events that are periodically really large (fact data, perhaps).
One approach here is to just use TCP, but we didn't do that because we didn't want to have to deal with connection security.
Maybe we could detect this type of situation and automatically attempt to truncate the message? Though, I could see that upsetting external log aggregators that relied on e.g., valid JSON. Or perhaps we could chunk the message ourselves.
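The truncation idea above can be sketched as a logging handler. This is an illustration only, not AWX's actual RSysLogHandler: the 60000-byte budget, the helper name, and the truncation marker are all assumptions standing in for whatever the real SOCK_DGRAM limit is on a given platform.

```python
import logging
import logging.handlers

# Hypothetical byte budget; the real SOCK_DGRAM limit depends on the
# platform and socket buffer sizes.
MAX_DGRAM = 60000

def truncate_utf8(text, limit):
    """Trim text to at most `limit` UTF-8 bytes without splitting a
    multi-byte character, appending a marker when anything was cut."""
    encoded = text.encode('utf-8')
    if len(encoded) <= limit:
        return text
    return encoded[:limit].decode('utf-8', 'ignore') + '...[truncated]'

class TruncatingSysLogHandler(logging.handlers.SysLogHandler):
    """SysLogHandler variant that shrinks oversized records so that
    socket.send() cannot fail with EMSGSIZE ([Errno 90])."""
    def emit(self, record):
        record.msg = truncate_utf8(record.getMessage(), MAX_DGRAM)
        record.args = ()
        super().emit(record)
```

Note the downside called out above: a truncated payload is no longer valid JSON, so an aggregator that parses each line would choke on exactly the events that triggered the error.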
We are also getting these stack traces and error messages in our logs. I'm worried that some logging information and job events are being dropped because of this. We see this message when syncing our shared Ansible repo project, which involves a full delete/clone of the git repo and many, many files.
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 936, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 940, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long

Call stack:
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 154, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_callback_receiver.py", line 32, in handle
    queues=[getattr(settings, 'CALLBACK_QUEUE', '')],
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/base.py", line 59, in __init__
    self.pool.init_workers(self.worker.work_loop)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 241, in init_workers
    self.up()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 253, in up
    worker.start()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 78, in start
    self.process.start()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
    code = process_obj._bootstrap()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 99, in work_loop
    return super(CallbackBrokerWorker, self).work_loop(*args, **kw)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/base.py", line 191, in work_loop
    self.perform_work(body, *args)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 178, in perform_work
    event = cls.create_from_data(**body)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/events.py", line 437, in create_from_data
    event._update_from_event_data()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/events.py", line 392, in _update_from_event_data
    extra=dict(python_objects=dict(job_event=self))
  File "/usr/lib64/python3.6/logging/__init__.py", line 1308, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
    self.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 865, in handle
    self.emit(record)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/utils/handlers.py", line 30, in emit
    return super(RSysLogHandler, self).emit(msg)
Message: 'Event data saved.'
Arguments: ()
I'm dubious that "Event data saved" actually happened 😬
@hjkatz on a whim - what happens if you set MAX_EVENT_RES=60000 in AWX_TASK_ENV (or MAX_EVENT_RES_DATA in the deployment settings)?
Maybe we should write the JobEvent logs to a file and configure rsyslog to send them to the log aggregator? I think that would fix the problem. Or is this a bad idea?
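The Python side of that file-based idea could look roughly like the sketch below (the rsyslog side, e.g. tailing the file with its imfile module and forwarding it, is omitted). The path, logger name, and rotation numbers are all hypothetical, not AWX's real ones; this is an illustration of the approach, not an AWX patch.

```python
import logging
import logging.handlers

# Sketch: write job-event records to a rotating file instead of a
# datagram socket, and let rsyslog tail and forward the file. A file
# handler has no datagram size limit, so oversized events cannot fail
# with [Errno 90] Message too long.
handler = logging.handlers.RotatingFileHandler(
    '/var/log/awx/job_events.log',   # hypothetical path
    maxBytes=50 * 1024 * 1024,       # rotate at ~50 MiB
    backupCount=5,                   # keep five rotated files
    delay=True,                      # don't open the file until first write
)
handler.setFormatter(logging.Formatter('%(asctime)s %(name)s %(message)s'))

logger = logging.getLogger('awx.analytics.job_events')  # hypothetical name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

One trade-off worth noting: rotation plus rsyslog tailing adds moving parts, and disk can fill up (as a later comment in this thread about /var/log/ illustrates).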
Was there ever a solution found for this? I'm seeing it in v15 and log data is definitely being dropped.
setting this as a workaround helped in my case: https://github.com/ansible/awx-operator#no-log
Hi @pfeifferj What's the workaround that has helped you? I can't see anything specific in that link.
@lals1 I vaguely remember that the playbooks generated a lot of logs, which filled up /var/log/ and led to AAP execution pods dying. Setting the no-log option in the operator helped. IIRC, there was also a bug in OpenShift/Kubernetes, fixed in OCP 4.10.16, that made the logging issue worse with the AAP/AWX operator. Scheduling the AAP/AWX operator workloads on dedicated nodes also helped.
hope this helps!
Thanks for the response @pfeifferj
In my case, this issue started when I upgraded AWX from version 21.0.0 to 23.2.0. It was working fine earlier. We use an external log aggregator of type logstash to push only AWX-related logs.
Just wondering what changed between those AWX versions such that rsyslog now complains about the message being too long 🤔
This is still an issue using AWX 24.6.1, preventing us from shipping package facts to Splunk -- does anyone have a solution?
ISSUE TYPE
SUMMARY
The awx_task container logs a traceback:
I stumbled on it while examining the logs and found it worth reporting.
ENVIRONMENT
STEPS TO REPRODUCE
Actually I don't know how to reproduce this.
EXPECTED RESULTS
No traceback.
ACTUAL RESULTS
A traceback is logged.
ADDITIONAL INFORMATION