eht16 / python-logstash-async

Python logging handler for sending log events asynchronously to Logstash.
MIT License
182 stars 52 forks source link

Transport endpoint is not connected #89

Closed calee88 closed 7 months ago

calee88 commented 11 months ago

I'm getting "OSError: [Errno 107] Transport endpoint is not connected" every 30 seconds. It is interesting because, the messages are logged in logstash index. Is it possible that the endpoint is not connected and still logging the message? And other messages seem to be logged as well. Here is the stack trace:

Traceback (most recent call last):
  File "/home/a_id/git/a_project/.venv/lib/python3.10/site-packages/logstash_async/worker.py", line 138, in _fetch_events
    self._fetch_event()
  File "/home/a_id/git/a_project/.venv/lib/python3.10/site-packages/logstash_async/worker.py", line 160, in _fetch_event
    self._event = self._queue.get(block=False)
  File "/usr/lib/python3.10/queue.py", line 168, in get
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/a_id/git/a_project/.venv/lib/python3.10/site-packages/logstash_async/transport.py", line 101, in send
    self._send(events)
  File "/home/a_id/git/a_project/.venv/lib/python3.10/site-packages/logstash_async/transport.py", line 118, in _send
    self._send_via_socket(event)
  File "/home/a_id/git/a_project/.venv/lib/python3.10/site-packages/logstash_async/transport.py", line 226, in _send_via_socket
    self._sock.sendall(data_to_send)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/a_id/git/a_project/.venv/lib/python3.10/site-packages/logstash_async/worker.py", line 241, in _flush_queued_events
    self._send_events(events)
  File "/home/a_id/git/a_project/.venv/lib/python3.10/site-packages/logstash_async/worker.py", line 304, in _send_events
    self._transport.send(events, use_logging=use_logging)
  File "/home/a_id/git/a_project/.venv/lib/python3.10/site-packages/logstash_async/transport.py", line 103, in send
    self._close()
  File "/home/a_id/git/a_project/.venv/lib/python3.10/site-packages/logstash_async/transport.py", line 137, in _close
    self._sock.shutdown(socket.SHUT_WR)
OSError: [Errno 107] Transport endpoint is not connected

function name: _safe_log_impl path: logstash_async/worker.py logstash_async_version: 2.7.2

eht16 commented 11 months ago

Do you see any related log messages on the Logstash server?

Do you know if this happened already before 2.7.0? If possible either downgrade to 2.6.0 or set constants.SOCKET_CLOSE_WAIT_TIMEOUT to 0, whatever is easier for you. In 2.7.0 a new feature has been added to wait for the send buffer to be empty when shutting down. This might be related, though this is just a guess.

OliverNickel commented 10 months ago

I tried this solution by the following lines:

from logstash_async.constants import constants constants.SOCKET_CLOSE_WAIT_TIMEOUT=0

but still getting this error:

File \"/python3.11/lib/python3.11/site-packages/logstash_async/worker.py\", line 241, in _flush_queued_events self._send_events(events) File \"/python3.11/lib/python3.11/site-packages/logstash_async/worker.py\", line 304, in _send_events self._transport.send(events, use_logging=use_logging) File \"/python3.11/lib/python3.11/site-packages/logstash_async/transport.py\", line 103, in send self._close() File \"/python3.11/lib/python3.11/site-packages/logstash_async/transport.py\", line 137, in _close self._sock.shutdown(socket.SHUT_WR)

function: _safe_log_impl name: worker.py original: An error occurred while sending events: [Errno 107] Transport endpoint is not connected type: OSError

Using Debian trixie-slim with sqlite3 installed and python 3.11.6

eht16 commented 10 months ago

Unfortunately, constants.SOCKET_CLOSE_WAIT_TIMEOUT=0 doesn't exactly disable the feature completely. This is fixed in master.

Anyway, I just reread the OP and noticed the second exception "BrokenPipeError: [Errno 32] Broken pipe". I think this is main problem and the "OSError: [Errno 107] Transport endpoint is not connected" is only a follow up error when trying to close the connection. I will probably wrap the connection shutdown code in a try-except block.

Still, this won't solve the broken pipe errors. I'm pretty sure this is not caused by the code but by the network connection on your side.

x0zzz commented 9 months ago

I have the same issue in a docker environment. Interestingly the errors do not stop even when the logstash container gets back up. Is there any workaround until a fix is delivered? Sample logs (this repeats every 30s):

2023-12-23 10:36:22 An error occurred while sending events: [Errno 107] Transport endpoint is not connected 2023-12-23 10:36:22 Traceback (most recent call last): 2023-12-23 10:36:22 File "/usr/local/lib/python3.12/site-packages/logstash_async/worker.py", line 138, in _fetch_events 2023-12-23 10:36:22 self._fetch_event() 2023-12-23 10:36:22 File "/usr/local/lib/python3.12/site-packages/logstash_async/worker.py", line 160, in _fetch_event 2023-12-23 10:36:22 self._event = self._queue.get(block=False) 2023-12-23 10:36:22 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2023-12-23 10:36:22 File "/usr/local/lib/python3.12/queue.py", line 168, in get 2023-12-23 10:36:22 raise Empty 2023-12-23 10:36:22 _queue.Empty 2023-12-23 10:36:22 2023-12-23 10:36:22 During handling of the above exception, another exception occurred: 2023-12-23 10:36:22 2023-12-23 10:36:22 Traceback (most recent call last): 2023-12-23 10:36:22 File "/usr/local/lib/python3.12/site-packages/logstash_async/transport.py", line 101, in send 2023-12-23 10:36:22 self._send(events) 2023-12-23 10:36:22 File "/usr/local/lib/python3.12/site-packages/logstash_async/transport.py", line 118, in _send 2023-12-23 10:36:22 self._send_via_socket(event) 2023-12-23 10:36:22 File "/usr/local/lib/python3.12/site-packages/logstash_async/transport.py", line 226, in _send_via_socket 2023-12-23 10:36:22 self._sock.sendall(data_to_send) 2023-12-23 10:36:22 BrokenPipeError: [Errno 32] Broken pipe 2023-12-23 10:36:22 2023-12-23 10:36:22 During handling of the above exception, another exception occurred: 2023-12-23 10:36:22 2023-12-23 10:36:22 Traceback (most recent call last): 2023-12-23 10:36:22 File "/usr/local/lib/python3.12/site-packages/logstash_async/worker.py", line 241, in _flush_queued_events 2023-12-23 10:36:22 self._send_events(events) 2023-12-23 10:36:22 File "/usr/local/lib/python3.12/site-packages/logstash_async/worker.py", line 304, in _send_events 2023-12-23 10:36:22 self._transport.send(events, use_logging=use_logging) 2023-12-23 10:36:22 File "/usr/local/lib/python3.12/site-packages/logstash_async/transport.py", line 103, in send 2023-12-23 10:36:22 self._close() 2023-12-23 10:36:22 File "/usr/local/lib/python3.12/site-packages/logstash_async/transport.py", line 137, in _close 2023-12-23 10:36:22 self._sock.shutdown(socket.SHUT_WR) 2023-12-23 10:36:22 OSError: [Errno 107] Transport endpoint is not connected

eht16 commented 8 months ago

I have no workaround yet except restarting the application.

when the logstash container gets back up

This sounds like you know when it happens? Is it just an unavailable Logstash server?

I hope I will find time to fix it next week.

x0zzz commented 8 months ago

This sounds like you know when it happens? Is it just an unavailable Logstash server?

Yes, I control both containers, so I could reset the ones dependent on Logstash. It's a matter of resilience and flexibility I guess ;)

I hope I will find time to fix it next week.

Please do :)

izverev commented 8 months ago

We think this happens because socket not properly closing with _close(), because socket already closed and sending shutdown in closed socket raises OSError, next code, self._sock.close() and self._sock = None not reacheble, and next time when you try to send smth, socket not recreated in _create_socket because it's not empty

We fixed with try...finally for us in out transport:

    try:
        super()._close(force)
    finally:
        self._sock = None

ways for reproduce:

eht16 commented 8 months ago

@izverev thanks for the investigation and steps to reproduce.

I added basically your change in #92. The errors are still thrown when trying to close a broken or not connected socket connection but it will recover the next time as the socket instance is dropped instead of re-used.

@izverev @x0zzz @OliverNickel @calee88 it would be great if you could test it.

OliverNickel commented 8 months ago

I can confirm that it works - more or less. It still keeps spamming the error message

An error occurred while sending events: [Errno 107] Transport endpoint is not connected

  File "/python3.12/lib/python3.12/site-packages/logstash_async/worker.py", line 241, in _flush_queued_events
    self._send_events(events)
  File "/python3.12/lib/python3.12/site-packages/logstash_async/worker.py", line 304, in _send_events
    self._transport.send(events, use_logging=use_logging)
  File "/python3.12/lib/python3.12/site-packages/logstash_async/transport.py", line 103, in send
    self._close()
  File "/python3.12/lib/python3.12/site-packages/logstash_async/transport.py", line 138, in _close
    self._sock.shutdown(socket.SHUT_WR)

Maybe because we are using UDP?

logstash_async.transport.UdpTransport

eht16 commented 8 months ago

I can confirm that it works - more or less. It still keeps spamming the error message

An error occurred while sending events: [Errno 107] Transport endpoint is not connected

I did not suppress the error only the handling of it so that the Transport instance will recover from the error by creating a new socket connection. My idea was that it is generally better to log these errors than hiding them away. Do this happen so often that it bothers you or did you just notice while testing?

Maybe because we are using UDP?

I think UDP/TCP does not matter here, the code is the same for both protocols.