gliderlabs / logspout

Log routing for Docker container logs
MIT License

Broken Pipe when Sending TCP Data to Loggly #107

Closed brianwallace closed 7 years ago

brianwallace commented 9 years ago

I am using logspout to stream docker logs into Loggly over TCP. This is done using the syslog+tcp://logs-01.loggly.com endpoint:

    /usr/bin/docker run --name logspout \
    -v=/var/run/docker.sock:/tmp/docker.sock \
    -e SYSLOG_STRUCTURED_DATA=[LOGGLY_CUSTOMER_TOKEN]@41058  \
    gliderlabs/logspout:master \
    syslog+tcp://logs-01.loggly.com:514

Way too frequently, however, I am seeing a broken pipe message and data streaming from logspout stops. From the logspout logs:

2015/06/02 01:04:07 syslog: write tcp 54.236.68.122:514: broken pipe

I am assuming this occurs when the Loggly Route 53 ingestion endpoint changes IP and the TCP connection is lost.

Logspout should be updated to attempt to reestablish this connection if it is lost.

rlister commented 9 years ago

This is definitely not a loggly-specific problem. It seems the new adapter code causes this problem both for TCP and UDP (surprisingly). I am writing to a local syslog endpoint: with v2 or latest (i.e. release branch) I can write UDP, stop the endpoint, restart it, and logspout carries on writing. With master, using either TCP or UDP, if I restart the endpoint the connection is dropped and not recreated.

brianwallace commented 9 years ago

@rlister, thanks for the feedback. I have updated to logspout v2 and will give this a shot. I should be able to confirm in a couple days whether or not this solves my issue (it sounds like it will!).

fastmap22 commented 9 years ago

Any updates on this? I am getting this error too with raw+tcp:

logspout v3-master by gliderlabs

raw: write tcp :17000: broken pipe

brianwallace commented 9 years ago

FYI, I was not able to go back to a previous release (v2 or latest, i.e. the release branch) because, to the best of my knowledge, it does not support the new endpoint format:

for example: syslog+tcp://logs-01.loggly.com:514

fastmap22 commented 9 years ago

Any updates on this?

strzelecki-maciek commented 9 years ago

Would love to see this fixed as well. Currently logspout does not try to reconnect after a broken pipe error, which effectively makes it very unreliable.

arabold commented 9 years ago

Is there any workaround for this problem? Once the pipe breaks, I have to manually log in to my instance(s) and restart the logspout container. Not really an acceptable solution...

josh-padnick commented 9 years ago

We're also experiencing this. Do the adapters themselves need to be rewritten or is this a logspout core problem?

tmornini commented 9 years ago

:+1: Have seen this as well...

schickling commented 9 years ago

+1

bgoldman commented 9 years ago

:+1:

tmornini commented 8 years ago

Just saw this again. Not having continuous logs sucks. :-(

ben commented 8 years ago

:+1: from me, experiencing this in production, and our only workaround is a cronjob that restarts the logspout container every hour. :cry:

josh-padnick commented 8 years ago

I just want to say that after investing a significant amount of time in setting up logspout, we opted to use fluentd with a customized version of the fluent-plugin-docker-format. This automatically picks up all new Docker containers, is fault-tolerant (if fluentd fails, it just resumes reading the Docker log files where it left off), is performant, and is as customizable as the fluentd plugin ecosystem allows.

While I like some of the core ideas behind logspout, I believe better solutions are now available.

tmornini commented 8 years ago

Just saw this again.

Is this project being maintained?

progrium commented 8 years ago

Sometimes! I would love help. Can somebody confirm #150 fixes this?

See also: http://progrium.com/blog/2015/12/04/leadership-guilt-and-pull-requests/

keis commented 8 years ago

I followed the approach from #150 to create a patched version of the logspout-logstash module for which I was seeing the same issue. It fixed the issue in that instance at least.

progrium commented 8 years ago

One more thumbs up?

tmornini commented 8 years ago

@progrium Wonderful to hear. I'd love to verify #150 but don't have the Go chops (yet!) to handle that.

meonkeys commented 8 years ago

Suggestion / alternative to #150: instead of logging broken pipe and continuing execution, just exit (with whatever exit code makes sense). Docker and Kubernetes can restart the container automatically, and will keep a count of restarts.

tmornini commented 8 years ago

@meonkeys I like the cleanliness of your method, but it concerns me that logs may be lost during the switchover.

progrium commented 8 years ago

Yes, normally I'm all about fail fast and letting higher up supervisors restart, but that would mean potential lost logs. If it couldn't reconnect after a few tries, I'm happy to have it fail out.

meonkeys commented 8 years ago

Ah, got it, that makes sense. I assumed that k8s (or something else) was keeping logs safe/durable.

tmornini commented 8 years ago

@meonkeys k8s?

meonkeys commented 8 years ago

Sorry, k8s is just shorthand for Kubernetes.

tmornini commented 8 years ago

@progrium Any chance of cutting a release so I can give you some feedback?

crohr commented 8 years ago

I'd love to see this supported for raw TCP connections as well, instead of just syslog.

rsingh2411 commented 8 years ago

Hi, is TCP support for syslog servers currently broken, or is it a work in progress? I am trying to send container logs to a syslog server through a URL of the form syslog+tcp://:port, but the logs are not going through. When I switch to UDP, the logs start flowing into syslog. I require TCP with logspout to send logs to the syslog server. Is this possible? Please reply.

hlj commented 8 years ago

:+1:

tmornini commented 8 years ago

@progrium Here's a report after 17 hours on v3

We deployed to 4 servers 17 hours ago with syslog+tcp://logs-01.loggly.com:514

3 of the servers are still logging, 1 stopped logging entirely about 8.5 hours ago.

We ran docker logs on the logspout container that has stopped logging and checked its log:

    # logspout v3 by gliderlabs
    # adapters: tcp syslog raw tls udp
    # options : persist:/mnt/routes
    # jobs    : http[logs,routes]:80 pump
    # routes  :
    #   ADAPTER     ADDRESS                 CONTAINERS  SOURCES  OPTIONS
    #   syslog+tcp  logs-01.loggly.com:514                       map[]

We checked logs on the rest of the containers and they've been logging the entire time.

So, it's a total mystery as to why the logging has stopped. Based upon previous behavior, we expect the rest of the servers will follow this one to the same condition eventually.

progrium commented 8 years ago

Okay. I have some ideas and at the very least know some simple additions for better debugging. Anybody willing to sponsor this work?

tmornini commented 8 years ago

@progrium Yes. I assume you accept Bitcoin?

progrium commented 8 years ago

I ... well I haven't, but I guess I could?

tmornini commented 8 years ago

@progrium Ok, let's negotiate privately and get this underway!

progrium commented 8 years ago

my username at gmail if you lost my email. :)

megastef commented 8 years ago

Please have a look at https://github.com/sematext/sematext-agent-docker. It reconnects when the Logsene backend is not reachable, and it buffers and re-transmits logs that could not be shipped. If the connection to the Docker socket is lost, the agent reconnects to Docker. Ahh, and the agent can parse your logs, add GeoIP info, and collect performance metrics and Docker events. What does it have in common with Logspout?

We don't take bitcoins, but offer a free plan.

schickling commented 8 years ago

Any news on this @tmornini @progrium? :+1:

tmornini commented 8 years ago

@schickling Yes, but not good news unfortunately.

We decided to use the syslog log driver to send the logs to the host where we configured rsyslog to send them through to Loggly.

We just deployed it today, will report back on how it works.

We use ECS, and our host rsyslog config file is at:

https://gist.github.com/tmornini/aa25e93556622b982d473899e5305e33

tmornini commented 8 years ago

We've been running 3 days now without issue, the longest continuous and unbroken log stream since we started using Loggly via logspout.

We have the best filterability we've ever had, as well.

| syslog metadata | our metadata | example |
| --- | --- | --- |
| TAG | product | subledger |
| TAG | environment | ci, production |
| APP-NAME | task_name-task_version | proxy-67 |
| MSGID | container_name | proxy |
| HOSTNAME | container_id | 4c01db0b339c |
| PROCID | pid | 2657 |

rsyslog config file gist: https://gist.github.com/tmornini/aa25e93556622b982d473899e5305e33

josh-padnick commented 8 years ago

@tmornini Is there a reason you didn't just configure each local syslog to send directly to Loggly vs sending to a central rsyslog host?

I'm asking because I'm curious about various setup options here.

tmornini commented 8 years ago

@josh-padnick I couldn't figure out how to map the metadata I wanted. If I could have, I'd have preferred it that way.

It is nice to have the host logs up in Loggly too, of course, so if we're going to configure the host rsyslog, we might as well use it. :-)

I've updated the gist to include everything required to make this work on ECS.

It's grotesquely inefficient to run all those regexes per log line, and the regexes themselves are ripe for tuning, but we're not able to visually detect the additional CPU required, so perhaps it's just not that big a deal. :-)

josh-padnick commented 8 years ago

@tmornini Thanks for clarifying! I thought you were saying you send all logs to a separate server running rsyslog, but now I see you're just going from syslog to rsyslog on the same host. What's nice is that, according to my understanding, (r)syslog will spool logs to disk and has built-in retry for sending logs to a remote destination, so it sounds like things are pretty fault-tolerant.

I've also had success running a fluentd daemon on the host that also saves logs to disk and will retry if it can't reach its destination. Probably the one benefit of fluentd over the rsyslog setup is that it's pretty easy to swap out different log destinations (e.g. Loggly vs. Elasticsearch), but to get the metadata we wanted, a custom version of the fluentd docker plugin was needed.

michaelshobbs commented 8 years ago

@progrium @mattaitchison @andyshinn where did this end up?

andyshinn commented 8 years ago

After reading this thread, it appears there may be multiple issues happening here. But the original issue would appear to be caused by the CNAME or A record changing (the IP is in AWS and would likely be a front end server or load balancer changing IPs).

Some debugging that would help confirm this would be to resolve logs-01.loggly.com when the error occurs to confirm that the IP from error and DNS record are indeed different.

The fix would need to be a retry mechanism that attempts to resolve the name again and reconnect (maybe a max retries + interval between retries option).
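The debugging check described above can be sketched in Go. `stillCurrent` is a hypothetical helper, not part of logspout; it only confirms whether the connected IP has gone stale, after which the adapter would redial by hostname (which performs a fresh DNS lookup) with whatever max-retries and interval options get added.

```go
package main

import "net"

// stillCurrent reports whether connectedIP is still among the records
// DNS returns for host. If it is not, the endpoint has moved (e.g. a
// Route 53 change) and the dead connection explains the broken pipe.
func stillCurrent(host, connectedIP string) (bool, error) {
	ips, err := net.LookupHost(host)
	if err != nil {
		return false, err
	}
	for _, ip := range ips {
		if ip == connectedIP {
			return true, nil
		}
	}
	return false, nil
}
```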

vinuguna commented 7 years ago

Status update on this issue? In our case, we use logspout to ship logs over TCP to the ELK stack. Certain container logs fail to make it to elasticsearch. When a TCP pipe of a container in logspout dies, it never recovers.

michaelshobbs commented 7 years ago

I'll work on getting #253 tested and merged

michaelshobbs commented 7 years ago

Potential fix + tests were merged in f18e503780b8f3abb3771ffaa97e5d2971199f9b