This is definitely not a loggly-specific problem. It seems the new adapter code causes this problem both for TCP and UDP (surprisingly). I am writing to a local syslog endpoint: with `v2` or `latest` (i.e. the release branch) I can write UDP, stop the endpoint, restart it, and logspout carries on writing. With `master`, using either TCP or UDP, if I restart the endpoint the connection is dropped and not recreated.
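For anyone wondering why UDP survives the endpoint restart while TCP does not: a connected UDP socket carries no session state, so writes keep going once the endpoint is back, while a dead TCP connection errors on every write until it is re-dialed. A minimal sketch illustrating the difference (the `127.0.0.1:5514` endpoint is a made-up local address, not anything from this thread):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// UDP: fire-and-forget. Writes usually return nil even when
	// nothing is listening (at most, an ICMP "port unreachable"
	// surfaces as an error on a later write), so restarting the
	// endpoint goes unnoticed.
	udp, err := net.Dial("udp", "127.0.0.1:5514")
	if err != nil {
		panic(err)
	}
	_, err = udp.Write([]byte("<14>hello via udp\n"))
	fmt.Println("udp write error:", err)

	// TCP: stateful. Once the endpoint restarts, the old connection
	// is dead and subsequent writes fail with "broken pipe" or
	// "connection reset" until the caller dials again, which is
	// exactly the step missing on master.
	tcp, err := net.Dial("tcp", "127.0.0.1:5514")
	if err != nil {
		fmt.Println("tcp dial error:", err)
		return
	}
	_, err = tcp.Write([]byte("<14>hello via tcp\n"))
	fmt.Println("tcp write error:", err)
}
```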
@rlister, thanks for the feedback. I have updated to logspout `v2` and will give this a shot. I should be able to confirm in a couple of days whether or not this solves my issue (it sounds like it will!).
Any updates on this? I am getting this error too with `raw+tcp` on logspout v3-master by gliderlabs:

```
raw: write tcp
```
FYI, I was not able to go back to a previous release (`v2` or `latest`, i.e. the release branch) because it does not support the new endpoint format (to the best of my knowledge), for example: `syslog+tcp://logs-01.loggly.com:514`
Any updates on this?
Would love to see this fixed as well. Currently logspout does not try to reconnect after a broken pipe error, which effectively makes it very unreliable.
Is there any workaround for this problem? Once the pipe breaks, I have to manually log in to my instance(s) and restart the logspout container. Not really an acceptable solution...
We're also experiencing this. Do the adapters themselves need to be rewritten or is this a logspout core problem?
:+1: Have seen this as well...
+1
:+1:
Just saw this again. Not having continuous logs sucks. :-(
:+1: from me, experiencing this in production, and our only workaround is a cronjob that restarts the logspout container every hour. :cry:
I just want to say that after investing a significant amount of time in setting up logspout, we opted to use fluentd with a customized version of fluent-plugin-docker-format. It automatically picks up all new Docker containers, is fault-tolerant (if fluentd fails, it simply resumes reading the Docker log files where it left off), is performant, and is as customizable as the fluentd plugin ecosystem.
While I like some of the core ideas behind logspout, I believe better solutions are now available.
Just saw this again.
Is this project being maintained?
Sometimes! I would love help. Can somebody confirm #150 fixes this?
See also: http://progrium.com/blog/2015/12/04/leadership-guilt-and-pull-requests/
I followed the approach from #150 to create a patched version of the logspout-logstash module for which I was seeing the same issue. It fixed the issue in that instance at least.
One more thumbs up?
@progrium Wonderful to hear. I'd love to verify #150 but don't have the Go chops (yet!) to handle that.
Suggestion / alternative to #150: instead of logging `broken pipe` and continuing execution, just exit (with whatever exit code makes sense). Docker and Kubernetes can restart the container automatically, and will keep a count of restarts.
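A minimal sketch of that fail-fast approach, for illustration only (this is not logspout's actual code; `writeLog` and the dial target are stand-ins): any write error is logged and the process exits non-zero, so a supervisor such as Docker with `--restart=always`, or Kubernetes, brings the container back up with a fresh connection.

```go
package main

import (
	"log"
	"net"
	"os"
)

// writeLog is a hypothetical stand-in for an adapter's write path.
func writeLog(conn net.Conn, line string) error {
	_, err := conn.Write([]byte(line))
	return err
}

func main() {
	conn, err := net.Dial("tcp", "logs-01.loggly.com:514")
	if err != nil {
		log.Fatalf("dial failed: %v", err) // exits with status 1
	}
	defer conn.Close()

	if err := writeLog(conn, "<14>hello\n"); err != nil {
		// Rather than logging "broken pipe" and carrying on with a
		// dead connection, exit and let the supervisor restart us.
		log.Printf("write failed: %v", err)
		os.Exit(1)
	}
}
```

The trade-off raised in the replies below still applies: anything logged between the failure and the restart is lost.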
@meonkeys I like the cleanliness of your method, but it concerns me that logs may be lost during the switchover.
Yes, normally I'm all about failing fast and letting higher-up supervisors restart, but that would mean potentially lost logs. If it couldn't reconnect after a few tries, I'm happy to have it fail out.
Ah, got it, that makes sense. I assumed that k8s (or something else) was keeping logs safe/durable.
@meonkeys k8s?
Sorry, k8s is just shorthand for Kubernetes.
@progrium Any chance of cutting a release so I can give you some feedback?
I'd love to see this supported for raw TCP connections as well, instead of just syslog.
Hi, is TCP support for syslog servers currently not working, or is it a work in progress? I am trying to send container logs to a syslog server through the following URL: `syslog+tcp://`
:+1:
@progrium Here's a report after 17 hours on v3.

We deployed to 4 servers 17 hours ago with `syslog+tcp://logs-01.loggly.com:514`.

3 of the servers are still logging; 1 stopped logging entirely about 8.5 hours ago.

We ran `docker logs` on the Loggly container that has stopped logging and checked its log:
```
# logspout v3 by gliderlabs
# adapters: tcp syslog raw tls udp
# options : persist:/mnt/routes
# jobs    : http[logs,routes]:80 pump
# routes  :
#   ADAPTER     ADDRESS                 CONTAINERS  SOURCES  OPTIONS
#   syslog+tcp  logs-01.loggly.com:514                       map[]
```
We checked logs on the rest of the containers and they've been logging the entire time.
So, it's a total mystery as to why the logging has stopped. Based upon previous behavior, we expect the rest of the servers will follow this one to the same condition eventually.
Okay. I have some ideas and at the very least know some simple additions for better debugging. Anybody willing to sponsor this work?
@progrium Yes. I assume you accept Bitcoin?
I ... well I haven't, but I guess I could?
@progrium Ok, let's negotiate privately and get this underway!
my username at gmail if you lost my email. :)
Please have a look at https://github.com/sematext/sematext-agent-docker: it reconnects when the Logsene backend is not reachable, and it buffers and re-transmits logs that could not be shipped. If the connection to the Docker socket is lost, the agent reconnects to Docker. Ahh, and the agent can parse your logs, add GeoIP info, and collect performance metrics and Docker events. What does it have in common with Logspout? We don't take bitcoins, but we offer a free plan.
Any news on this @tmornini @progrium? :+1:
@schickling Yes, but not good news unfortunately.
We decided to use the syslog log driver to send the logs to the host where we configured rsyslog to send them through to Loggly.
We just deployed it today, will report back on how it works.
We use ECS, and our host rsyslog config file is at:
https://gist.github.com/tmornini/aa25e93556622b982d473899e5305e33
We've been running 3 days now without issue, the longest continuous and unbroken log stream since we started using Loggly via logspout.
We have the best filterability we've ever had, as well.
| syslog metadata | our metadata | example |
|---|---|---|
| TAG | product | subledger |
| TAG | environment | ci, production |
| APP-NAME | task_name-task_version | proxy-67 |
| MSGID | container_name | proxy |
| HOSTNAME | container_id | 4c01db0b339c |
| PROCID | pid | 2657 |
@tmornini Is there a reason you didn't just configure each local syslog to send directly to Loggly vs sending to a central rsyslog host?
I'm asking because I'm curious about various setup options here.
@josh-padnick I couldn't figure out how to map the metadata I wanted. If I could have, I'd have preferred it that way.
It is nice to have the host logs up in Loggly too, of course, so if we're going to configure the host rsyslog, we might as well use it. :-)
I've updated the gist to include everything required to make this work on ECS.
It's grotesquely inefficient to run all those regexes per log line, and the regexes themselves are ripe for tuning, but we're not able to visually detect the additional CPU required, so perhaps it's just not that big a deal. :-)
@tmornini Thanks for clarifying! I thought you were saying you send all logs to a separate server running rsyslog, but now I see you're just going from syslog to rsyslog on the same host. What's nice is that, according to my understanding, (r)syslog will store logs on disk and have built-in retry for sending logs to a remote destination so it sounds like things are pretty fault-tolerant.
I've also had success running a fluentd daemon on the host that also saves logs to disk and will retry if it can't reach its destination. Probably the one benefit of fluentd over the rsyslog setup is that it's pretty easy to swap out different log destinations (e.g. Loggly vs. Elasticsearch), but to get the metadata we wanted, a custom version of the fluentd docker plugin was needed.
@progrium @mattaitchison @andyshinn where did this end up?
After reading this thread, it appears there may be multiple issues happening here. But the original issue would appear to be caused by the CNAME or A record changing (the IP is in AWS and would likely be a front end server or load balancer changing IPs).
Some debugging that would help confirm this would be to resolve logs-01.loggly.com when the error occurs, to confirm that the IP in the error and the current DNS record are indeed different.
The fix would need to be a retry mechanism that attempts to resolve the name again and reconnect (maybe a max retries + interval between retries option).
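A sketch of what that retry mechanism could look like (illustrative only, not logspout's actual code or the fix in #253; `maxRetries` and `retryInterval` are the hypothetical options mentioned above). Dialing by hostname forces a fresh DNS lookup on every attempt, and logging what the name currently resolves to makes the IP comparison suggested above easy:

```go
package main

import (
	"log"
	"net"
	"time"
)

const (
	host          = "logs-01.loggly.com"
	addr          = host + ":514"
	maxRetries    = 10
	retryInterval = 5 * time.Second
)

// reconnect re-dials by hostname, so each attempt performs a fresh
// DNS lookup and picks up a changed A/CNAME record.
func reconnect() (net.Conn, error) {
	var lastErr error
	for attempt := 1; attempt <= maxRetries; attempt++ {
		// Log the current DNS answer so it can be compared against
		// the IP embedded in the "broken pipe" error message.
		if ips, err := net.LookupHost(host); err == nil {
			log.Printf("%s currently resolves to %v", host, ips)
		}
		conn, err := net.DialTimeout("tcp", addr, 10*time.Second)
		if err == nil {
			return conn, nil
		}
		lastErr = err
		log.Printf("reconnect attempt %d/%d failed: %v", attempt, maxRetries, err)
		time.Sleep(retryInterval)
	}
	return nil, lastErr
}

func main() {
	conn, err := reconnect()
	if err != nil {
		log.Fatalf("giving up after %d attempts: %v", maxRetries, err)
	}
	defer conn.Close()
	log.Printf("connected to %s", conn.RemoteAddr())
}
```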
Status update on this issue? In our case, we use logspout to ship logs over TCP to the ELK stack. Certain container logs fail to make it to elasticsearch. When a TCP pipe of a container in logspout dies, it never recovers.
I'll work on getting #253 tested and merged
Potential fix + tests were merged in f18e503780b8f3abb3771ffaa97e5d2971199f9b
I am using logspout to stream docker logs into Loggly over TCP. This is done using the `syslog+tcp://logs-01.loggly.com` endpoint. Way too frequently, however, I am seeing a broken pipe message and data streaming from logspout stops. From the logspout logs:

```
2015/06/02 01:04:07 syslog: write tcp 54.236.68.122:514: broken pipe
```
I am assuming this occurs when the Loggly Route 53 ingestion endpoint changes IP and the TCP connection is lost.
Logspout should be updated to attempt to reestablish this connection if it is lost.